Here's one possible regex approach using named captures (see perlre) and %+. It's getting a little complex, though, so the line-based state machine approach from Corion's post is starting to look better in this case (I gave some more examples of the state machine approach in this thread).
use warnings;
use strict;
use Data::Dump;
local $/ = "\n\n\n";
while (<DATA>) {
    # clobber the header
    s/ ^ \s* Dumpdata\s+example \s* \n \s* -+ \s* \n //msx
        or next;
    my %row;
    $row{$+{key}} = $+{val}
        while m{
            (?: # the very first key doesn't need colon
                \A \s* (?<key> \w+ )
                # but the other keys need colons
              | ^ \s* (?<key> \w+ : )
            ) \s+
            # values should end at the next key
            (?<val> (?: (?!^\s*\w+:) . )+ )
        }xmsg;
    s/\s+/ /g for values %row;
    dd \%row;
}
__DATA__
Dumpdata example
-----------------
Warning bad news here
Detail: Some really nice infos these are
Info: This is a problem
but there is a solution


Dumpdata example
-----------------
Warning test
Detail: foo
bar
Info: quz
baz
Spec: blah
Output:
{
"Detail:" => "Some really nice infos these are ",
"Info:" => "This is a problem but there is a solution ",
"Warning" => "bad news here ",
}
{
"Detail:" => "foo bar ",
"Info:" => "quz baz ",
"Spec:" => "blah ",
"Warning" => "test ",
}
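The regex above uses the name "key" in both alternatives. As a minimal sketch of how that behaves in isolation (the sample string and the variable names here are made up for illustration), %+ hands back whichever of the same-named groups actually took part in the match:

```perl
use warnings;
use strict;

# Two alternatives that both capture into the name "key".
# $re, $text and @keys are illustrative names, not from the code above.
my $re = qr{
      \A (?<key> \w+   ) \s   # first key: no colon required
    | ^  (?<key> \w+ : ) \s   # later keys: must end in a colon
}xms;

my $text = "Warning test\nDetail: foo\n";
my @keys;
while ( $text =~ /$re/g ) {
    # $+{key} is the leftmost *defined* group with that name
    push @keys, $+{key};
}
print join( ",", @keys ), "\n";
```

Only one of the two groups is defined in any given match, so the lookup is unambiguous.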
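Update: Using the "branch reset" pattern (?|...) allows for a little bit of simplification. Both alternatives then capture into the same group number, so the match in list context returns key/value pairs that can be assigned directly to the hash, replacing the declaration and the while loop:
my %row = m{
    (?|   \A \s* ( \w+ )
      |   ^  \s* ( \w+ : ) ) \s+
    ( (?: (?!^\s*\w+:) . )+ )
}xmsg;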
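For comparison, the line-based state machine approach mentioned at the top could look roughly like the following. This is only a sketch under my own assumptions, not Corion's code: parse_records, the state names and the inline sample input are all made up for illustration. Unlike the regex version, it trims each value instead of leaving a trailing space.

```perl
use warnings;
use strict;

# Hypothetical helper (not from the posts referenced above): walks the
# input one line at a time and returns one hashref per record.
sub parse_records {
    my @lines = @_;
    my ( @records, $row, $key );
    my $state = 'outside';    # outside -> rule -> first -> body
    for my $line (@lines) {
        ( my $text = $line ) =~ s/^\s+|\s+$//g;    # trimmed copy of the line
        if ( $text eq '' ) {
            next;                                  # blank lines are ignored
        }
        elsif ( $text =~ /^Dumpdata\s+example$/ ) {
            $row = {};                             # header starts a new record
            push @records, $row;
            undef $key;
            $state = 'rule';
        }
        elsif ( $state eq 'rule' and $text =~ /^-+$/ ) {
            $state = 'first';                      # dashed rule under the header
        }
        elsif ( $state eq 'first' and $text =~ /^(\w+)\s+(.*)$/ ) {
            ( $key, $row->{$1} ) = ( $1, $2 );     # first key has no colon
            $state = 'body';
        }
        elsif ( $state eq 'body' and $text =~ /^(\w+:)\s*(.*)$/ ) {
            ( $key, $row->{$1} ) = ( $1, $2 );     # later keys end in a colon
        }
        elsif ( $state eq 'body' ) {
            # anything else continues the value of the current key
            $row->{$key} .= ( length $row->{$key} ? ' ' : '' ) . $text;
        }
        else {
            warn "unexpected line in state '$state': $text\n";
        }
    }
    return @records;
}

# Sample input copied from the __DATA__ section above.
my $input = <<'END';
Dumpdata example
-----------------
Warning bad news here
Detail: Some really nice infos these are
Info: This is a problem
but there is a solution


Dumpdata example
-----------------
Warning test
Detail: foo
bar
Info: quz
baz
Spec: blah
END

my @records = parse_records( split /^/, $input );
for my $rec (@records) {
    print "$_ => $rec->{$_}\n" for sort keys %$rec;
    print "\n";
}
```

It is longer than the regex, but each rule sits on its own branch, which makes it easier to add error handling or new line types later.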