PerlMonks  

Re: Regex to fix up records, some multiline fields, some not

by Eily (Hermit)
on Aug 20, 2013 at 12:37 UTC (#1050175)


in reply to Regex to fix up records, some multiline fields, some not

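Instead of writing one regex per field, you can use the /g modifier to step from one field to the next, and a lookahead, (?=EXPR), to check that what follows a field's value is the start of another field (or the end of the record) rather than more data. The lookahead matches without consuming anything, so the next iteration picks up at the following field: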

    use Data::Dumper;

    my $regex = qr/
        ^field(\d+):        # find a line starting by 'field' and capture its number
        (.*?)\n?            # find the smallest string before the next
        (?=^field\d+:|\z)   # line starting by 'field' or end of record. Rewind just before that point after the match.
    /msx; # ^ matches beginning of line, . matches \n and spaces and comments are ignored in the regex

    my %result;
    my $count = 1;
    { # block to limit the scope of local
        local $/ = ""; # records are separated by empty lines
        while(<DATA>)
        {
            my %hash;
            while(/$regex/g)
            {
                $hash{"field$1"} = $2;
            }
            $result{"record ".$count++} = \%hash;
        }
    }
    print Dumper \%result;

    __DATA__
    field1: data 1 monday
    field2: data 2 monday
    field3: data 3 monday

    field1: data 1 tuesday
    field2: data 2 tuesday
    tuesday details line 1
    tuesday details line 2
    field3: data 3 tuesday

    $VAR1 = {
              'record 1' => {
                              'field1' => ' data 1 monday',
                              'field2' => ' data 2 monday',
                              'field3' => ' data 3 monday
    '
                            },
              'record 2' => {
                              'field1' => ' data 1 tuesday',
                              'field2' => ' data 2 tuesday
    tuesday details line 1
    tuesday details line 2',
                              'field3' => ' data 3 tuesday'
                            }
            };
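The values captured above keep the blank that follows the colon. If that is unwanted, one possible variant is to match the blank outside the capture group. The sketch below is only an illustration under that assumption: parse_record is a hypothetical helper name, not something from the original post, and it works on a single record string instead of reading from DATA.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): parse one record string
# into a hash, skipping spaces and tabs between the colon and the value.
sub parse_record {
    my ($record) = @_;
    my %fields;
    while ($record =~ /
            ^field(\d+):[ \t]*   # field name, then blanks left out of the capture
            (.*?)\n?             # smallest possible value, newlines allowed
            (?=^field\d+:|\z)    # stop before the next field or at end of string
        /msxg) {
        $fields{"field$1"} = $2;
    }
    return \%fields;
}

# Sample record with a multi-line field2 (made-up data for illustration).
my $record = "field1: a\nfield2: b\nmore b\nfield3: c\n";
my $fields = parse_record($record);
print "$_ => [$fields->{$_}]\n" for sort keys %$fields;
```

With this sample, field2 comes back as the two lines "b" and "more b" joined by a newline, and field1 and field3 have no leading blank. A record read in paragraph mode still ends in more than one newline, so its last field would keep one of them, as field3 of record 1 does in the output above.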

