
Re: Regex to fix up records, some multiline fields, some not

by Eily (Parson)
on Aug 20, 2013 at 12:37 UTC (#1050175)

in reply to Regex to fix up records, some multiline fields, some not

Instead of having one regex for each field, you can use the /g modifier to step from one field to the next, and use the (?=EXPR) lookahead syntax to check that what follows your field is the start of another field (or the end of the record) rather than continuation data.

use Data::Dumper;

my $regex = qr/
    ^field(\d+):        # find a line starting with 'field' and capture its number
    (.*?)\n?            # find the smallest string before the next
    (?=^field\d+:|\z)   # line starting with 'field' or end of record. Rewind just before that point after the match.
/msx; # ^ matches beginning of line, . matches \n, and spaces and comments are ignored in the regex

my %result;
my $count = 1;
{ # block to limit the scope of local
    local $/ = ""; # records are separated by empty lines
    while (<DATA>)
    {
        my %hash;
        while (/$regex/g)
        {
            $hash{"field$1"} = $2;
        }
        $result{"record ".$count++} = \%hash;
    }
}
print Dumper \%result;

__DATA__
field1: data 1 monday
field2: data 2 monday
field3: data 3 monday

field1: data 1 tuesday
field2: data 2 tuesday
tuesday details line 1
tuesday details line 2
field3: data 3 tuesday

Output:

$VAR1 = {
          'record 1' => {
                          'field1' => ' data 1 monday',
                          'field2' => ' data 2 monday',
                          'field3' => ' data 3 monday
'
                        },
          'record 2' => {
                          'field1' => ' data 1 tuesday',
                          'field2' => ' data 2 tuesday
tuesday details line 1
tuesday details line 2',
                          'field3' => ' data 3 tuesday'
                        }
        };
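
To see why the lookahead matters, here is a minimal sketch (not part of the original reply) that runs the same pattern, written on one line, against a single made-up record and prints where each /g match leaves off. The sample string and the variable names $record, $field and @pairs are assumptions for illustration only. Because (?=...) does not consume anything, each match stops right at the next "field" label, so the next iteration can pick it up.

```perl
use strict;
use warnings;

# Hypothetical sample record; the field values are made up for illustration.
my $record = "field1: a\nfield2: b\nmore b\nfield3: c\n";

# Same shape as the regex above, without /x so it fits on one line.
my $field = qr/^field(\d+):(.*?)\n?(?=^field\d+:|\z)/ms;

my @pairs;
while ($record =~ /$field/g) {
    # pos() is where the next /g match will start. The lookahead checked
    # for the next label but did not consume it, so pos() sits right on it.
    push @pairs, [$1, $2, pos($record)];
}

# Print each field number, its raw value, and the offset the match ended at.
printf "field%s => '%s' (next match starts at offset %d)\n", @$_ for @pairs;
```

Note that the captured values keep the space after the colon, just as in the Dumper output above. If that is unwanted, one option is to put \s* between the colon and the capture group.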
