PerlMonks  

Pattern match for split() - need match but not match syntax

by Tanoti (Initiate)
on May 06, 2008 at 14:02 UTC ( #684949=perlquestion )
Tanoti has asked for the wisdom of the Perl Monks concerning the following question:

I have some text from a third-party app, stored in a single variable, that needs parsing. I am using split /\n/, $app_text; to break it into lines for processing. I'm looking for "Field:Value" lines and ignoring everything else. For most of the text this is fine. However, the external app is borking some of the fields and putting a \n after the colon, so the Value for that field ends up in the next array element. Here's some sample code:

#!/usr/bin/perl
use strict;

my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

foreach my $line (split /\n/, $app_text) {
    print "$line\n";
}

Produces:
one:partridge
two:
turtle doves
three:french hens

How can I tell split to split on the \n, but not when it is preceded by a colon, so that I get two:turtle doves for the second array element in the above example?

Many thanks,
John

Re: Pattern match for split() - need match but not match syntax
by citromatik (Curate) on May 06, 2008 at 14:11 UTC

    You are almost there, include the condition inside the split pattern:

    use strict;

    my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

    foreach my $line (split /(?<!:)\n/, $app_text) {
        $line =~ s/\n//g;    # Eliminate internal "\n"s
        print "$line\n";
    }

    Outputs

    one:partridge
    two:turtle doves
    three:french hens

    Update: Corrected the split pattern to use lookbehinds, see perlre

    citromatik
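To make the effect of the lookbehind in the reply above easier to see, here is a minimal sketch that runs the plain split and the lookbehind split side by side on the same sample text. The array names @plain and @records are illustrative only, and the narrower substitution s/:\n/:/ is a variation on the reply's s/\n//g, not the original poster's or citromatik's code.

```perl
use strict;
use warnings;

my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

# Plain split: every newline is a separator, so "two:" loses its value.
my @plain = split /\n/, $app_text;

# (?<!:) is a zero-width negative lookbehind: the \n only counts as a
# separator when the character just before it is not a colon.
my @records = split /(?<!:)\n/, $app_text;

# Remove only the stray newline that directly follows a colon.
s/:\n/:/ for @records;

print scalar(@plain), " plain elements, ", scalar(@records), " records\n";
print "$_\n" for @records;
```

Because the lookbehind is zero-width, the colon itself is never consumed by the separator, which is why it stays attached to the field name.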

      Thanks, that works a treat and has saved a lot of case-specific workaround code. I had played with the lookbehind syntax but couldn't get it to work, so I thought I was on the wrong track!

      John
Re: Pattern match for split() - need match but not match syntax
by Narveson (Chaplain) on May 06, 2008 at 14:59 UTC

    As long as you have to think about regexes anyway, you can use a regex that parses your text at the same time that it's splitting it.

    my $LINE_PATTERN = qr{
        ([^:]+)     # capture everything before ...
        :\s*        # the colon and any newline or other whitespace,
        ([^\n]+)    # then capture everything before
        \n          # the next newline
    }msx;

    my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

    while ($app_text =~ /$LINE_PATTERN/g) {
        print "$1: $2\n";
    }

    If you were planning to put the fields in a hash, you can do it all at once:

    my %value_of = $app_text =~ /$LINE_PATTERN/g;

    while (my ($field, $value) = each %value_of) {
        print "$field: $value\n";
    }

      While there's nothing wrong with your $LINE_PATTERN regex, I think it would be simpler to keep the record and field/value processing separate. To my eye it looks tidier and easier to maintain, but others may disagree.

      use strict;
      use warnings;

      use Data::Dumper;

      my $app_text = qq{one:partridge\ntwo:\nturtle doves\nthree:french hens\n};

      my %fvPairs =
          map { split m{:\n?} }
          map { split m{(?<!:)\n} }
          $app_text;

      print Data::Dumper->Dumpxs( [ \ %fvPairs], [ q{*fvPairs} ] );

      produces ...

      %fvPairs = (
                   'three' => 'french hens',
                   'one' => 'partridge',
                   'two' => 'turtle doves'
                 );

      Cheers,

      JohnGG
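The original question also mentions ignoring everything that is not a "Field:Value" line, which the examples above do not exercise. The following is a rough sketch of one way to combine the two-step approach with that filter. The extra "just some noise" line in the sample text is a made-up addition for illustration, and %value_of simply reuses the hash name from the earlier reply.

```perl
use strict;
use warnings;

# Hypothetical sample: the thread's text plus one line that is not a field.
my $app_text =
    "one:partridge\njust some noise\ntwo:\nturtle doves\nthree:french hens\n";

my %value_of;
for my $record (split /(?<!:)\n/, $app_text) {
    # Field name, colon, optional stray newline, then the value.
    # Records without a colon fail to match and are skipped.
    next unless $record =~ /\A([^:\n]+):\n?(.*)\z/s;
    $value_of{$1} = $2;
}

# Hash order is not defined in Perl, so sort the keys for stable output.
print "$_: $value_of{$_}\n" for sort keys %value_of;
```

Matching each record, rather than splitting it on every colon, also keeps a value that itself contains a colon in one piece. Whether that matters depends on what the third-party app actually emits.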

Re: Pattern match for split() - need match but not match syntax
by GrandFather (Sage) on May 06, 2008 at 22:23 UTC

    Would you perhaps be better off using Text::xSV or Text::CSV to do the parsing for you?


    Perl is environmentally friendly - it saves trees
