
Pattern match for split() - need match but not match syntax

by Tanoti (Initiate)
on May 06, 2008 at 14:02 UTC
Tanoti has asked for the wisdom of the Perl Monks concerning the following question:

I have some text from a third-party app, stored in a single variable, that needs parsing. I am using split /\n/, $app_text; to break it into lines for processing. I'm looking for "Field:Value" lines and ignoring everything else. For most of the text this is fine; however, the external app is borking some of the fields by putting a \n after the colon, so the Value for that field ends up in the next array element. Here's some sample code:
    #!/usr/bin/perl
    use strict;

    my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";
    foreach my $line (split /\n/, $app_text) {
        print "$line\n";
    }

    one:partridge
    two:
    turtle doves
    three:french hens

How can I tell split to split on the \n, but not when it is preceded by a colon, so that I get "two:turtle doves" as the second array element in the above example?

Many thanks,

Re: Pattern match for split() - need match but not match syntax
by citromatik (Curate) on May 06, 2008 at 14:11 UTC

    You are almost there; just put the condition inside the split pattern:

        use strict;
        my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";
        foreach my $line (split /(?<!:)\n/, $app_text) {
            $line =~ s/\n//g;    # Eliminate internal "\n"s
            print "$line\n";
        }


        one:partridge
        two:turtle doves
        three:french hens

    Update: Corrected the split pattern to use a negative lookbehind, (?<!:); see perlre.
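Why a lookbehind rather than something like [^:]\n? A character class consumes the character it matches, so split throws that character away along with the newline. The sketch below compares the two on the sample text from the question; the [^:]\n pattern and the show() display helper are illustrative only and are not from the thread.

```perl
use strict;
use warnings;

my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

# Display helper (illustrative only): render embedded newlines as a
# visible \n so each split element fits on one output line.
sub show { return join ' | ', map { (my $s = $_) =~ s/\n/\\n/g; $s } @_ }

# A character class consumes the character before the newline, so
# split discards it together with the separator.
my @naive = split /[^:]\n/, $app_text;

# The negative lookbehind (?<!:) only checks that the previous
# character is not a colon; it is zero-width, so only the newline
# itself is removed.
my @lookbehind = split /(?<!:)\n/, $app_text;

print "naive:      ", show(@naive), "\n";
print "lookbehind: ", show(@lookbehind), "\n";
```

Running it prints:

    naive:      one:partridg | two:\nturtle dove | three:french hen
    lookbehind: one:partridge | two:\nturtle doves | three:french hens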


      Thanks, that works a treat and has saved a lot of case-specific workaround code. I had played with the lookbehind syntax but couldn't get it to work, so I thought I was on the wrong track!

Re: Pattern match for split() - need match but not match syntax
by Narveson (Chaplain) on May 06, 2008 at 14:59 UTC

    As long as you have to think about regexes anyway, you can use a regex that parses your text at the same time that it's splitting it.

        my $LINE_PATTERN = qr{
            ([^:]+)    # capture everything before ...
            :\s*       # the colon and any newline or other whitespace,
            ([^\n]+)   # then capture everything before
            \n         # the next newline
        }msx;

        my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";
        while ($app_text =~ /$LINE_PATTERN/g) {
            print "$1: $2\n";
        }
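One caveat with ([^:]+): it can match across newlines, so a line with no colon at all (the "everything else" the question wants to ignore) gets glued onto the front of the next field name. For example, with an extra "header junk" line at the top, the first field would come out as "header junk\none". A possible tightening, sketched here under the assumption that field names never contain a newline, anchors the name to the start of a line with ^ under /m. The $ANCHORED_PATTERN name and the "header junk" input are hypothetical, made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical variant of the pattern above: the field name must start
# a line and may not contain a colon or a newline.
my $ANCHORED_PATTERN = qr{
    ^([^:\n]+)   # field name at the start of a line
    :\s*         # the colon and any whitespace, including a stray newline
    ([^\n]+)     # the value: the rest of that (or the next) line
}mx;

# "header junk" is made-up noise with no colon in it.
my $app_text = "header junk\none:partridge\ntwo:\nturtle doves\nthree:french hens\n";

my @pairs;
while ($app_text =~ /$ANCHORED_PATTERN/g) {
    push @pairs, [ $1, $2 ];
}
print "$_->[0]: $_->[1]\n" for @pairs;
```

The noise line is skipped because no position inside it can satisfy both ^ and a following colon, and the broken "two:" record is still rejoined by the \s* after the colon.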

    If you were planning to put the fields in a hash, you can do it all at once:

        my %value_of = $app_text =~ /$LINE_PATTERN/g;
        while (my ($field, $value) = each %value_of) {
            print "$field: $value\n";
        }
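Two things are worth knowing about the hash version: each returns the pairs in no particular order, and a field name that appears twice silently keeps only its last value. A minimal sketch that sorts the keys for repeatable output follows. It reuses the same pattern, minus the /m and /s flags, which have no effect on it because it contains no ^, $ or dot.

```perl
use strict;
use warnings;

my $LINE_PATTERN = qr{
    ([^:]+)    # the field name
    :\s*       # the colon and any whitespace, including a stray newline
    ([^\n]+)   # the value
    \n         # up to the next newline
}x;

my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

# In list context a //g match returns every capture in order
# (field, value, field, value, ...), which assigns neatly to a hash.
my %value_of = $app_text =~ /$LINE_PATTERN/g;

# Hash order is not defined, so sort the keys for stable output.
for my $field (sort keys %value_of) {
    print "$field: $value_of{$field}\n";
}
```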
      While there's nothing wrong with your $LINE_PATTERN regex, I think it would be simpler to keep the record processing and the field/value processing separate. To my eye it looks tidier and easier to maintain, but others may disagree.

          use strict;
          use warnings;
          use Data::Dumper;

          my $app_text = qq{one:partridge\ntwo:\nturtle doves\nthree:french hens\n};

          my %fvPairs =
              map { split m{:\n?} }
              map { split m{(?<!:)\n} } $app_text;

          print Data::Dumper->Dumpxs( [ \ %fvPairs ], [ q{*fvPairs} ] );

      produces ...

          %fvPairs = (
                       'three' => 'french hens',
                       'one' => 'partridge',
                       'two' => 'turtle doves'
                     );
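One thing to watch with split m{:\n?}: a value that itself contains a colon (a time such as 12:30, say) splits into three pieces, which shifts every later key/value pair in the hash by one. Giving split a LIMIT of 2 keeps everything after the first colon together as the value. The "time:12:30" line below is made-up input added only to show the effect.

```perl
use strict;
use warnings;

# "time:12:30" is a made-up line whose value contains a colon.
my $app_text = qq{one:partridge\ntwo:\nturtle doves\nthree:french hens\ntime:12:30\n};

my %fvPairs =
    map { split m{:\n?}, $_, 2 }       # at most two pieces: field, value
    map { split m{(?<!:)\n} } $app_text;

print "$_ => $fvPairs{$_}\n" for sort keys %fvPairs;
```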



Re: Pattern match for split() - need match but not match syntax
by GrandFather (Sage) on May 06, 2008 at 22:23 UTC

    Would you perhaps be better off using Text::xSV or Text::CSV to do the parsing for you?

    Perl is environmentally friendly - it saves trees
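For input this simple, a core-Perl alternative to pulling in a module is to repair the broken records before splitting: delete every newline that comes straight after a colon, then split on the newlines that remain. This is a sketch rather than something proposed in the thread, and it assumes a field's value is never genuinely empty (an empty "four:" followed by "five:six" would be merged into one record).

```perl
use strict;
use warnings;

my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

# Work on a copy, removing any newline that directly follows a colon
# (assumption: a real value never ends the line right after its colon).
(my $fixed = $app_text) =~ s/:\n/:/g;

# Every remaining newline now ends a complete Field:Value record.
my @lines = split /\n/, $fixed;
print "$_\n" for @lines;
```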

Node Type: perlquestion [id://684949]
Approved by moritz