Pattern match for split() - need match but not match syntax

Tanoti has asked for the wisdom of the Perl Monks concerning the following question:

I have some text from a third-party app which I'm storing in a single variable that needs parsing. I am using: split /\n/, $app_text; to break it into lines for processing. I'm looking for "Field:Value" lines and ignoring everything else. For most of the text this is fine, however the external app is borking some of the fields and putting a \n after the colon meaning the Value for that field ends up in the next array element. Here's some sample code:

#!/usr/bin/perl

use strict;

my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

foreach my $line (split /\n/, $app_text) {
    print "$line\n";
}

Produces:

one:partridge
two:
turtle doves
three:french hens

How can I tell split to split on the \n, but not when it is preceded by a colon, so that I get two:turtle doves as the second array element in the above example?

Many thanks,
John


Replies are listed 'Best First'.
Re: Pattern match for split() - need match but not match syntax by citromatik (Curate) on May 06, 2008 at 14:11 UTC
You are almost there, include the condition inside the split pattern:

use strict;

my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

foreach my $line (split /(?<!:)\n/, $app_text) {
    $line =~ s/\n//g;    # Eliminate internal "\n"s
    print "$line\n";
}

Outputs:

one:partridge
two:turtle doves
three:french hens

Update: Corrected the split pattern to use lookbehinds, see perlre.

citromatik
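A minimal sketch of how the lookbehind split might feed the "ignore everything else" step from the original question. The `junk line` record and the `@records` array are assumptions made up for illustration; they are not part of the poster's data. Unlike the code above, it removes only the newline that directly follows a colon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input. The "junk line" record is hypothetical, added to show
# how non-field lines could be skipped.
my $app_text = "one:partridge\njunk line\ntwo:\nturtle doves\nthree:french hens\n";

my @records;
for my $line (split /(?<!:)\n/, $app_text) {
    $line =~ s/:\n/:/;                 # rejoin a value that was pushed onto the next line
    next unless $line =~ /^[^:]+:/;    # skip anything that is not Field:Value
    push @records, $line;
}

print "$_\n" for @records;
```

The negative lookbehind `(?<!:)` is zero-width, so the split still consumes only the newline itself; the colon stays attached to the field name.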
Re^2: Pattern match for split() - need match but not match syntax by Tanoti (Initiate) on May 06, 2008 at 14:52 UTC
Thanks, that works a treat and has saved a lot of case-specific workaround code. I had played with the lookbehind syntax but couldn't get it to work, so I thought I was on the wrong track!

John
Re: Pattern match for split() - need match but not match syntax by Narveson (Chaplain) on May 06, 2008 at 14:59 UTC
As long as you have to think about regexes anyway, you can use a regex that parses your text at the same time that it's splitting it.

my $LINE_PATTERN = qr{
    ([^:]+)     # capture everything before ...
    :\s*        # the colon and any newline or other whitespace,
    ([^\n]+)    # then capture everything before
    \n          # the next newline
}msx;

my $app_text = "one:partridge\ntwo:\nturtle doves\nthree:french hens\n";

while ($app_text =~ /$LINE_PATTERN/g) {
    print "$1: $2\n";
}

If you were planning to put the fields in a hash, you can do it all at once:

my %value_of = $app_text =~ /$LINE_PATTERN/g;

while (my ($field, $value) = each %value_of) {
    print "$field: $value\n";
}
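One possible variation on the parsing regex, sketched under the assumption that the real text also contains lines with no colon at all (the question mentions ignoring everything that is not Field:Value). The name `$FIELD_PATTERN` and the `not a field line` record are hypothetical. It also assumes that a field with nothing after the colon always continues on the next line, as in the sample.

```perl
use strict;
use warnings;

# Sample input. The "not a field line" record is made up for illustration.
my $app_text = "one:partridge\nnot a field line\ntwo:\nturtle doves\nthree:french hens\n";

# ^ with /m anchors each match to the start of a line, and [^:\n]+ keeps
# the field name from running across a line break.
my $FIELD_PATTERN = qr{
    ^ ([^:\n]+)     # field name, from the start of a line up to the colon
    : \n?           # the colon, plus the stray newline if there is one
    ([^\n]*)        # value, the rest of that line
}mx;

# In list context a /g match returns every captured pair, which fills the hash.
my %value_of = $app_text =~ /$FIELD_PATTERN/g;

for my $field (sort keys %value_of) {
    print "$field: $value_of{$field}\n";
}
```

Anchoring at the start of each line means a line without a colon simply fails to match, rather than being absorbed into the next field name.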
Re^2: Pattern match for split() - need match but not match syntax by johngg (Canon) on May 06, 2008 at 18:15 UTC
While there's nothing wrong with your `$LINE_PATTERN` regex I think it would be simpler to keep the record and field/value processing separate. To my eye it looks tidier and easier to maintain but others may disagree.

use strict;
use warnings;

use Data::Dumper;

my $app_text = qq{one:partridge\ntwo:\nturtle doves\nthree:french hens\n};

my %fvPairs =
    map { split m{:\n?} }
    map { split m{(?<!:)\n} }
    $app_text;

print Data::Dumper->Dumpxs( [ \ %fvPairs], [ q{*fvPairs} ] );

produces ...

%fvPairs = (
             'three' => 'french hens',
             'one' => 'partridge',
             'two' => 'turtle doves'
           );

Cheers,

JohnGG
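If a value can itself contain a colon, the field/value split above would break it into extra pieces and misalign the hash. A hedged sketch of one way around that, using the LIMIT argument of split; the `when:12:30` record is an assumption added for illustration, and every record is assumed to contain at least one colon.

```perl
use strict;
use warnings;

# Sample input. The "when:12:30" record is hypothetical, added to show
# a value that contains a colon.
my $app_text = "one:partridge\ntwo:\nturtle doves\nwhen:12:30\n";

my %fvPairs =
    map { split m{:\n?}, $_, 2 }     # at most two pieces: field, then value
    split m{(?<!:)\n}, $app_text;    # records, keeping "two:\nturtle doves" whole

for my $field (sort keys %fvPairs) {
    print "$field => $fvPairs{$field}\n";
}
```

With a limit of 2, only the first colon in each record acts as a separator, so `12:30` survives as a single value.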
Re: Pattern match for split() - need match but not match syntax by GrandFather (Saint) on May 06, 2008 at 22:23 UTC
Would you perhaps be better off using Text::xSV or Text::CSV to do the parsing for you?

Perl is environmentally friendly - it saves trees

