
Trouble splitting pipe delimited

by kpiti (Novice)
on Nov 05, 2012 at 00:34 UTC ( #1002256=perlquestion )
kpiti has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to split a |-delimited line, but without splitting on escaped \| sequences. I'm playing with Text::Balanced but I'm not getting the right results. Say I have a line like
1|str|foo\|bar|goo|2323
I need it to be split into the following fields:
1,str,foo\|bar,goo,2323

The code:

use Data::Dumper;
use Text::Balanced qw/extract_multiple extract_delimited/;
$x=q[1|str|foo\|bar|goo|2323];
@a=extract_multiple($x, [sub {extract_delimited($_[0],qq{\|} )}], undef, 0);
print Dumper(@a);
will return
$VAR1 = '1';
$VAR2 = '|str|';
$VAR3 = 'foo\\';
$VAR4 = '|bar|';
$VAR5 = 'goo|2323';
which is way wrong. And a plain split will split on \| as well as on a plain |.
Any hints on the right (or smart, or both :) way to do this?

Re: Trouble splitting pipe delimited
by hbm (Hermit) on Nov 05, 2012 at 00:42 UTC
    perl -E "say for split/(?<!\\)\|/,'1|str|foo\|bar|goo|2323'"
    1
    str
    foo\|bar
    goo
    2323
Re: Trouble splitting pipe delimited
by Anonymous Monk on Nov 05, 2012 at 01:09 UTC
Re: Trouble splitting pipe delimited
by BillKSmith (Hermit) on Nov 05, 2012 at 02:41 UTC

    The module may be needed for more complex cases, but split can handle this one. Note use of negative look behind in the regular expression.

    $_ = '1|str|foo\|bar|goo|2323';
    $, = ',';
    print split /(?<!\\)\|/;

    Bill
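
To spell out what the lookbehind in the replies above is doing: (?<!\\) asserts that the character just before the | is not a backslash, so only unescaped pipes act as separators. Below is a minimal sketch of that idea. The helper name split_unescaped is made up for illustration, and the -1 limit is an addition (not in the replies) so that trailing empty fields are kept.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split on pipes that are not
# immediately preceded by a backslash.
sub split_unescaped {
    my ($line) = @_;
    # (?<!\\)  negative lookbehind: the previous character is not a backslash
    # \|       a literal pipe
    # A limit of -1 keeps trailing empty fields, which split drops by default.
    return split /(?<!\\)\|/, $line, -1;
}

my @fields = split_unescaped('1|str|foo\|bar|goo|2323');
print join(',', @fields), "\n";    # prints 1,str,foo\|bar,goo,2323
```

Note that the fields come back with the backslash still in place (foo\|bar), which is what the original question asked for.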
Re: Trouble splitting pipe delimited
by Tanktalus (Canon) on Nov 05, 2012 at 04:40 UTC

    I'll second the anonymous monk's suggestion to use Text::CSV (or, better, Text::CSV_XS):

    use Text::CSV_XS;
    use feature 'say';
    use warnings;

    my @tests = (
        [q[1|str|foo\\|bar|goo|2323] => [qw(1 str foo|bar goo 2323)]],
        [q[foo\\\\|bar] => [qw(foo\\ bar)]],
    );

    for my $t (@tests) {
        my $s = $t->[0];
        my @e = @{$t->[1]};
        my $csv = Text::CSV_XS->new(
            {
                sep_char    => '|',
                escape_char => '\\',
            }
        ) or die "" . Text::CSV_XS->error_diag();
        $csv->parse($s);
        my @o = $csv->fields();
        say "$s\n @e\n @o\n";
    }
    The regex/split suggestions fail to take into consideration a doubled escape prior to a delimiter (see the second test). For your example, that may be fine, but your example is so obviously contrived that it can't be taken definitively.

    Output:

    $ perl5.16.2 x.pl
    1|str|foo\|bar|goo|2323
     1 str foo|bar goo 2323
     1 str foo|bar goo 2323

    foo\\|bar
     foo\ bar
     foo\ bar

    Hope that helps.
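
To make the doubled-escape point concrete, here is a rough plain-Perl sketch, not a replacement for Text::CSV_XS. It first shows the lookbehind split treating foo\\|bar as a single field, then walks the line left to right so that a backslash always consumes the character after it. The helper name split_escaped is hypothetical, and the choice to drop a lone trailing backslash is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): tokenize the line so that
# "\\" is consumed as an escaped backslash before the next "|" is examined.
sub split_escaped {
    my ($line) = @_;
    my @fields = ('');
    while ($line =~ /\G(?: \\(.) | (\|) | ([^\\|]+) )/gxs) {
        if    (defined $1) { $fields[-1] .= $1 }    # escaped character, taken literally
        elsif (defined $2) { push @fields, '' }     # unescaped pipe starts a new field
        else               { $fields[-1] .= $3 }    # ordinary run of text
    }
    # A lone backslash at the very end matches nothing and is dropped.
    return @fields;
}

my $tricky = 'foo\\\\|bar';                  # the string foo\\|bar
my @naive  = split /(?<!\\)\|/, $tricky;     # lookbehind sees a backslash before the pipe
print scalar(@naive), "\n";                  # prints 1
print join(',', split_escaped($tricky)), "\n";                   # prints foo\,bar
print join(',', split_escaped('1|str|foo\|bar|goo|2323')), "\n"; # prints 1,str,foo|bar,goo,2323
```

Unlike the split-based answers, this removes the escape characters from the fields, which matches the Text::CSV_XS output shown above rather than the foo\|bar form in the original question.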

      Thanks for describing some of the "complex cases" that I dismissed so casually.
      Bill
Re: Trouble splitting pipe delimited
by kpiti (Novice) on Nov 05, 2012 at 08:14 UTC
    Thanks a lot to all, this definitely solves my problem. I think I'll go with Text::CSV as being more generic but the zen regexp is also something I couldn't figure out myself. Will contemplate on that one some more as well :)
