http://www.perlmonks.org?node_id=1002256

kpiti has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to split a |-delimited line, but the fields may contain escaped \| sequences that must not be treated as delimiters. I'm playing with Text::Balanced but I'm not getting the right results. Say I have a line like
1|str|foo\|bar|goo|2323
I need it to be split into the following fields:
1,str,foo\|bar,goo,2323

The code:

use Data::Dumper;
use Text::Balanced qw/extract_multiple extract_delimited/;

$x = q[1|str|foo\|bar|goo|2323];
@a = extract_multiple($x, [sub { extract_delimited($_[0], qq{\|}) }], undef, 0);
print Dumper(@a);
will return
$VAR1 = '1';
$VAR2 = '|str|';
$VAR3 = 'foo\\';
$VAR4 = '|bar|';
$VAR5 = 'goo|2323';
which is way wrong. And a plain split will split on \| as well as on plain |.
Any hints on the right (or smart, or both :) way to do this?

Replies are listed 'Best First'.
Re: Trouble splitting pipe delimited
by Tanktalus (Canon) on Nov 05, 2012 at 04:40 UTC

    I'll second the anonymous monk's suggestion to use Text::CSV (or, better, Text::CSV_XS):

    use Text::CSV_XS;
    use feature 'say';
    use warnings;

    my @tests = (
        [q[1|str|foo\\|bar|goo|2323] => [qw(1 str foo|bar goo 2323)]],
        [q[foo\\\\|bar] => [qw(foo\\ bar)]],
    );

    for my $t (@tests) {
        my $s = $t->[0];
        my @e = @{$t->[1]};
        my $csv = Text::CSV_XS->new(
            {
                sep_char    => '|',
                escape_char => '\\',
            }
        ) or die "" . Text::CSV_XS->error_diag();
        $csv->parse($s);
        my @o = $csv->fields();
        say "$s\n @e\n @o\n";
    }
    The regex/split suggestions fail to take into consideration a doubled escape prior to a delimiter (see the second test). For your example, that may be fine, but your example is so obviously contrived that it can't be taken as definitive.

    Output:

    $ perl5.16.2 x.pl
    1|str|foo\|bar|goo|2323
     1 str foo|bar goo 2323
     1 str foo|bar goo 2323

    foo\\|bar
     foo\ bar
     foo\ bar

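    To make the doubled-escape point concrete without any CPAN dependency, here is a minimal core-Perl sketch. The helper name split_escaped is hypothetical (it is not part of any module mentioned in this thread), and it assumes that a backslash protects exactly one following character and that the escaping backslashes should be dropped from the result, as Text::CSV_XS does in the test above. It is an illustration, not a replacement for the module.

```perl
use strict;
use warnings;

# Hypothetical helper: split on unescaped '|', treating '\' as an escape
# character that protects the single character following it.
sub split_escaped {
    my ($line) = @_;
    my @fields;
    # A field is a run of ordinary characters and backslash-escaped pairs,
    # terminated by a delimiter or by the end of the string.
    while ($line =~ /\G((?:[^\\|]|\\.)*)(\||\z)/gs) {
        my ($field, $end) = ($1, $2);
        $field =~ s/\\(.)/$1/gs;    # drop the escaping backslashes
        push @fields, $field;
        last if $end eq '';         # matched end of string, so stop
    }
    return @fields;
}

my $tricky = q[foo\\\\|bar];        # the actual string is foo\\|bar

# The look-behind split sees a backslash before '|' and does not split.
my @naive = split /(?<!\\)\|/, $tricky;
print scalar(@naive), " field(s) from the look-behind split\n";

# The tokenizer pairs the two backslashes first, so '|' is a delimiter.
print join(',', split_escaped($tricky)), "\n";
print join(',', split_escaped('1|str|foo\|bar|goo|2323')), "\n";
```

    The design choice is to consume escape pairs left to right instead of peeking one character behind the delimiter, which is what lets an even number of backslashes before a | still count as a field boundary.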
    Hope that helps.

      Thanks for describing some of the "complex cases" that I dismissed so casually.
      Bill
Re: Trouble splitting pipe delimited
by hbm (Hermit) on Nov 05, 2012 at 00:42 UTC
    perl -E "say for split/(?<!\\)\|/,'1|str|foo\|bar|goo|2323'"
    1
    str
    foo\|bar
    goo
    2323
Re: Trouble splitting pipe delimited
by Anonymous Monk on Nov 05, 2012 at 01:09 UTC
Re: Trouble splitting pipe delimited
by BillKSmith (Monsignor) on Nov 05, 2012 at 02:41 UTC

    The module may be needed for more complex cases, but split can handle this one. Note the use of a negative look-behind in the regular expression.

    $_ = '1|str|foo\|bar|goo|2323';
    $, = ',';
    print split /(?<!\\)\|/;
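    For reuse under strict, the same look-behind split could be wrapped roughly as follows. The name split_pipes is hypothetical, and the limit of -1 is an addition that is not in the one-liner above: without it, split silently drops trailing empty fields, which may or may not matter for your data.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the look-behind split.
sub split_pipes {
    my ($line) = @_;
    # (?<!\\) asserts that the character just before '|' is not a backslash.
    # A limit of -1 keeps trailing empty fields instead of discarding them.
    return split /(?<!\\)\|/, $line, -1;
}

# Escaped pipes stay inside their field, backslash included.
my @fields = split_pipes('1|str|foo\|bar|goo|2323');
print join(',', @fields), "\n";

# Empty fields, including trailing ones, are preserved.
my @sparse = split_pipes('a||b||');
print scalar(@sparse), " fields\n";
```

    Unlike a parser that understands escapes, this leaves the backslash in the field (foo\|bar), which is what the original question asked for.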

    Bill
Re: Trouble splitting pipe delimited
by kpiti (Novice) on Nov 05, 2012 at 08:14 UTC
    Thanks a lot to all, this definitely solves my problem. I think I'll go with Text::CSV as being more generic but the zen regexp is also something I couldn't figure out myself. Will contemplate on that one some more as well :)