
Trouble splitting pipe delimited

by kpiti (Novice)
on Nov 05, 2012 at 00:34 UTC (#1002256=perlquestion)
kpiti has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to split a |-delimited line, but fields may contain escaped \| sequences that must not be treated as delimiters. I'm playing with Text::Balanced but I'm not getting the right results. Say I have a line like

1|str|foo\|bar|goo|2323

I need it split into five fields, with foo\|bar kept together as a single field.

The code:

use Data::Dumper;
use Text::Balanced qw/extract_multiple extract_delimited/;
$x = q[1|str|foo\|bar|goo|2323];
@a = extract_multiple($x, [sub { extract_delimited($_[0], qq{\|}) }], undef, 0);
print Dumper(@a);
will return

$VAR1 = '1';
$VAR2 = '|str|';
$VAR3 = 'foo\\';
$VAR4 = '|bar|';
$VAR5 = 'goo|2323';

which is clearly wrong. And a plain split will split on \| as well as on a bare |.
Any hints on the right (or smart, or both :) way to do this?
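If pulling in a module is not an option, a hand-rolled scanner is one alternative. The following is a minimal sketch, not taken from any of the replies: it assumes a backslash escapes whatever character follows it (so \| is a literal pipe and \\ a literal backslash) and that the escaping backslash should be dropped from the field. split_escaped is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split $line on unescaped '|' characters.
# Assumption: a backslash escapes the next character, and the
# escaping backslash itself is removed from the resulting field.
sub split_escaped {
    my ($line) = @_;
    my @fields  = ('');
    my $escaped = 0;
    for my $ch (split //, $line) {
        if ($escaped) {
            $fields[-1] .= $ch;    # keep the escaped character literally
            $escaped = 0;
        }
        elsif ($ch eq '\\') {
            $escaped = 1;          # next character is taken literally
        }
        elsif ($ch eq '|') {
            push @fields, '';      # unescaped pipe starts a new field
        }
        else {
            $fields[-1] .= $ch;
        }
    }
    $fields[-1] .= '\\' if $escaped;    # keep a dangling trailing backslash
    return @fields;
}

my @f = split_escaped(q[1|str|foo\|bar|goo|2323]);
print join(',', @f), "\n";    # prints 1,str,foo|bar,goo,2323
```

Walking the string one character at a time handles an escaped backslash in front of a delimiter (\\ followed by |) without any lookbehind tricks, at the cost of likely being slower than a regex on long lines.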

Re: Trouble splitting pipe delimited
by Tanktalus (Canon) on Nov 05, 2012 at 04:40 UTC

    I'll second the anonymous monk's suggestion to use Text::CSV (or, better, Text::CSV_XS):

    use Text::CSV_XS;
    use feature 'say';
    use warnings;

    my @tests = (
        [q[1|str|foo\\|bar|goo|2323] => [qw(1 str foo|bar goo 2323)]],
        [q[foo\\\\|bar]              => [qw(foo\\ bar)]],
    );

    for my $t (@tests) {
        my $s = $t->[0];
        my @e = @{$t->[1]};
        my $csv = Text::CSV_XS->new( {
            sep_char    => '|',
            escape_char => '\\',
        } ) or die "" . Text::CSV_XS->error_diag();
        $csv->parse($s);
        my @o = $csv->fields();
        say "$s\n @e\n @o\n";
    }
    The regex/split suggestions don't account for a doubled (escaped) backslash immediately before a delimiter (see the second test). For your example that may be fine, but your example is so obviously contrived that it can't be taken as definitive.


    $ perl5.16.2
    1|str|foo\|bar|goo|2323
     1 str foo|bar goo 2323
     1 str foo|bar goo 2323

    foo\\|bar
     foo\ bar
     foo\ bar
    Hope that helps.
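To make the doubled-escape case concrete, here is a sketch comparing the lookbehind split used in the other replies with a regex tokenizer that consumes each backslash pair as a unit. The tokenizer pattern and the final unescape step are assumptions about the intended format (a backslash escapes the next character); this is not how Text::CSV_XS works internally.

```perl
use strict;
use warnings;

# The string foo\\|bar: an escaped backslash followed by a real delimiter.
my $s = q[foo\\\\|bar];

# Negative-lookbehind split: the '|' is preceded by a backslash,
# so it is NOT treated as a separator and we get one field.
my @naive = split /(?<!\\)\|/, $s;
print scalar(@naive), " field(s): @naive\n";     # 1 field(s): foo\\|bar

# Tokenizer: a field is a run of non-special characters or
# backslash-escaped pairs, so \\ is consumed as a unit and the
# following '|' is correctly seen as a delimiter.
# A lone trailing backslash is matched by \\.? and kept literally.
my @fields;
while ($s =~ /\G((?:[^\\|]|\\.?)*)(\||\z)/gs) {
    push @fields, $1;
    last if $2 eq '';    # matched end of string: no more fields
}
s/\\(.)/$1/gs for @fields;    # drop the escaping backslashes

print scalar(@fields), " field(s): @fields\n";   # 2 field(s): foo\ bar
```

The tokenizer gives the same result as the Text::CSV_XS run for both test strings, but only under the escaping rule assumed here; for anything involving quoting, the module remains the safer choice.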

      Thanks for describing some of the "complex cases" that I dismissed so casually.
Re: Trouble splitting pipe delimited
by hbm (Hermit) on Nov 05, 2012 at 00:42 UTC
    perl -E "say for split/(?<!\\)\|/,'1|str|foo\|bar|goo|2323'"
    1
    str
    foo\|bar
    goo
    2323
Re: Trouble splitting pipe delimited
by Anonymous Monk on Nov 05, 2012 at 01:09 UTC
Re: Trouble splitting pipe delimited
by BillKSmith (Vicar) on Nov 05, 2012 at 02:41 UTC

    The module may be needed for more complex cases, but split can handle this one. Note the use of a negative lookbehind in the regular expression.

    $_ = '1|str|foo\|bar|goo|2323';
    $, = ',';
    print split /(?<!\\)\|/;
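The lookbehind split leaves the backslash in the third field (foo\|bar). If foo|bar is wanted instead, one option is to strip the escape afterwards, as in this sketch. It assumes \| is the only escape sequence in the data, inherits the doubled-escape limitation Tanktalus points out, and uses s///r, which needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my $line = '1|str|foo\|bar|goo|2323';

# Split on a '|' that is not preceded by a backslash (negative lookbehind),
# then turn any remaining '\|' inside a field into a plain '|'.
# Assumption: '\|' is the only escape sequence in the data.
my @fields = map { s/\\\|/|/gr } split /(?<!\\)\|/, $line;

print join(',', @fields), "\n";    # prints 1,str,foo|bar,goo,2323
```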

Re: Trouble splitting pipe delimited
by kpiti (Novice) on Nov 05, 2012 at 08:14 UTC
    Thanks a lot, all; this definitely solves my problem. I think I'll go with Text::CSV as the more generic option, but the zen regexp is also something I couldn't have figured out myself. I'll contemplate that one some more as well :)

Approved by Athanasius