Trouble splitting pipe delimited

by kpiti (Novice)
on Nov 05, 2012 at 00:34 UTC (node #1002256)
kpiti has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to split a |-delimited line, but fields may contain escaped \| sequences. I'm playing with Text::Balanced but I'm not getting the right results. Say I have a line like

1|str|foo\|bar|goo|2323
I need it to be split into the following fields:

The code:

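use strict; use warnings;
use Data::Dumper;
use Text::Balanced qw/extract_multiple extract_delimited/;

my $x = q[1|str|foo\|bar|goo|2323];
# extract_delimited treats | as a quote character and pulls out |...| spans
my @a = extract_multiple($x, [sub { extract_delimited($_[0], qq{\|}) }], undef, 0);
print Dumper(@a);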
will return

$VAR1 = '1';
$VAR2 = '|str|';
$VAR3 = 'foo\\';
$VAR4 = '|bar|';
$VAR5 = 'goo|2323';
which is way wrong. And a plain split splits on \| as well as on a bare |.
Any hints on the right (or smart, or both :) way to do this?
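For reference, here is a minimal core-Perl sketch of the "scan it yourself" approach, with no modules. It assumes a backslash escapes whatever single character follows it (so \| is a literal pipe and \\ a literal backslash) and that the escape itself should be dropped from the output. The helper name split_escaped is hypothetical, not from any CPAN module.

```perl
use strict;
use warnings;

# Hypothetical helper: split $line on $sep, where $esc escapes the next
# character. "\|" yields a literal "|", "\\" yields a literal "\".
# Escape characters themselves are removed from the returned fields.
sub split_escaped {
    my ($line, $sep, $esc) = @_;
    my @fields  = ('');     # start with one empty field
    my $escaped = 0;        # true right after an escape character
    for my $ch (split //, $line) {
        if ($escaped) {
            $fields[-1] .= $ch;    # take the escaped char literally
            $escaped = 0;
        }
        elsif ($ch eq $esc) {
            $escaped = 1;          # drop the escape, remember it
        }
        elsif ($ch eq $sep) {
            push @fields, '';      # unescaped separator: start a new field
        }
        else {
            $fields[-1] .= $ch;
        }
    }
    $fields[-1] .= $esc if $escaped;   # lone trailing escape kept as-is
    return @fields;
}

my @f = split_escaped('1|str|foo\|bar|goo|2323', '|', '\\');
print join(',', @f), "\n";    # 1,str,foo|bar,goo,2323
```

Walking the line one character at a time is likely slower than a single regex on big files, but it makes the escape rule explicit and also copes with a doubled backslash right before a delimiter.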

Re: Trouble splitting pipe delimited
by Tanktalus (Canon) on Nov 05, 2012 at 04:40 UTC

    I'll second the anonymous monk's suggestion to use Text::CSV (or, better, Text::CSV_XS):

    use Text::CSV_XS;
    use feature 'say';
    use warnings;

    my @tests = (
        [ q[1|str|foo\\|bar|goo|2323] => [qw(1 str foo|bar goo 2323)] ],
        [ q[foo\\\\|bar]              => [qw(foo\\ bar)] ],
    );

    for my $t (@tests) {
        my $s = $t->[0];
        my @e = @{ $t->[1] };
        my $csv = Text::CSV_XS->new( {
            sep_char    => '|',
            escape_char => '\\',
        } ) or die "" . Text::CSV_XS->error_diag();
        $csv->parse($s);
        my @o = $csv->fields();
        say "$s\n @e\n @o\n";
    }
    The regex/split suggestions fail to account for a doubled (escaped) backslash right before a delimiter (see the second test). For your example that may be fine, but your example is so obviously contrived that it can't be taken as definitive.
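    To see the doubled-escape failure concretely, and one regex-only way around it, here is a sketch. The pattern treats a backslash plus the character after it as a single unit, so an escaped backslash can no longer "escape" the pipe that follows it. It assumes the same backslash-escape rule as the Text::CSV_XS setup above; a lone backslash at the very end of the line makes the pattern stop matching, which this sketch does not handle.

```perl
use strict;
use warnings;

# The text foo\\|bar: an escaped backslash, then a real separator.
my $line = 'foo\\\\|bar';

# Naive split: the lookbehind sees a backslash before "|" and refuses to
# split, so the whole line comes back as a single field.
my @naive = split /(?<!\\)\|/, $line;
print scalar(@naive), "\n";    # 1

# Each field is a run of (any char except "\" and "|") or ("\" plus any char),
# ended by an unescaped "|" or the end of the string.
my @fields;
while ($line =~ /\G((?:[^\\|]|\\.)*)(\||\z)/gs) {
    push @fields, $1;
    last if $2 eq '';          # matched \z: no more fields
}

# Drop the escapes, keeping the escaped character.
s/\\(.)/$1/gs for @fields;

print join(' / ', @fields), "\n";   # foo\ / bar
```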


    $ perl5.16.2
    1|str|foo\|bar|goo|2323
     1 str foo|bar goo 2323
     1 str foo|bar goo 2323

    foo\\|bar
     foo\ bar
     foo\ bar
    Hope that helps.

      Thanks for describing some of the "complex cases" that I dismissed so casually.
Re: Trouble splitting pipe delimited
by hbm (Hermit) on Nov 05, 2012 at 00:42 UTC
    perl -E "say for split/(?<!\\)\|/,'1|str|foo\|bar|goo|2323'"
    1
    str
    foo\|bar
    goo
    2323
Re: Trouble splitting pipe delimited
by Anonymous Monk on Nov 05, 2012 at 01:09 UTC
Re: Trouble splitting pipe delimited
by BillKSmith (Priest) on Nov 05, 2012 at 02:41 UTC

    The module may be needed for more complex cases, but split can handle this one. Note the use of a negative lookbehind in the regular expression.

    $_ = '1|str|foo\|bar|goo|2323';
    $, = ',';                   # output field separator used by print
    print split /(?<!\\)\|/;    # split $_ on every "|" not preceded by "\"
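    One detail worth noting about the lookbehind: it is zero-width, so the backslash stays in the field (you get foo\|bar, not foo|bar). If you want the bare pipe, a small post-processing step works. This is a sketch that keeps the single-backslash assumption, so it shares the doubled-escape limitation discussed in Tanktalus's reply.

```perl
use strict;
use warnings;

my $line = '1|str|foo\|bar|goo|2323';

# Split on "|" not preceded by "\", then turn each remaining "\|" into "|".
my @fields = map { my $f = $_; $f =~ s/\\\|/|/g; $f }
             split /(?<!\\)\|/, $line;

print join(',', @fields), "\n";   # 1,str,foo|bar,goo,2323
```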

Re: Trouble splitting pipe delimited
by kpiti (Novice) on Nov 05, 2012 at 08:14 UTC
    Thanks a lot to all, this definitely solves my problem. I think I'll go with Text::CSV since it's more generic, but the zen regexp is also something I couldn't have figured out myself. I'll contemplate that one some more as well :)

Approved by Athanasius