PerlMonks  

Trouble splitting pipe delimited

by kpiti (Novice)
on Nov 05, 2012 at 00:34 UTC (#1002256)
kpiti has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to split a |-delimited line, but it may contain escaped \| sequences that must not be treated as delimiters. I'm playing with Text::Balanced but I'm not getting the right results. Say I have a line like
1|str|foo\|bar|goo|2323
I need it to be split into the following fields:
1,str,foo\|bar,goo,2323

The code:

use Data::Dumper;
use Text::Balanced qw/extract_multiple extract_delimited/;
$x = q[1|str|foo\|bar|goo|2323];
@a = extract_multiple($x, [sub { extract_delimited($_[0], qq{\|}) }], undef, 0);
print Dumper(@a);
will return
$VAR1 = '1';
$VAR2 = '|str|';
$VAR3 = 'foo\\';
$VAR4 = '|bar|';
$VAR5 = 'goo|2323';
which is way wrong. And plain split will split on \| as well as plain |.
Any hints on the right (or smart, or both :) way to do this?

Replies are listed 'Best First'.
Re: Trouble splitting pipe delimited
by Tanktalus (Canon) on Nov 05, 2012 at 04:40 UTC

    I'll second the anonymous monk's suggestion to use Text::CSV (or, better, Text::CSV_XS):

    use Text::CSV_XS;
    use feature 'say';
    use warnings;

    my @tests = (
        [q[1|str|foo\\|bar|goo|2323] => [qw(1 str foo|bar goo 2323)]],
        [q[foo\\\\|bar] => [qw(foo\\ bar)]],
    );

    for my $t (@tests) {
        my $s = $t->[0];
        my @e = @{$t->[1]};
        my $csv = Text::CSV_XS->new(
            {
                sep_char    => '|',
                escape_char => '\\',
            }
        ) or die "" . Text::CSV_XS->error_diag();
        $csv->parse($s);
        my @o = $csv->fields();
        say "$s\n @e\n @o\n";
    }

    The regex/split suggestions fail to take into consideration a doubled escape prior to a delimiter (see the second test). For your example, that may be fine, but your example is so obviously contrived that it can't be taken as definitive.

    Output:

    $ perl5.16.2 x.pl
    1|str|foo\|bar|goo|2323
     1 str foo|bar goo 2323
     1 str foo|bar goo 2323

    foo\\|bar
     foo\ bar
     foo\ bar

    Hope that helps.
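
    For readers who want to see the doubled-escape case without a module, here is a minimal sketch, not taken from the thread. The helper name split_escaped is hypothetical, and it assumes a backslash escapes exactly the one character that follows it. It walks the line token by token instead of relying on a lookbehind, and it removes the escape backslashes from the result, as the Text::CSV_XS output above does.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split on unescaped '|',
# treating a backslash as an escape for the single character after it.
sub split_escaped {
    my ($line) = @_;
    my @fields = ('');
    # Consume one token per iteration: an escaped character, a bare
    # delimiter, or a run of ordinary characters.
    while ($line =~ /\G(?:\\(.)|(\|)|([^\\|]+))/gs) {
        if    (defined $1) { $fields[-1] .= $1 }   # escaped char, kept without its backslash
        elsif (defined $2) { push @fields, '' }    # unescaped delimiter starts a new field
        else               { $fields[-1] .= $3 }   # ordinary text
    }
    # Note: a lone backslash at the very end of the line is silently dropped.
    return @fields;
}

print join(',', split_escaped('1|str|foo\\|bar|goo|2323')), "\n";
print join(',', split_escaped('foo\\\\|bar')), "\n";

# For comparison: the lookbehind split sees a backslash before the '|'
# and does not split the second test line at all.
my @naive = split /(?<!\\)\|/, 'foo\\\\|bar';
print scalar(@naive), "\n";
```

    This is only meant to illustrate the edge case; for real data with quoting or embedded newlines, the module remains the safer choice.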

      Thanks for describing some of the "complex cases" that I dismissed so casually.
      Bill
Re: Trouble splitting pipe delimited
by hbm (Hermit) on Nov 05, 2012 at 00:42 UTC
    perl -E "say for split/(?<!\\)\|/,'1|str|foo\|bar|goo|2323'"
    1
    str
    foo\|bar
    goo
    2323
Re: Trouble splitting pipe delimited
by Anonymous Monk on Nov 05, 2012 at 01:09 UTC
Re: Trouble splitting pipe delimited
by BillKSmith (Deacon) on Nov 05, 2012 at 02:41 UTC

    The module may be needed for more complex cases, but split can handle this one. Note the use of a negative lookbehind in the regular expression.

    $_ = '1|str|foo\|bar|goo|2323';
    $, = ',';
    print split /(?<!\\)\|/;

    Bill
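
    A slightly expanded sketch of the lookbehind approach, under strict and warnings. The variable names and the optional unescaping step are additions for illustration, not part of the reply above; whether the backslash should be stripped afterwards is an assumption about what the caller wants.

```perl
use strict;
use warnings;

my $line = '1|str|foo\|bar|goo|2323';

# (?<!\\) is a zero-width negative lookbehind: a '|' only counts as a
# delimiter when the character immediately before it is not a backslash.
my @fields = split /(?<!\\)\|/, $line;
print join(',', @fields), "\n";    # fields keep the escape, e.g. foo\|bar

# Optional extra step (an assumption, not in the original reply):
# turn each escaped '\|' back into a plain '|' on a copy of the fields.
my @plain = @fields;
s/\\\|/|/g for @plain;
print join(',', @plain), "\n";
```

    Keep in mind that split drops trailing empty fields unless it is given a negative limit, and that this pattern does not handle an escaped backslash directly before a delimiter.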
Re: Trouble splitting pipe delimited
by kpiti (Novice) on Nov 05, 2012 at 08:14 UTC
    Thanks a lot to all, this definitely solves my problem. I think I'll go with Text::CSV as being more generic but the zen regexp is also something I couldn't figure out myself. Will contemplate on that one some more as well :)
