Zero-width assertions fail with split

by Ovid (Cardinal)
on Sep 19, 2005 at 04:53 UTC ( #493078=perlquestion )
Ovid has asked for the wisdom of the Perl Monks concerning the following question:

It's late and I'm tired, but I'm banging my head against a silly little problem. It's a long story as to why I need this, but I've stumbled on the following curious problem:

#!/usr/bin/perl
use strict;
use warnings;
use Test::More qw/no_plan/;

my $token = '-----';
my $data  = "1,2,${token}0${token},4,5";
my $split = qr/(?<!$token),(?!$token)/;

my @fields   = split $split, $data;
my @expected = ( 1, 2, "${token}0${token}", 4, 5 );
is_deeply \@fields, \@expected;

I need to be able to split on a value if and only if that value is not immediately preceded and followed by the same fixed-width string. The code above actually assigns the following to @fields:

@fields = ( '1', '2,-----0-----,4', '5' );

It's as if it's doing a logical OR with the two zero-width assertions instead of a logical AND. I suppose it's not unreasonable that it does this, but is there some simple way to enforce my desired behavior, so that the split produces this instead?

@fields = ( '1', '2', '-----0-----', '4', '5' );

Also, where is this behavior documented? I'm sure it is somewhere. I just can't find it.


New address of my CGI Course.

Replies are listed 'Best First'.
Re: Zero-width assertions fail with split
by merlyn (Sage) on Sep 19, 2005 at 05:00 UTC
    De Morgan is kicking your hiney.

    It's working as I would expect. You are asking to split on commas that have neither dashes before NOR dashes afterwards, as in "the delimiter is NOT dashes ahead AND a comma AND NOT dashes afterwards". That's the only time the split regex would match. Dashes on either side would prevent the comma from being considered. Your experimental results are borne out.

    And simply splitting on commas would give you the list you want, so you'll have to show an example where simply splitting on commas doesn't do it.
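
The De Morgan point above can be checked comma by comma. The following is only a sketch: it reuses `$token` and `$data` from the original post, while the `@trace` array and the `$preceded`/`$followed` flags are names made up for this illustration. It records, for each comma, whether the token sits immediately before or after it, and whether the original pattern would split there.

```perl
use strict;
use warnings;

my $token = '-----';
my $data  = "1,2,${token}0${token},4,5";

# One row per comma: [offset, preceded-by-token, followed-by-token, splits].
my @trace;
while ($data =~ /,/g) {
    my $pos    = $-[0];                     # offset of this comma
    my $before = substr($data, 0, $pos);    # everything left of the comma
    my $after  = substr($data, $pos + 1);   # everything right of the comma

    my $preceded = $before =~ /\Q$token\E\z/ ? 1 : 0;
    my $followed = $after  =~ /\A\Q$token\E/ ? 1 : 0;

    # (?<!TOKEN),(?!TOKEN) needs BOTH negative assertions to succeed,
    # so a token on either side is enough to veto the split.
    my $splits = (!$preceded && !$followed) ? 1 : 0;

    push @trace, [ $pos, $preceded, $followed, $splits ];
    printf "comma at %2d: preceded=%d followed=%d splits=%d\n",
        $pos, $preceded, $followed, $splits;
}
```

Only the commas at offsets 1 and 17 have no token on either side, which is why the original split yields three fields.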

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      That makes sense and, as noted, the results aren't entirely surprising, though I confess I struggled with it. Is there some simple way of getting the split behavior to do what I want?



        You want "not both preceded and followed by TOKEN", which is equivalent to "either not preceded by TOKEN, or not followed by TOKEN", and that we can do:

        my $split = qr/ (?<!$token) , | , (?!$token) /x;
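
A quick way to try the alternation above is to run it over both kinds of input discussed in this thread: a token pair around a `0`, and a token pair around a comma. This is a minimal sketch; wrapping the token in `\Q...\E` is an addition here (the token in this thread has no regex metacharacters, so it changes nothing for this data), and the `@zero`/`@comma` names are illustrative.

```perl
use strict;
use warnings;

my $token = '-----';

# Split on a comma that is not preceded by the token, OR not followed by it.
# Only a comma with the token on BOTH sides survives.
my $split = qr/ (?<!\Q$token\E) , | , (?!\Q$token\E) /x;

my @zero  = split $split, "1,2,${token}0${token},4,5";
my @comma = split $split, "1,2,${token},${token},4,5";

print join('|', @zero),  "\n";   # 1|2|-----0-----|4|5
print join('|', @comma), "\n";   # 1|2|-----,-----|4|5
```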



        It may be silly, but, doesn't this do what you want?

        perl -e '$a = "1,2,---0---,3,4";@f=split ",",$a;map { print $_."\n"} @f'
        if ( 1 ) { $postman->ring() for (1..2); }
Re: Zero-width assertions fail with split
by BrowserUk (Pope) on Sep 19, 2005 at 07:24 UTC

    How's this?

    #! perl -slw
    use strict;

    my $test = 'xxx,xyyy,zzz,zzz,ppp,qqq,rrr';

    my $re = qr[
        (?<!zzz),(?!zzz)      # a comma neither preceded nor followed by the token
        | (?<=zzz),(?!zzz)    # or preceded but not followed
        | (?<!zzz),(?=zzz)    # or not preceded but followed
    ]x;

    print for split $re, $test;

    __END__
    P:\test>junk
    xxx
    xyyy
    zzz,zzz
    ppp
    qqq
    rrr
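
The three-branch pattern above lists every case except "preceded and followed", so it should accept the same commas as the shorter two-branch alternation suggested elsewhere in this thread. The sketch below compares the two on a few inputs; the `@inputs` list and the `$three`/`$two`/`$mismatches` names are invented for the comparison and are not part of either reply.

```perl
use strict;
use warnings;

my $t = 'zzz';

# Three explicit cases: neither side, before only, after only.
my $three = qr[ (?<!$t),(?!$t) | (?<=$t),(?!$t) | (?<!$t),(?=$t) ]x;

# De Morgan form: not preceded, or not followed.
my $two = qr[ (?<!$t), | ,(?!$t) ]x;

my @inputs = (
    'xxx,xyyy,zzz,zzz,ppp,qqq,rrr',
    'zzz,zzz',
    'a,zzz,b',
    'zzz,zzz,zzz',
    ',zzz,zzz,',
);

my $mismatches = 0;
for my $in (@inputs) {
    # Limit -1 keeps trailing empty fields so the comparison is exact.
    my $x = join '|', split $three, $in, -1;
    my $y = join '|', split $two,   $in, -1;
    $mismatches++ if $x ne $y;
    print "$in => $x\n";
}
print "mismatches: $mismatches\n";
```

Agreement on a handful of strings is not a proof, but it matches what the logic predicts: both patterns refuse only the comma that has the token on both sides.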

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re: Zero-width assertions fail with split
by sk (Curate) on Sep 19, 2005 at 06:50 UTC
    I think your -----0----- led everyone to suggest just splitting on a comma, but I think you had -----,----- in mind.

    Here is my take on this. It does not do it in one split, and I'm not sure whether that is even possible!

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $token = '-----';
    my $data  = "1,2,${token}0${token},4,5";
    my $split = qr/(?<=$token),(?=$token)/;

    my @fields    = split $split, $data;
    my @newfields = ();
    my $i = 0;
    while ($i < @fields/2) {
        my @f1   = split /,/, $fields[$i];
        my @f2   = split /,/, $fields[$i+1];
        my @dash = (pop(@f1), shift(@f2));
        @newfields = (@newfields, @f1, join(',', @dash), @f2);
        $i += 2;
    }
    my @expected = ( 1, 2, "${token}0${token}", 4, 5 );
    print Dumper (\@newfields);
    print Dumper (\@expected);

    __END__
    $VAR1 = [ '1', '2', '-----0-----', '4', '5,' ];
    $VAR1 = [ 1, 2, '-----0-----', 4, 5 ];

    If you pass in my $data = "1,2,${token},${token},4,5"; you get:


    $VAR1 = [ '1', '2', '-----,-----', '4', '5' ];
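
The two-pass idea above can also be written so that it copes with any number of protected commas, including none (the case that produced the stray `'5,'`). This is a hedged variant, not the code from the post: `two_pass_split` is a hypothetical helper name, and the `\Q...\E` quoting and the `-1` limit are additions.

```perl
use strict;
use warnings;

# Pass 1: cut only at commas that have the token on both sides.
# Pass 2: split each chunk on plain commas, then glue the last field of
#         one chunk to the first field of the next with the protected comma.
sub two_pass_split {
    my ($data, $token) = @_;
    return () unless length $data;

    my @chunks = split /(?<=\Q$token\E),(?=\Q$token\E)/, $data, -1;

    my @out = split /,/, shift(@chunks), -1;
    for my $chunk (@chunks) {
        my @f = split /,/, $chunk, -1;
        $out[-1] .= ',' . shift @f;    # restore the protected comma
        push @out, @f;
    }
    return @out;
}

my $token = '-----';
my @with_zero  = two_pass_split("1,2,${token}0${token},4,5", $token);
my @with_comma = two_pass_split("1,2,${token},${token},4,5", $token);

print join('|', @with_zero),  "\n";   # 1|2|-----0-----|4|5
print join('|', @with_comma), "\n";   # 1|2|-----,-----|4|5
```

Every chunk after the first starts with the token, so `shift @f` always has a field to return when the token is non-empty.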

Re: Zero-width assertions fail with split
by demerphq (Chancellor) on Sep 19, 2005 at 06:30 UTC

    This behaviour is as I expect. Split on any comma not preceded by '-----' and not followed by '-----'.

    To get the desired output simply split on a comma.

    I think the problem here is in your expectation and not in the code itself.

