
Re-reading history from 2001 / using a capture during split

by talexb (Canon)
on Mar 05, 2019 at 16:22 UTC ( #1230912=perlquestion )

talexb has asked for the wisdom of the Perl Monks concerning the following question:

I've been part of this community for a while, and it's always fun to check out Selected Best Nodes for pearls (heh) of wisdom. Today I found Being pretentious, and getting away with it., which talked about the behaviour of split with a particular regex.

I started writing a module to beautify code recently, and ended up using a capture in a split statement -- which isn't something I do regularly. Here's the interesting part about this subject (finally) .. the example in the node talked about the following behaviour:

    In Perl 5.005_02, split /(A)|B/, "1A2B3" returned a five-element list of (1, 'A', 2, undef, 3). In 5.005_03, it returned (1, 'A', 2, '', 3); a subtle, but meaningful, difference. There's only one way to get an undef from split(), and that's from the underlying regex match. A capturing paren that does not match has undef as its $DIGIT value.
    A tiny bit of incorrectly written code in pp.c:pp_split() caused these undefs to become empty strings. Bah. I corrected it, documented it, and tested it.
Cool, I thought, and ran off to test it in the debugger.
DB<2> @foo = split /(A)|B/, "1A2B3"
DB<3> x @foo
0  1
1  'A'
2  2
3  undef
4  3
DB<4>

The behaviour has gone back to the previous one. I believe it's because split is splitting on the un-captured 'B', and since it's not captured, undef is the correct result, rather than a null string (with apologies to japhy).
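For anyone who wants to repeat the check outside the debugger, here is a minimal sketch as a standalone script. The variable names `@fields` and `@shown` are only illustrative; the pattern and input are the ones from the session above.

```perl
use strict;
use warnings;

# Same pattern and input as the debugger session above.
my @fields = split /(A)|B/, "1A2B3";

# Render undef visibly so it cannot be confused with an empty string.
my @shown = map { defined $_ ? "'$_'" : 'undef' } @fields;

print join(', ', @shown), "\n";
```

On a Perl that behaves like the one in the debugger session, this should print `'1', 'A', '2', undef, '3'`.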


Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

EDIT: Updated the title to add '/ using a capture during split', as per feedback.

Replies are listed 'Best First'.
Re: Re-reading history from 2001
by vr (Deacon) on Mar 05, 2019 at 16:56 UTC
    split is splitting on the un-captured 'B', and since it's not captured...

    No, it's the (A) that's not captured. Every group produces a field, as the last paragraph of the split documentation explains.

    If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences); if any group does not match, then it captures the undef value instead of a substring
    DB<3> @foo = split /(A)|(B)/, "1A2B3"
    DB<4> x @foo
    0  1
    1  'A'
    2  undef
    3  2
    4  undef
    5  'B'
    6  3

    I read the linked node as japhy apologizing for what he did (i.e. making split return an empty string) and acknowledging that returning undef was correct from the very beginning.
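The "every group produces a field" rule can also be checked by counting fields. This is only a sketch using the same input as above; `@one_group` and `@two_groups` are illustrative names.

```perl
use strict;
use warnings;

my $input = "1A2B3";

# Three pieces of text, two separators.
my @one_group  = split /(A)|B/,   $input;   # one capturing group
my @two_groups = split /(A)|(B)/, $input;   # two capturing groups

# Each separator adds one field per capturing group, matched or not.
printf "one group:  %d fields\n", scalar @one_group;    # 3 + 2 * 1
printf "two groups: %d fields\n", scalar @two_groups;   # 3 + 2 * 2

# List the positions that hold undef, i.e. groups that did not take part in the match.
for my $name_and_list ( [ one => \@one_group ], [ two => \@two_groups ] ) {
    my ( $name, $list ) = @$name_and_list;
    my @undef_at = grep { !defined $list->[$_] } 0 .. $#$list;
    print "$name: undef at index @undef_at\n";
}
```

With one group there is a single undef (for the separator 'B'); with two groups each separator leaves one group unmatched, so there are two.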

      Yes, and to get rid of the undefs, one can use the branch reset pattern:
      my @foo = split /(?|(A)|(B))/, "1A2B3";
      use Data::Dumper;
      print Dumper \@foo;

      $VAR1 = [ '1', 'A', '2', 'B', '3' ];
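As a rough comparison, and not something from the posts above: the branch reset construct `(?|...)` needs Perl 5.10 or later, and on an older Perl the undefs could instead be dropped after the fact with `grep`. The names `@via_reset` and `@via_grep` are illustrative.

```perl
use strict;
use warnings;

my $input = "1A2B3";

# Branch reset: both alternatives share capture group 1,
# so each separator contributes exactly one defined field.
my @via_reset = split /(?|(A)|(B))/, $input;

# Alternative: keep two separate groups and discard the unmatched (undef) ones.
# Empty-string fields, if any, are kept, because they are defined.
my @via_grep = grep { defined } split /(A)|(B)/, $input;

print join(',', @via_reset), "\n";
print join(',', @via_grep),  "\n";
```

Both lines should print `1,A,2,B,3` for this input. The branch reset version avoids creating the undefs in the first place, while the `grep` version leaves the pattern untouched.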
      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Node Type: perlquestion [id://1230912]
Approved by marto
Front-paged by marto