http://www.perlmonks.org?node_id=1230912

talexb has asked for the wisdom of the Perl Monks concerning the following question:

I've been part of this community for a while, and it's always fun to check out Selected Best Nodes for pearls (heh) of wisdom. Today I found "Being pretentious, and getting away with it.", which talked about the behaviour of split with a particular regex.

I started writing a module to beautify code recently, and ended up using a capture in a split statement -- which isn't something I do regularly. Here's the interesting part about this subject (finally) .. the example in the node talked about the following behaviour:

Cool, I thought, and ran off to test it in the debugger.
  DB<2> @foo = split /(A)|B/, "1A2B3"
  DB<3> x @foo
0  1
1  'A'
2  2
3  undef
4  3
  DB<4>
Oh.

The behaviour has gone back to the previous one. I believe it's because split is splitting on the un-captured 'B', and since it's not captured, undef is the correct result, rather than a null string (with apologies to japhy).

Thoughts?

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

EDIT: Added '/ using a capture during split' to the title, as per feedback.

Replies are listed 'Best First'.
Re: Re-reading history from 2001
by vr (Curate) on Mar 05, 2019 at 16:56 UTC
    split is splitting on the un-captured 'B', and since it's not captured...

    No, it's the (A) that's not captured. Every group produces a field, as the last paragraph of the split documentation explains:

    If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences); if any group does not match, then it captures the undef value instead of a substring
      DB<3> @foo = split /(A)|(B)/, "1A2B3"
      DB<4> x @foo
    0  1
    1  'A'
    2  undef
    3  2
    4  undef
    5  'B'
    6  3
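
For anyone who wants to try this outside the debugger, here is a minimal sketch that runs the same input through patterns with zero, one, and two capturing groups. The helper `show` and the array names are hypothetical and do not come from either node.

```perl
use strict;
use warnings;

# Hypothetical helper: render a list with undef made visible.
sub show { join ', ', map { defined $_ ? "'$_'" : 'undef' } @_ }

my @none = split /A|B/,     "1A2B3";   # no capturing groups
my @one  = split /(A)|B/,   "1A2B3";   # one group: one extra field per separator
my @two  = split /(A)|(B)/, "1A2B3";   # two groups: two extra fields per separator

print scalar(@none), ": ", show(@none), "\n";   # 3: '1', '2', '3'
print scalar(@one),  ": ", show(@one),  "\n";   # 5: '1', 'A', '2', undef, '3'
print scalar(@two),  ": ", show(@two),  "\n";   # 7: '1', 'A', undef, '2', undef, 'B', '3'
```

The field count grows by one per separator for each group in the pattern, whether or not that group took part in the match.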

    I read the linked node as japhy apologizing for what he did (i.e. making "split" return an empty string) and acknowledging that returning "undef" was correct from the very beginning.

      Yes, and to get rid of the undefs, one can use the branch reset pattern:
      my @foo = split /(?|(A)|(B))/, "1A2B3";
      use Data::Dumper;
      print Dumper \@foo;

      $VAR1 = [ '1', 'A', '2', 'B', '3' ];
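
As a rough comparison, the sketch below puts the branch reset pattern next to simply filtering out the undefs afterwards. It assumes Perl 5.10 or later, where `(?|...)` was introduced, and the array names are illustrative only.

```perl
use strict;
use warnings;

# Branch reset: both alternatives share capture group 1,
# so each separator contributes exactly one defined field.
my @reset = split /(?|(A)|(B))/, "1A2B3";

# Alternative: keep two separate groups and drop the undef
# fields that the non-matching group leaves behind.
my @filtered = grep { defined $_ } split /(A)|(B)/, "1A2B3";

print "@reset\n";      # 1 A 2 B 3
print "@filtered\n";   # 1 A 2 B 3
```

The two give the same list here. The grep version loses the information about which group matched, which may or may not matter for a given use.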
      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]