PerlMonks  

Empty strings after split /(\W)/

by wollmers (Scribe)
on Oct 01, 2014 at 20:29 UTC ( [id://1102585]=perlquestion )

wollmers has asked for the wisdom of the Perl Monks concerning the following question:

Is there an explanation for why split() behaves like this?

# a simple tokenizer
$ perl -e 'print join("|",split(/(\W)/,"Hello, World! ?")),"\n";'
Hello|,|| |World|!|| ||?

# get rid of the empty string tokens
$ perl -e 'print join("|",grep {$_}split(/(\W)/,"Hello, World! ?")),"\n";'
Hello|,| |World|!| |?

# or use plain regex
$ perl -e 'print join("|","Hello, World! ?"=~/(\W|\w+)/g),"\n";'
Hello|,| |World|!| |?

TIA

Helmut Wollmersdorfer

Replies are listed 'Best First'.
Re: Empty strings after split /(\W)/
by Anonymous Monk on Oct 01, 2014 at 20:44 UTC

    split does that when you put a capture group in the split expression.

    $ perl -le 'print join "|", split /(\W)/, "Hello, World! ?";'
    Hello|,|| |World|!|| ||?
    $ perl -le 'print join "|", split /\W/, "Hello, World! ?";'
    Hello||World

Re: Empty strings after split /(\W)/
by Anonymous Monk on Oct 01, 2014 at 21:09 UTC

    Think of it like a really simple CSV file*. Your separator character is \W. To make it easier to think about, replace all \W with a comma, and your input string is "Hello,,World,,,".

    split /,/, "Hello,,World,,," gives you the list "Hello", "", "World" (trailing empty fields are stripped as documented).

    What you're asking split to do when you say split /(,)/, "Hello,,World,,," is keep the separator character in the list of return values. The empty fields between separators are then no longer trailing, because each one is followed by a returned separator, so they aren't stripped. Hence:

    $ perl -e 'print join("|",split(/(\W)/,"Hello,,World,,,")),"\n";'
    Hello|,||,|World|,||,||,
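
A minimal sketch of the comparison above, using only core Perl. The variable names are illustrative, and the LIMIT of -1 in the last call is an addition not discussed in the thread: per perldoc -f split, a negative LIMIT keeps trailing empty fields.

```perl
use strict;
use warnings;

my $str = "Hello,,World,,,";

# No capture group: separators are discarded and trailing
# empty fields are removed.
my @plain = split /,/, $str;

# Capture group: every separator is returned as its own element,
# so the empty fields between separators are no longer trailing.
my @kept = split /(,)/, $str;

# Negative LIMIT: trailing empty fields are preserved.
my @all = split /,/, $str, -1;

print join("|", @plain), "\n";   # Hello||World
print join("|", @kept),  "\n";   # Hello|,||,|World|,||,||,
print scalar(@all), " fields with LIMIT -1\n";   # 6 fields
```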

    * Just for the sake of discussion, we all know we should be using Text::CSV instead of split ;-)

Re: Empty strings after split /(\W)/ (as documented, split cuts)
by Anonymous Monk on Oct 01, 2014 at 20:52 UTC

    Is there an explanation, why split() behaves like this:

    Yes: it's documented that split behaves that way. split cuts a string apart into pieces (split cuts), and it returns a piece even if there is nothing in between two cuts.

    Read perldoc -f split and consider this

    use Data::Dump qw/ dd /;
    dd( split /\D/, q/12Q34/ );
    dd( split /\D/, q/12ab34/ );
    dd( split /(\D)/, q/12ab34/ );
    __END__
    (12, 34)
    (12, "", 34)
    (12, "a", "", "b", 34)

    Q is the non-digit between 12 and 34
    the empty string "" is what lies between the non-digits a and b
    the empty string "" is again what lies between a and b (here a and b are preserved, not discarded)

    split cuts a string apart, discarding the cut pieces (the separators) unless you (keep) them with capturing parentheses
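
One way to picture this, as a rough sketch: with a single capture group, the returned list alternates field, separator, field, separator. The variable names below are illustrative, and the even/odd indexing assumes exactly one capture group in the pattern.

```perl
use strict;
use warnings;

my @parts = split /(\D)/, "12ab34";

# Even indices hold the fields, odd indices hold the captured separators.
my @fields = @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];
my @seps   = @parts[ grep { $_ % 2 == 1 } 0 .. $#parts ];

print "fields: ", join("|", @fields), "\n";   # 12||34
print "seps:   ", join("|", @seps),   "\n";   # a|b
```

The empty string in the fields is the (empty) piece between the cuts at a and b.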

Re: Empty strings after split /(\W)/
by Anonymous Monk on Oct 01, 2014 at 20:53 UTC

    What is your expected output? Is it what you're showing in your second and third example?

    Regarding grep {$_} ...: that will also filter out the string "0", since that's false in Perl. You may find grep {length} ... better.
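
A small sketch of the difference, assuming an input that contains the token "0" (the sample string is made up for illustration):

```perl
use strict;
use warnings;

my @tokens = split /(\W)/, "0, 10! ?";

# grep {$_} keeps only true values, so it drops "" and also "0".
my @truthy = grep { $_ } @tokens;

# grep {length} drops only the empty strings.
my @nonempty = grep { length } @tokens;

print join("|", @truthy),   "\n";   # ,| |10|!| |?
print join("|", @nonempty), "\n";   # 0|,| |10|!| |?
```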

    However, personally, I like your third example best, because I feel like it expresses best what you want the output to be, but of course There's More Than One Way To Do It :-)
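
For comparison, a sketch that wraps the third example in a sub and checks it against the split version. The sub name tokenize is hypothetical; the regex is the one from the original question.

```perl
use strict;
use warnings;

# Return each run of word characters, or each single non-word
# character, as one token. In list context a /g match returns
# all captured substrings.
sub tokenize {
    my ($text) = @_;
    return $text =~ /(\W|\w+)/g;
}

my @by_regex = tokenize("Hello, World! ?");
my @by_split = grep { length } split /(\W)/, "Hello, World! ?";

print join("|", @by_regex), "\n";   # Hello|,| |World|!| |?
print join("|", @by_split), "\n";   # Hello|,| |World|!| |?
```

The regex version states directly what a token is, so there are no empty strings to filter out afterwards.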
