PerlMonks  

Why do I get regexp chars in split?

by cormanaz (Chaplain)
on Oct 11, 2006 at 16:28 UTC (#577644=perlquestion)
cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Good day, monastic ones. When I run the following code
my $string = "foo / bar & etc";
my @parts = split(/ (\&|\/) /,$string);
print join("\n",@parts);
the output is

foo
/
bar
&
etc

I don't understand why I'm getting the slash and ampersand. I thought whatever is specified as the split pattern is treated as the delimiter and is not supposed to be included in the resulting list. So for example if I run

my $string = "foo\tbar\tetc";
my @parts = split(/\t/,$string);
print join("\n",@parts);
(the words are separated by tabs, written here as \t because literal tabs may not come through on this post) then the resulting output is

foo
bar
etc

with no tab chars output. What accounts for the difference between the two cases?

Many TIA....

Steve

Re: Why do I get regexp chars in split?
by duff (Vicar) on Oct 11, 2006 at 16:37 UTC

    You've used capturing parens in your pattern. This causes split to include the captured bits in the return list. If you don't want them but do want grouping, use (?: ... ) instead. See the split docs.
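
To make the difference concrete, here is a minimal sketch that runs the original string through both forms of the pattern. The variable names @with_capture and @without_capture are chosen only for this illustration; the behaviour shown is the one described in the split documentation.

```perl
use strict;
use warnings;

my $string = "foo / bar & etc";

# Capturing group: each matched delimiter is returned as its own element.
my @with_capture = split(/ (&|\/) /, $string);

# Non-capturing group: the same text matches, but the delimiters are dropped.
my @without_capture = split(/ (?:&|\/) /, $string);

print join("|", @with_capture), "\n";      # foo|/|bar|&|etc
print join("|", @without_capture), "\n";   # foo|bar|etc
```

Only the (?: ... ) changes between the two calls, so it is the capturing that decides whether the delimiters show up in the list.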

Re: Why do I get regexp chars in split?
by cdarke (Prior) on Oct 11, 2006 at 16:40 UTC
    ... and it might be better to use [] notation for alternate single chars:
    my @parts = split(/ [&\/] /,$string);
Re: Why do I get regexp chars in split?
by idsfa (Vicar) on Oct 11, 2006 at 16:40 UTC

    You don't want (capturing) parentheses, you want square brackets (a character class, see perlre):

    my $string = "foo / bar & etc";
    my @parts = split(m# [&/] #,$string);
    print join("\n",@parts,"");

    (Note the extra empty string to tack a newline onto the "etc", and the use of a different matching delimiter to avoid the need to escape the forward slash -- the ampersand never needed to be escaped.)


    The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. — Cyrus H. Gordon
Re: Why do I get regexp chars in split?
by davido (Archbishop) on Oct 12, 2006 at 02:19 UTC

    Capturing parentheses are your problem, and this is documented in split. One solution, as mentioned previously, is to use a character set ([ square brackets ]) instead of parentheses. Using a character set would eliminate the need for alternation, and hence eliminate the need for constraining parentheses.

    That's not the only solution, however. If your alternation is over entities that span more than a single character, a character set wouldn't help. In that case you would still need something to constrain the alternation. If you wish for grouping without capturing, use (?: ..... ) instead of ( ..... ). This regular expression syntax is documented in perlre and perlretut. Basically it does everything parentheses do, minus the capturing.


    Dave
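
As a small sketch of the multi-character case described above: when the delimiters are whole words, a character set cannot express them, so the alternation needs a non-capturing group. The input string and the names $list and @items are hypothetical, made up for this example.

```perl
use strict;
use warnings;

# Hypothetical input whose delimiters are the words "and" and "or".
my $list = "apples and pears or plums and figs";

# (?: ... ) groups the alternation without adding the delimiters to the result.
my @items = split(/ (?:and|or) /, $list);

print join("\n", @items), "\n";
```

Writing the pattern as / (and|or) / instead would return the words "and" and "or" as list elements between the fruit names.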
