
Why do I get regexp chars in split?

by cormanaz (Chaplain)
on Oct 11, 2006 at 16:28 UTC (node #577644)
cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Good day, monastic ones. When I run the following code:
my $string = "foo / bar & etc";
my @parts = split(/ (\&|\/) /,$string);
print join("\n",@parts);
the output is

foo
/
bar
&
etc
I don't understand why I'm getting the slash and ampersand. I thought whatever is specified as the split pattern is treated as a delimiter and is not included in the resulting list. So, for example, if I run

my $string = "foo\tbar\tetc";
my @parts = split(/\t/,$string);
print join("\n",@parts);
(there are tabs between the words, written here as \t since literal tabs may not come through in this post) then the resulting output is

foo
bar
etc

with no tab characters in the output. What accounts for the difference between the two cases?

Many TIA....


Re: Why do I get regexp chars in split?
by duff (Parson) on Oct 11, 2006 at 16:37 UTC

    You've used capturing parens in your pattern. This causes split to include the captured bits in the returned list. If you don't want them but do want a grouping, use (?: ... ) instead. See the split docs.
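    A small, self-contained sketch of the behaviour described above, using the original string (the variable names are only illustrative): the first split has a capturing group, so each matched separator comes back between the fields; the second uses (?: ... ) and returns only the fields.

```perl
use strict;
use warnings;

my $string = "foo / bar & etc";

# Capturing group: split inserts whatever ( ... ) matched between the fields.
my @with_capture = split / (&|\/) /, $string;

# Non-capturing group (?: ... ): same grouping, nothing extra returned.
my @without_capture = split / (?:&|\/) /, $string;

print join("|", @with_capture), "\n";      # prints: foo|/|bar|&|etc
print join("|", @without_capture), "\n";   # prints: foo|bar|etc
```

    With more than one capturing group, split adds one element per group for every separator it finds (undef for a group that did not take part in the match), so the same rule scales to more complex patterns.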

Re: Why do I get regexp chars in split?
by idsfa (Vicar) on Oct 11, 2006 at 16:40 UTC

    You don't want (capturing) parentheses; you want square brackets (a character class, see perlre):

    my $string = "foo / bar & etc";
    my @parts = split(m# [&/] #,$string);
    print join("\n",@parts,"");

    (Note the extra empty string, which makes join tack a newline onto the "etc", and the use of a different match delimiter to avoid escaping the forward slash; the ampersand never needed to be escaped.)
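    The same approach as a complete script, a sketch using only core Perl; $output is a name added here so the joined result can be inspected. Because [&/] is a character class rather than a group, split returns nothing but the fields.

```perl
use strict;
use warnings;

my $string = "foo / bar & etc";

# m#...# lets the slash appear unescaped; [&/] is a character class,
# so there is no capture group and no separator in the result.
my @parts = split m# [&/] #, $string;

# Appending "" to the list makes join put a newline after "etc" too.
my $output = join("\n", @parts, "");
print $output;   # prints foo, bar and etc, each on its own line
```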

Re: Why do I get regexp chars in split?
by cdarke (Prior) on Oct 11, 2006 at 16:40 UTC
    ... and it might be better to use [] (character class) notation when the alternatives are single characters:
    my @parts = split(/ [&\/] /,$string);
Re: Why do I get regexp chars in split?
by davido (Archbishop) on Oct 12, 2006 at 02:19 UTC

    Capturing parentheses are your problem, and this is documented in split. One solution, as mentioned previously, is to use a character set ([ square brackets ]) instead of parentheses. Using a character set eliminates the need for alternation and, hence, the need for constraining parentheses.
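    To see why the alternation needs constraining at all, here is a sketch comparing the pattern with the parentheses simply dropped against the character-class version. Without grouping, | applies to the whole pattern, so it matches " &" or "/ " rather than a separator with a space on each side, and the leftover spaces stay in the fields.

```perl
use strict;
use warnings;

my $string = "foo / bar & etc";

# Without grouping, | splits the whole pattern in two: " &" OR "/ ".
my @ungrouped = split / &|\/ /, $string;

# A character class needs no alternation, hence no parentheses.
my @with_class = split / [&\/] /, $string;

print join("|", @ungrouped), "\n";    # prints "foo |bar| etc" (stray spaces)
print join("|", @with_class), "\n";   # prints "foo|bar|etc"
```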

    That's not the only solution, however. If your alternation is over entities that span more than a single character, a character set won't help. In that case you still need something to constrain the alternation. If you want grouping without capturing, use (?: ..... ) instead of ( ..... ). This regular expression syntax is documented in perlre and perlretut. Basically, it does everything parentheses do, minus the capturing.
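    For the multi-character case, a sketch with a hypothetical input whose separators are the words "and" and "or". A character class such as [andor] would only ever match one letter, so the alternation has to be grouped; (?: ... ) drops the separators, while ( ... ) keeps them, which can be handy when you need to know which one was used.

```perl
use strict;
use warnings;

# Hypothetical input: the separators are whole words, not single characters.
my $string = "salt and pepper or vinegar";

# (?: ... ) groups the alternation without adding captures to the result.
my @fields = split / (?:and|or) /, $string;

# ( ... ) keeps each matched word between the fields.
my @fields_and_seps = split / (and|or) /, $string;

print join("|", @fields), "\n";            # prints: salt|pepper|vinegar
print join("|", @fields_and_seps), "\n";   # prints: salt|and|pepper|or|vinegar
```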


Node Type: perlquestion [id://577644]
Approved by Corion