PerlMonks  

Curious use of regular expressions in split

by juliosergio (Acolyte)
on Feb 15, 2012 at 17:14 UTC (#954004=perlquestion)
juliosergio has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use what was matched by a regular expression utilized in the 'split' function, so I wrote something like:

@a2 = split /(abc)/, "uno abc dos";
... some use of $1 ...

However this doesn't work, because, I guess, 'split' doesn't treat /(abc)/ as a regular expression. See the following example:

#! /usr/bin/perl
@a1 = split /abc/, "uno abc dos";
@a2 = split /(abc)/, "uno abc dos";
print "@a1\n"; # prints "uno dos", as expected
print "@a2\n"; # prints "uno abc dos" !!! :(

Can you explain why this is so, and what I can do to recover the string matched by the regular expression?
Thanks,
-Sergio

Re: Curious use of regular expressions in split
by Corion (Pope) on Feb 15, 2012 at 17:19 UTC

    See split:

    If the PATTERN contains parentheses, additional list elements are created from each matching substring in the delimiter.

    So, if you don't want the additional list elements, don't use capturing parentheses. Use noncapturing parentheses (?: ... ) instead.
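    A minimal sketch of the difference described above, using the string from the question. The variable names are illustrative only and are not part of the original reply.

```perl
use strict;
use warnings;

my $str = "uno abc dos";

# Capturing parentheses: the matched delimiter comes back as a list element.
my @with_sep = split /(abc)/, $str;

# Non-capturing parentheses: the delimiter is discarded.
my @without_sep = split /(?:abc)/, $str;

print scalar(@with_sep),    " elements: ", join('|', @with_sep),    "\n";
print scalar(@without_sep), " elements: ", join('|', @without_sep), "\n";
```

    The first line reports three elements (the two fields plus the delimiter), the second reports two.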

      This doesn't solve my problem, because the matched string isn't stored in $1 ...

        Well, if you want to keep the stuff you split on, use capturing parentheses. If you don't want to keep the stuff, don't use them. $1 is never set, so the way you are trying it will never work.

        This doesn't solve my problem, because the matched string isn't stored in $1 ...

        You were saying?

        @a3 = split /(?:abc)/, "uno abc dos";
        print "@a3\n";
        __END__
        uno dos
Re: Curious use of regular expressions in split
by Riales (Hermit) on Feb 15, 2012 at 17:39 UTC

    I don't know if anybody has a better solution, but you could just take two steps to both capture what you want to capture and split the string:

    my $string = "uno abc dos";
    my @a = split /abc/, $string;
    $string =~ /(abc)/;
    ...some use of $1...

    If your pattern is just something as trivial as abc, you could just assume $1 = 'abc'...but I'm assuming it's not.

      my @matches = $input =~ /$re/g;
      my @splits = split /$re/, $input;
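      A runnable sketch of the two-step approach above. The sample input and the pattern are assumptions chosen for illustration; the pattern deliberately has no capturing groups, so the list-context match with /g returns the whole matched substrings.

```perl
use strict;
use warnings;

my $input = "uno abb dos ab tres abbb cuatro";   # sample input (assumed)
my $re    = qr/ab+/;                             # no capturing groups

# Step 1: collect every substring the pattern matches.
my @matches = $input =~ /$re/g;

# Step 2: split on the same pattern to get the pieces in between.
my @splits = split /$re/, $input;

print "matches: ", join('|', @matches), "\n";
print "splits:  ", join('|', @splits),  "\n";
```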

      So far, I think yours is the best solution, though I was trying to do it in a single step.
      Thanks!,
      -Sergio

Re: Curious use of regular expressions in split ($1 eh?)
by tye (Cardinal) on Feb 15, 2012 at 17:47 UTC
    my @a = split /(abc)/, "uno abc dos";
    print "\$1=($1)\n" if splice(@a,1,1) =~ /(.*)/;
    print "\@a=(", join(',',@a), ")\n";
    __END__
    $1=(abc)
    @a=(uno , dos)

    - tye        

Re: Curious use of regular expressions in split
by johngg (Abbot) on Feb 15, 2012 at 22:34 UTC

    As you can see, the spaces in the string are preserved when using 'abc' as the separator. This may or may not be the behaviour you want.

    You could actually target two arrays, one for words and one for separators, by combining the split with a push. If you don't want the leading and trailing spaces preserved, you could split on white space instead and direct the results to separate arrays as before.

    knoppix@Microknoppix:~$ perl -E '
    > $str = q{one abc two abc three abc four};
    > $sep = q{abc};
    > say q{-} x 25;
    >
    > @arr = split m{($sep)}, $str;
    > say qq{Split on m{($sep)} into one array};
    > say qq{ ->$_<-} for @arr;
    > say q{-} x 25;
    >
    > push @{ $_ eq $sep ? \ @seps : \ @nums }, $_
    >     for split m{($sep)}, $str;
    > say qq{Split on m{($sep)} into two arrays};
    > say q{ Nums:};
    > say qq{ ->$_<-} for @nums;
    > say q{ Seps:};
    > say qq{ ->$_<-} for @seps;
    > say q{-} x 25;
    >
    > @seps = (); @nums = ();
    > push @{ $_ eq $sep ? \ @seps : \ @nums }, $_
    >     for split m{\s+}, $str;
    > say qq{Split on m{\\s+} into two arrays};
    > say q{ Nums:};
    > say qq{ ->$_<-} for @nums;
    > say q{ Seps:};
    > say qq{ ->$_<-} for @seps;
    > say q{-} x 25;'
    -------------------------
    Split on m{(abc)} into one array
     ->one <-
     ->abc<-
     -> two <-
     ->abc<-
     -> three <-
     ->abc<-
     -> four<-
    -------------------------
    Split on m{(abc)} into two arrays
     Nums:
     ->one <-
     -> two <-
     -> three <-
     -> four<-
     Seps:
     ->abc<-
     ->abc<-
     ->abc<-
    -------------------------
    Split on m{\s+} into two arrays
     Nums:
     ->one<-
     ->two<-
     ->three<-
     ->four<-
     Seps:
     ->abc<-
     ->abc<-
     ->abc<-
    -------------------------
    knoppix@Microknoppix:~$

    I hope this is of interest.

    Cheers,

    JohnGG

      Great! Finally I'm understanding what's behind the split function!

      When you enclose the expression in parentheses, the string is split according to the pattern, but the matched separators are also stored in the resulting array. That's really interesting, because you can easily manipulate both the separated pieces and the separators in the same array. See my example below:

      #! /usr/bin/perl
      use Data::Dumper;
      $re = "(ab+)";
      $input = "uno abb dos ab tres abbb cuatro";
      my @splits = split /$re/, $input;
      print "splits: ", Dumper \@splits;
      __END__
      splits: $VAR1 = [
                'uno ',
                'abb',
                ' dos ',
                'ab',
                ' tres ',
                'abbb',
                ' cuatro'
              ];
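      Since fields and separators alternate in the result, one possible way to pull them apart again is by index parity. This is only a sketch, and it assumes the pattern contains exactly one capturing group; with more groups, each match contributes more than one extra element and the even/odd rule no longer holds. The array names are illustrative.

```perl
use strict;
use warnings;

my $input = "uno abb dos ab tres abbb cuatro";
my @parts = split /(ab+)/, $input;

# With a single capture group, even indices hold the fields
# and odd indices hold the matched separators.
my @fields = @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];
my @seps   = @parts[ grep { $_ % 2 == 1 } 0 .. $#parts ];

print "fields: ", join('|', @fields), "\n";
print "seps:   ", join('|', @seps),   "\n";
```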
