Perl Idioms Explained - @ary = $str =~ m/(stuff)/g

Suppose you want a regex to find every occurrence of something in a string and store the matches in an array. Perl has a very convenient idiom for this:

@ary = $str =~ m/(stuff)/g;

If you are after a single match, put the scalar in parentheses so that the L-VALUE supplies list context, i.e.

($scalar) = $str =~ m/(this)/;

Note that if you forget the ( ) around $scalar and the regex matches, $scalar will contain the integer value 1 (the true result of the match) rather than the captured text, so don't forget the ( ). The ( ) gives you list context, which is what you need.
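To make the difference concrete, here is a minimal sketch. The variable names $flag and $word and the sample string are made up for illustration:

```perl
use strict;
use warnings;

my $str = 'find this here';    # made-up sample string

# Plain scalar on the left: the match runs in scalar context
# and only reports success (1) or failure.
my $flag = $str =~ m/(this)/;

# Parenthesised scalar: list context, so the capture itself comes back.
my ($word) = $str =~ m/(this)/;

print "flag=$flag word=$word\n";    # prints "flag=1 word=this"
```

If the regex does not match, $word is left undefined, so it is worth testing it before use.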

Anyway, although this might not make a lot of sense at first glance it is really very simple. If we had just this:

$str =~ m/(stuff)/;
print $1;

then we would expect our code to print 'stuff' if the string contained the literal 'stuff', as this gets captured into $1. Adding /g means that $1 will sequentially contain 'stuff' EVERY time $str =~ m/(stuff)/g is true, i.e. 0..n times. Now, since a regex match is a valid R-VALUE in an expression, and we know we can write L-VALUE = R-VALUE, we can understand that:

@all_the_matches   = $str =~ m/(stuff)/g;

In list context (which assigning to an array provides) we get all the matches into our array. So for example you can do:

@links = $html =~ m/<a[^>]+href\s*=\s*["']?([^"'> ]+)/ig;

This is a reasonably reliable and quick way to extract all the <a...href=...> links from HTML. Although you can certainly use HTML::LinkExtor or any of the other HTML::Parser based widgets, there are times when you want to, say, extract all the links that look like:

<A CLASS="blah" HREF="foo.com">

A carefully chosen regex can extract exactly what you want, without any excess, as you can make it match a specific subset of links with ease. Using this idiom you can grab the matches into an array in one elegant line of Perl.
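As a rough sketch of such a targeted regex, the following pulls the href only from anchors whose class is "blah". The sample HTML is invented, and the pattern assumes the class attribute comes before href, as in the tag shown above; it is not a general HTML parser:

```perl
use strict;
use warnings;

# Made-up sample HTML, for illustration only.
my $html = <<'HTML';
<a href="skip.example">no class</a>
<A CLASS="blah" HREF="foo.com">wanted</A>
<a class="other" href="bar.example">different class</a>
<a class="blah" href='baz.example'>also wanted</a>
HTML

# Assumption: class="blah" appears before href inside the tag.
my @links = $html =~ m/<a\s+class\s*=\s*["']?blah["']?\s[^>]*?href\s*=\s*["']?([^"'> ]+)/ig;

print "$_\n" for @links;    # prints foo.com then baz.example
```

Anchors with the attributes in the other order would be missed, which is the kind of case where an HTML::Parser based module is the safer choice.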

The uses are of course limited only by your imagination. Using ^ and /m you can do things like extract a specific field from a space-separated data set:

$data = '
f1 f2 f3
f4 f5 f6
f7 f8 f9
';

@second = $data =~ m/^\S+\s+(\S+)/mg;
print "@second";
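One caveat with the field example: \s+ also matches a newline, so a line with only one field can make the match spill onto the next line. The sketch below uses made-up data to compare the pattern above with a variant that restricts the separator to spaces and tabs (the [ \t]+ variant is my suggestion, not part of the original idiom):

```perl
use strict;
use warnings;

my $data = "a b\nc\nd e\n";    # made-up data; the middle line has one field

my @loose  = $data =~ m/^\S+\s+(\S+)/mg;       # \s+ may cross the newline
my @strict = $data =~ m/^\S+[ \t]+(\S+)/mg;    # separator stays on one line

print "loose:  @loose\n";     # prints "loose:  b d"
print "strict: @strict\n";    # prints "strict: b e"
```

With the loose pattern the lone "c" is paired with "d" from the following line; the strict pattern simply skips lines that have no second field.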

As always YMMV and you should pick the best hammer to drive the nail at hand.

Update

Technical inaccuracy removed. For the details on what happens if you put a scalar on the LHS of this idiom, see bart's reply below, where he gets to poke fun at me for making an untested assumption, followed by my pitiful excuses and a roundabout hack.


Replies are listed 'Best First'.
Re: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by CountZero (Bishop) on Sep 15, 2003 at 13:43 UTC
One should perhaps add that the real reason why this magic works is that "`=~`" binds more tightly than "`=`", so the regex match is done first and its results are then evaluated in list context (which is provided by using an array as an L-value).

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Re: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by tachyon (Chancellor) on Sep 15, 2003 at 14:02 UTC
To be completely pedantic, the reason it 'works' is that they coded it that way! The reason it works without needing parens is as you say. The precedence order is why you need to do stuff like this:

(my $fix_up = $old_str ) =~ s/this/that/g;

But that is another idiom I guess. This one lets you declare $fix_up, set it to $old_str, and then modify $fix_up in one line, while leaving $old_str intact for future reference.

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

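A small sketch of what the parentheses change. The strings and the names $old2 and $count are made up for the comparison:

```perl
use strict;
use warnings;

my $old_str = 'this and this';

# Parentheses force the assignment first; s/// then edits the copy.
(my $fix_up = $old_str) =~ s/this/that/g;

# Without parentheses =~ binds first: s/// edits $old2 in place
# and the assignment receives the number of substitutions.
my $old2  = 'this and this';
my $count = $old2 =~ s/this/that/g;

print "$old_str | $fix_up | $old2 | $count\n";
# prints "this and this | that and that | that and that | 2"
```

On Perl 5.14 and later the /r modifier (s/this/that/gr) returns the modified copy directly, which covers the same ground without the parentheses.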
Re: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by bart (Canon) on Sep 17, 2003 at 10:15 UTC
$num_matches_stuff = $str =~ m/(stuff)/g;
@all_the_matches   = $str =~ m/(stuff)/g;

"In scalar context we get the count of the matches, in array context we get the matches."

No, the former just is not right. You haven't actually tried it, have you?

$str = "stuff stuff stuff";
$num_matches_stuff = $str =~ m/(stuff)/g;
print $num_matches_stuff;

Result:

1

The /g modifier in scalar context is very special. It is intended to be used in a loop, something like this:

$str = "stuff stuff stuff";
while ($str =~ m/(stuff)/g) {
    print "Got one!\n";
}

Result:

Got one!
Got one!
Got one!

So in scalar context, it will match at most once at a time; next time around, it'll continue where it left off last time. Therefore, the returned value of //g when used in scalar context is either false (the empty string, which is 0 numerically) or 1.

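The "continue where it left off" part can be watched through pos(), which holds the offset where the next scalar-context m//g will start. A minimal sketch (the @seen array is just for illustration):

```perl
use strict;
use warnings;

my $str = 'stuff stuff stuff';
my @seen;

# Each scalar-context m//g call finds the next match and advances pos($str).
while ($str =~ m/(stuff)/g) {
    push @seen, $1 . '@' . pos($str);
}

print "@seen\n";    # prints "stuff@5 stuff@11 stuff@17"
```

Once the match finally fails, pos($str) is reset, so a later loop over the same string starts again from the beginning.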
Re: Re: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by tachyon (Chancellor) on Sep 17, 2003 at 13:15 UTC
Hmm it is of course as you say. I must have got myself confused with the behaviour of s///.

$data = 'stuff stuff stuff';
$num = $data =~ s/(stuff)//g;
print $num, $/;

__DATA__
3

You can force it to do as I said by cheating thusly....

$num = () = 'stuff stuff stuff' =~ m/(stuff)/g;
print $num;

__DATA__
3

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by Roger (Parson) on Sep 16, 2003 at 00:17 UTC
Hi, I have a question on the capturing parentheses. I have made the following test sample to explore the necessity of the parentheses:

$str = "Ab stuff Cd stuff Ef stuff";
# case 1
@ary1 = $str =~ m/(stuff)/g ;
# case 2
@ary2 = $str =~ m/(?:stuff)/g ;
# case 3
@ary3 = $str =~ m/stuff/g ;
print "\@ary1 = @ary1\n";
print "\@ary2 = @ary2\n";
print "\@ary3 = @ary3\n";

All three cases return the same result. The explanation for case 1 is covered in earlier posts. However, I am puzzled by the use of parentheses in the example, so I added ?: to tell the regular expression to forget the value in the capture parentheses, if any. The result is the same! So the regular expression is not acting on the $1 variable captured by the parentheses at all. So I eliminated the parentheses totally, and I still get the same result.

Ok, my instinct tells me that this Perl idiom is acting on the behaviour of m//g, or more specifically the g modifier. It seems the g modifier introduces its own pattern matching memory behaviour and discards the regular expression memory in some cases. I looked up the perldoc, which states:

"The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern."

Ok, my question is, what is the expected behaviour of the g modifier? Why is the /g modifier capturing the value that I want it to forget (with ?:)? Is it a feature or bug? Or perhaps /(?:pattern)/ is equivalent to /pattern/?

Re: Re: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by antirice (Priest) on Sep 16, 2003 at 02:08 UTC
I've always attributed this behavior to perl's DWIM approach to usability. The reason all three return the same thing is pretty simple: in the case where your regex doesn't capture anything, the actual instance that matches is returned instead. Also, if you have more than one capturing portion, it will push the extras onto the array as well. Try this:

$str = "Ab stuff Cd stuff Ef stuff";
# case 1
@ary1 = $str =~ m/(stuff)/g ;
@ary2 = $str =~ m/(st)u(ff)/g ;
print "\@ary1 = @ary1\n";
print "\@ary2 = @ary2\n";

__DATA__
outputs:
@ary1 = stuff stuff stuff
@ary2 = st ff st ff st ff

Nifty, eh? I rather like this behavior. Also please note that the g only means return all instances where the pattern matches. If you remove the g from the regexes above, then only the first match is returned.

Hope this helps. I just noticed this is my 200th post. Yay.

antirice

The first rule of Perl club is - use Perl
The ith rule of Perl club is - follow rule i - 1 for i > 1

Re: Re: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by tachyon (Chancellor) on Sep 16, 2003 at 04:44 UTC
An interesting observation. The behaviour for (?:...) and naked match strings is however appropriate provided you have at least one capture (...) in the RE.

@ary = 'stuff stuff stuff' =~ m/(?:st)u(ff)/g;
print "@ary";

__DATA__
ff ff ff

Depending on your viewpoint your observation represents a bug or a feature!

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

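Because every capture of every match lands in one flat list, a pattern with two groups can fill a hash directly. A minimal sketch with made-up key=value input (the %pair name is mine):

```perl
use strict;
use warnings;

my $str = 'colour=red size=large';    # made-up input

# Two captures per match: the flat list is key, value, key, value, ...
my %pair = $str =~ m/(\w+)=(\w+)/g;

print "$_ => $pair{$_}\n" for sort keys %pair;
# prints "colour => red" then "size => large"
```

This only stays aligned when both groups capture on every match; an optional group that fails to participate yields undef in its slot.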
Re: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by shenme (Priest) on Sep 15, 2003 at 21:00 UTC
Is there an "... Explained" that covers the extension of this to:

$howmanytimes = @ary = $str =~ m/(stuff)/g;
# or perhaps better as
$howmanytimes = () = $str =~ m/stuff/g;

which then goes to show you don't need the capturing parentheses in the match and .... I'll shut up.

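For what it's worth, both forms rely on the same rule: a list assignment evaluated in scalar context returns the number of elements on its right-hand side. A short sketch with made-up variable names:

```perl
use strict;
use warnings;

my $str = 'stuff stuff stuff';

# The empty list () puts m//g in list context; the outer scalar
# assignment then sees how many elements that list had.
my $count = () = $str =~ m/stuff/g;

# Same rule, but the matches are kept in @ary as well.
my $howmany = my @ary = $str =~ m/(stuff)/g;

print "$count $howmany @ary\n";    # prints "3 3 stuff stuff stuff"
```

The `() =` form throws the matches away, so it suits the case where only the count matters.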
Re: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by Anonymous Monk on Apr 07, 2015 at 15:42 UTC
So how do you check that the string contains only what you expect? For example you might have the following code

while ($str =~ s/\A(foo)//) {
    push @got, $1;
}
die "unexpected stuff in string: $str" if $str ne '';

This takes a paranoid approach, checking that there is no junk at the beginning of the string before the series of 'foo' begins, and no junk at the end. It would be more efficient to just say

@got = $str =~ m/(foo)/g;

but now you lose error checking. If $str contains 'xfooy' then the leading and trailing junk will be silently ignored. That's not great, since unmatched 'junk' more often than not indicates a bug in your regexp you need to fix.

Is there a way to get the efficiency of m//g but still check that the whole string is matched? I suspect it will involve the \G anchor but I am not sure how.

Re^2: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g by ed (Initiate) on Apr 07, 2015 at 15:49 UTC
I am the author of the above comment. Something like this seems to work:

$result =~ /\A/g or die;    # make extra sure we are at start of string
@r = ($result =~ m/\G(foo)/gc);
if ($result =~ /\G(.+)/sg) {
    warn "leftover junk at the end of result: $1";
}

The key is the /c flag on the repeated match so that when it fails, it leaves the match position in place so the trailing junk, if any, can be reported. If there is leading junk, then the regexp won't match any times, and the whole string will be reported as 'leftover'. That is a good enough error message for my purposes.

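The same approach can be wrapped in a sub. This is only a sketch under a few assumptions: parse_foos is a hypothetical name, it sets pos() directly instead of matching /\A/g, and it reports leftovers with substr rather than a second \G match:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the list of 'foo' tokens, or dies on junk.
sub parse_foos {
    my ($str) = @_;
    pos($str) = 0;                       # start matching at the beginning
    my @got = $str =~ m/\G(foo)/gc;      # /c keeps pos() where matching stopped
    my $end = pos($str) || 0;
    die "unexpected stuff at offset $end: " . substr($str, $end) . "\n"
        if $end < length $str;
    return @got;
}

my @ok = parse_foos('foofoofoo');
print scalar(@ok), "\n";                 # prints 3

# Junk in the middle is reported instead of being silently skipped.
my $err = eval { parse_foos('fooxfoo'); 1 } ? '' : $@;
print $err;                              # prints "unexpected stuff at offset 3: xfoo"
```

Because \G anchors each attempt at the previous match's end, the first character that does not start a 'foo' stops the scan, and comparing pos() with the string length shows whether anything was left over.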