alfie has asked for the wisdom of the Perl Monks concerning the following question:
Hi!
I have a quite annoying problem here for which I found a
workaround already but would rather like to see why this is
happening, so I'm asking here :-)
I have a loop around a match, which looks like the
following:
use LWP::UserAgent;
# request in here
for ($res->content =~ m/foo(.*?)bar/g) {
print "$_ - $1\n";
}
The problem is that $1 is always the first match. When I use
$_ in this loop I get the thing I want - each new match in
each iteration of the loop. I also tried it with a while
loop, but that one seems to go on forever and, like the for
loop with $1, just displays the first match...
/me is totally confused.
--
Alfie
Re: $1 doesn't reset?
by Corion (Patriarch) on Mar 20, 2001 at 15:28 UTC
while ($res->content =~ m/\G.*?foo(.*?)bar/g) {
print "$_ - $1\n";
};
Maybe you'll want to read Death to Dot Star! before you use
.*?, but unless we know more about the data, .*? will have to do ;-)
Update: If I run this program :
#!/usr/bin/perl -w
use strict;
my $string = "fooAbar" .
"fooBbar" .
"gooCbar" .
"fooDfooEbar" .
"foo";
while ($string =~ m/\G.*?foo(.*?)bar/g) {
print $1, "\n";
};
I get the following, expected output :
H:\>perl -w test.pl
A
B
DfooE
I guess the error lies somewhere else in your regular expression then - maybe post the whole RE together with some sample data.
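One difference between this test and the original loop may matter: the test matches against a plain variable, while the original matches against $res->content, a method call. In scalar context, m//g remembers its position on the scalar it was matched against. If every call hands back a fresh copy, that position is lost and each iteration starts again from the beginning, which would fit the "goes forever, always the first match" symptom. A minimal sketch, assuming the call returns a new copy each time (get_content is a hypothetical stand-in for $res->content, and the sample text is made up):

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical stand-in for $res->content: returns a fresh copy on each call.
sub get_content { return "fooAbar fooBbar fooCbar" }

# Copy the text into a variable once, so the match position survives
# from one iteration to the next.
sub captures_from_copy {
    my $content = get_content();
    my @found;
    while ($content =~ m/foo(.*?)bar/g) {
        push @found, $1;
    }
    return @found;
}

# Match directly against the call result. The loop is capped at $limit
# captures so that this demo always terminates.
sub captures_from_call {
    my ($limit) = @_;
    my @found;
    while (get_content() =~ m/foo(.*?)bar/g) {
        push @found, $1;
        last if @found >= $limit;
    }
    return @found;
}

print join(",", captures_from_copy()), "\n";
print join(",", captures_from_call(5)), "\n";
```

If that is what is happening, assigning the content to a variable before the while loop should be enough to make $1 advance.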
Yes, thanks for the \G info - but somehow it still works if
I use $_ within the for loop.
Also, this now doesn't match anything at all, strangely. Not
within while or within a for loop. The only thing I got
working is, as I said, the for loop with $_ as the match.
Btw., thanks for the link to Death to Dot Star, the .*? is
safe in this place.
Sorry that I didn't mention it before, maybe it's important:
This is perl, v5.6.0 built for i386-linux
--
Alfie
alfie, my understanding is this (and I'm relying on more knowledgeable monks to correct me if I'm misunderstanding):
Working with $_, not $1, is the correct behaviour in the for loop. Because:
- the regexp match produces a list
- the for(each) loop cycles through each element of the list, assigning them to $_ as it goes.
So... $1 doesn't contain each match in turn, because the regexp is only executed once, in list context: all the matches are collected into a list (leaving $1 as set by the last of them), and the loop goes round the list, aliasing $_ to each element in turn.
If you want to use $1, then Corion's suggestion of \G should do the biz. But it looks to me as though the for loop and $_ would be more efficient.
andy.
Re: $1 doesn't reset?
by bjelli (Pilgrim) on Mar 20, 2001 at 17:43 UTC
The problem is that the "for" loop does all
the matching beforehand. The stuff inside the
parentheses is evaluated in list context before
the loop starts. You could also write it like this:
@allmatches = $text =~ m/foo(.*?)bar/g;
for ( @allmatches ) {
print "dollarunderscore= $_ \tdollarone= $1\n";
}
$_ is set to the stuff you matched, $1 is always the
last match - just as you described in your post.
This is quite different from the while loop.
Here the expression in parentheses is evaluated in
scalar context, which means
the matching is done one at a time:
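To make the list-context case concrete, here is a small runnable sketch with made-up sample data (the string and the @seen array are only for illustration). It records $_ and $1 side by side on each pass:

```perl
#!/usr/bin/perl -w
use strict;

my $text = "fooAbar fooBbar fooCbar";   # made-up sample data

# List context: every match is found before the loop body runs even once.
my @allmatches = $text =~ m/foo(.*?)bar/g;

my @seen;
for (@allmatches) {
    # $_ walks the list; $1 still holds the capture of the final match.
    push @seen, "$_/$1";
}
print "@seen\n";   # A/C B/C C/C
```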
while ($text =~ m/foo(.*?)bar/g) {
print "dollarunderscore= $_ \tdollarone= $1\n";
}
You could also write this as
$text =~ m/foo(.*?)bar/g; print "$_ \t$1\n";
$text =~ m/foo(.*?)bar/g; print "$_ \t$1\n";
$text =~ m/foo(.*?)bar/g; print "$_ \t$1\n";
#[... repeated as often as necessary ...]
Here $1 is set to the (one thing) you just
matched with (.*?), $_ is not set at all.
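The "one at a time" behaviour can be watched with pos(), which reports where the next attempt on a string will start. A small sketch with made-up sample data:

```perl
#!/usr/bin/perl -w
use strict;

my $text = "fooAbar fooBbar";   # made-up sample data

my @steps;
while ($text =~ m/foo(.*?)bar/g) {
    # Each pass performs exactly one match; pos($text) is the offset
    # just past that match, where the next pass will resume.
    push @steps, "$1 ends at " . pos($text);
}
print join(", ", @steps), "\n";   # A ends at 7, B ends at 15
```

When the match finally fails, the position is reset and the loop ends.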
--
Brigitte 'I never met a chocolate I didnt like' Jellinek
http://www.horus.com/~bjelli/ http://perlwelt.horus.at
Re: $1 doesn't reset?
by frankus (Priest) on Mar 20, 2001 at 15:41 UTC
I think the problem is that the array you expect to get back from the regular expression is broken. I slightly altered your code, thus (I think it does the same thing):
$_='axxbcaxbagaxbacba';
for (m/a(.*?)b/g) {
print "$a\n";
}
It don't work...so, I went right back to basics:
$_='axxbcaxbagaxbacba';
foreach my $a (m/a(.*?)b/g) {
print "$a\n";
}
In the second example the return of each item from an array created by the regular expression is explicit and it works.
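A likely reason the first loop printed nothing useful: it has no named loop variable, so each element is aliased to $_, while the body prints $a, which nothing in that snippet assigns. The second version works because it names $a as the loop variable, not because the list is broken. A small sketch of the first loop reading $_ instead (the @captured array is only there for illustration):

```perl
#!/usr/bin/perl -w
use strict;

$_ = 'axxbcaxbagaxbacba';

# Without a named loop variable, each captured string arrives in $_.
my @captured;
for (m/a(.*?)b/g) {
    push @captured, $_;
}
print join(",", @captured), "\n";   # xx,x,gax,c
```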
P.S. That is a non greedy .* isn't it?
--
Brother Frankus.
Thanks for this workaround, too. But it still doesn't
explain why $1 isn't filled in each iteration with the match
but only in the first....
--
Alfie
Yoda: "work-arounds, crufts, kludges...the dark side are they. Easily they flow, quick to join you in times of trouble."
Luke: "Is the dark side more powerful?"
Yoda: "No..no....no, quicker, easier, more seductive"
Luke: "How am I to know the good side from bad?"
Yoda: "Once you start along the dark path, forever will it cloud your change requests"
--
Brother Frankus.
Edit 2001-03-20 by tye (changed <pre> to <code>)
I'll take a crack at this one, since it gave me the same kind of problems when I started using regexps as loop controls. Using brother frankus' example :
$_='axxbcaxbagaxbacba';
foreach my $a (m/a(.*?)b/g) {
print "$a\n";
}
OK, so, from perlman :
The foreach modifier is an iterator: For each value in EXPR, it aliases $_ to the value and executes the statement
So, right away, we see that foreach doesn't care about, or understand $1 and company at all. It only knows about its control variable ($a, in this case) and its list : (m/a(.*?)b/g).
So what's the for statement's list? It's generated from m/a(.*?)b/g. And, as perlop states,
(m//) ... in a list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3...)
So, with this regexp and the /g modifier, you get an anonymous list of 4 elements: the string captured by the single pair of parentheses on each successive match. It's like writing foreach my $a ('xx', 'x', 'gax', 'c') {
By the time the code in the loop's running, the regexp has run and returned a list of values that foreach will process.
I hope that makes things a little clearer, this gave me trouble for a while, and hopefully this explanation will minimize the trouble it gives you.
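For what it's worth, here is a small sketch of what the list looks like with and without /g when the pattern has more than one pair of parentheses. The sample string and pattern are made up for illustration:

```perl
#!/usr/bin/perl -w
use strict;

my $text = 'k1=v1;k2=v2';   # made-up sample data

# No /g: list context returns ($1, $2) of the single match.
my @once = $text =~ m/(\w+)=(\w+)/;

# With /g: the ($1, $2) pairs of every match, flattened into one list.
my @all = $text =~ m/(\w+)=(\w+)/g;

print "@once\n";   # k1 v1
print "@all\n";    # k1 v1 k2 v2
```

Either way, the list is complete before foreach starts iterating over it.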
update see below... sometimes convenient variables aren't good for practical use.