
$1 doesn't reset?

by alfie (Pilgrim)
on Mar 20, 2001 at 15:04 UTC ( #65672=perlquestion )

alfie has asked for the wisdom of the Perl Monks concerning the following question:


I have a rather annoying problem here. I already found a workaround, but I would like to understand why this is happening, so I'm asking here :-)

I have a loop around a match, which looks like the following:

use LWP::UserAgent;
# request in here
for ($res->content =~ m/foo(.*?)bar/g) {
    print "$_ - $1\n";
}

The problem is that $1 is always the first match. When I use $_ in this loop I get what I want: each new match in each iteration of the loop. I also tried a while loop, but that one seems to run forever and also displays only the first match, just like $1 within the for loop...

/me is totally confused.

Replies are listed 'Best First'.
Re: $1 doesn't reset?
by Corion (Patriarch) on Mar 20, 2001 at 15:28 UTC

    perlre tells us to use the \G anchor :

    while ($res->content =~ m/\G.*?foo(.*?)bar/g) {
        print "$_ - $1\n";
    }

    Maybe you'll want to read Death to Dot Star! before you use .*?, but unless we know more about the data, .*? will have to do ;-)

    Update: If I run this program :

    #!/usr/bin/perl -w
    use strict;

    my $string = "fooAbar" . "fooBbar" . "gooCbar" . "fooDfooEbar" . "foo";
    while ($string =~ m/\G.*?foo(.*?)bar/g) {
        print $1, "\n";
    }

    I get the following, expected output :

    H:\>perl -w
    A
    B
    DfooE

    I guess the error lies somewhere else in your regular expression then - maybe post the whole RE together with some sample data.

      Yes, thanks for the \G info - but somehow it still works if I use $_ within the for loop.

      Strangely, this now doesn't match anything at all, neither within a while loop nor within a for loop. The only thing I got working is, as I said, the for loop with $_ as the match.

      Btw., thanks for the link to Death to Dot Star; the .*? is safe in this place.

      Sorry that I didn't mention it before, maybe it's important:
      This is perl, v5.6.0 built for i386-linux

        alfie, my understanding is this (and I'm relying on more knowledgeable monks to correct me if I'm misunderstanding):

        Working with $_, not $1, is the correct behaviour in the for loop. Because:
        - the regexp match produces a list
        - the for(each) loop cycles through each element of the list, assigning them to $_ as it goes.

        So... $1 doesn't contain each match in turn, because the regexp is only executed once, and the matches are put into $1 to $n, and into a list, and the loop goes round the list, aliasing $_ to each element in turn.

        If you want to use $1, then Corion's suggestion of \G should do the biz. But it looks to me as though the for loop and $_ would be more efficient.
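
A guess at the endless while loop from the question: the position of the last /g match is remembered with the scalar that was matched, and $res->content presumably hands back a fresh copy of the page on every call, so each iteration would start from the beginning again. A minimal sketch, assuming the content is first copied into a variable ($content and its sample text are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the page body returned by $res->content.
my $content = "fooAbar junk fooBbar junk fooCbar";

my @seen;
# Matching against a named variable lets /g keep its position (pos)
# between iterations, so the loop ends after the last match.
while ($content =~ m/foo(.*?)bar/g) {
    push @seen, $1;
    print "$1\n";
}
```

With the copy in place, $1 changes on every pass and the loop stops once the pattern no longer matches.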


Re: $1 doesn't reset?
by bjelli (Pilgrim) on Mar 20, 2001 at 17:43 UTC

    The problem is that the "for" loop does all the matching beforehand. The stuff inside the parentheses is evaluated in list context before the loop starts. You could also write it like this:

    @allmatches = $text =~ m/foo(.*?)bar/g;
    for ( @allmatches ) {
        print "dollarunderscore= $_ \tdollarone= $1\n";
    }

    $_ is set to the stuff you matched, $1 is always the last match - just as you described in your post.
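
To see this in isolation, here is a small self-contained sketch. The sample string and the @pairs array are made up for illustration; the point is only that the whole match runs once, before the loop body ever executes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "fooAbar fooBbar fooCbar";       # made-up sample data

# List context: every match is collected here, before the loop starts.
my @allmatches = $text =~ m/foo(.*?)bar/g;

my @pairs;
for (@allmatches) {
    # $_ walks through the list; $1 still holds the capture from the
    # last successful match, because no new match happens in the loop.
    push @pairs, "$_/$1";
    print "dollarunderscore= $_ \tdollarone= $1\n";
}
```

Each line shows a different $_ (A, B, C) next to the same $1 (C).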

    This is quite different from the while loop. Here the expression in parentheses is evaluated in scalar context, which means the matching is done one at a time:

    while ($text =~ m/foo(.*?)bar/g) {
        print "dollarunderscore= $_ \tdollarone= $1\n";
    }

    You could also write this as

    $text =~ m/foo(.*?)bar/g;
    print "$_ \t$1\n";
    $text =~ m/foo(.*?)bar/g;
    print "$_ \t$1\n";
    $text =~ m/foo(.*?)bar/g;
    print "$_ \t$1\n";
    #[... repeated as often as necessary ...]

    Here $1 is set to the (one thing) you just matched with (.*?); $_ is not set at all.
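
The "one at a time" behaviour can be watched through pos(), which reports where the next /g match on a variable will start. A rough sketch, with a made-up sample string and a @log array that exists only for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "fooAbar fooBbar";               # made-up sample data

my @log;
# Scalar context: each evaluation finds just the next match and
# resumes where the previous one stopped.
while ($text =~ m/foo(.*?)bar/g) {
    push @log, join('@', $1, pos($text));   # capture and current position
    print "dollarone= $1 \tpos= ", pos($text), "\n";
}

# After the failing final attempt the position is reset.
print "pos is reset\n" unless defined pos($text);
```

Here $1 is A with the position at 7, then B with the position at 15, and the loop ends on the third attempt.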

    Brigitte    'I never met a chocolate I didnt like'    Jellinek
Re: $1 doesn't reset?
by frankus (Priest) on Mar 20, 2001 at 15:41 UTC

    I think the problem is that the array you expect to get back from the regular expression is broken. I slightly altered your code.

    thus: (I think it does the same thing.)

    $_='axxbcaxbagaxbacba';
    for (m/a(.*?)b/g) {
        print "$a\n";
    }
    It doesn't, so I went right back to basics:
    $_='axxbcaxbagaxbacba';
    foreach my $a (m/a(.*?)b/g) {
        print "$a\n";
    }
    In the second example the return of each item from an array created by the regular expression is explicit and it works.

    P.S. That is a non greedy .* isn't it?

    Brother Frankus.
      Thanks for this workaround, too. But it still doesn't explain why $1 isn't filled in each iteration with the match but only in the first....

        I failed to make my point: I think it has to do with the array being forced to a scalar, i.e. look at the loop, not the regex :)

        Yoda: "work-arounds, crufts, kludges...the dark side are they. Easily they flow, quick to join you in times of trouble."
        Luke: "Is the dark side more powerful?"
        Yoda: ", quicker, easier, more seductive"
        Luke: "How am I to know the good side from bad?"
        Yoda: "Once you start along the dark path, forever will it cloud your change requests"

        Brother Frankus.

        Edit 2001-03-20 by tye (changed <pre> to <code>)

        I'll take a crack at this one, since it gave me the same kind of problems when I started using regexps as loop controls. Using brother frankus' example :
        $_='axxbcaxbagaxbacba';
        foreach my $a (m/a(.*?)b/g) {
            print "$a\n";
        }
        OK, so, from perlman :

        The foreach modifier is an iterator: For each value in EXPR, it aliases $_ to the value and executes the statement
        So, right away, we see that foreach doesn't care about, or understand $1 and company at all. It only knows about its control variable ($a, in this case) and its list : (m/a(.*?)b/g).
        So what's the for statement's list? It's generated from m/a(.*?)b/g. And, as perlop states,

        (m//) ... in a list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3...)
        So, with this regexp, you get an anonymous list of 4 elements. It's like writing
        foreach my $a ($1, $2, $3, $4) {
        By the time the code in the loop is running, the regexp has already run and returned a list of values that foreach will process.
        I hope that makes things a little clearer. This gave me trouble for a while, and hopefully this explanation will minimize the trouble it gives you.
        update see below... sometimes convenient variables aren't good for practical use.
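
One caveat worth checking by hand: with /g and a single pair of parentheses, the four elements are the captures of four successive matches, not $1 through $4 of one match. A small sketch using the same string (the @list variable is only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

$_ = 'axxbcaxbagaxbacba';

# List context with /g: one element per successful match of the
# single capture group.
my @list = m/a(.*?)b/g;
print scalar(@list), " elements: @list\n";   # 4 elements: xx x gax c

# Afterwards $1 holds the capture of the last successful match, and
# $2 is undefined because the pattern has only one capture group.
print "\$1 is '$1', \$2 is ", (defined $2 ? "'$2'" : "undef"), "\n";
```

So foreach simply walks over (xx, x, gax, c), and $1 stays at 'c' for the whole loop.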
