PerlMonks  

Extraneous behaviour of match variables

by explorer (Chaplain)
on Nov 03, 2006 at 02:37 UTC (#581996=perlquestion)
explorer has asked for the wisdom of the Perl Monks concerning the following question:

I have this problem: I need to capture two strings from a link, but if the first string (the URL) contains 'video', I need $1 and $2 to return '' (or undef).

foreach my $url (
    '<a href="/story/43480/">The Bottled Water Lie</a>',
    '<a href="/story/video/43480/">The Bottled Water Lie</a>',
) {
    print " $url\n";
    if ( $url =~ m{href="(.+)">(.+)</a>} ) {
        print "O1: $1\n";
        print "O2: $2\n";
        if ( $1 !~ /video/ ) {
            print "Y1: $1\n";
            print "Y2: $2\n";
        }
        else {
            print "N1: $1\n";
            print "N2: $2\n";
        }
        print "F1: $1\n";
        print "F2: $2\n";
    }
    print "L1: $1\n";
    print "L2: $2\n";
}
Output:
 <a href="/story/43480/">The Bottled Water Lie</a>
O1: /story/43480/
O2: The Bottled Water Lie
Y1: /story/43480/
Y2: The Bottled Water Lie
F1: /story/43480/
F2: The Bottled Water Lie
L1: /story/43480/
L2: The Bottled Water Lie
 <a href="/story/video/43480/">The Bottled Water Lie</a>
O1: /story/video/43480/
O2: The Bottled Water Lie
N1:
N2:
F1:
F2:
L1: /story/video/43480/
L2: The Bottled Water Lie
OK, this works: the second $url doesn't print anything. But the else branch (the N1 & N2 lines) shows that the match variables were reset to undef, and the F1 & F2 lines show undefined variables as well.

perlre says:

The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ($+ , $& , $` , $' , and $^N ) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See "Compound Statements" in perlsyn.)

NOTE: failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
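A minimal sketch of the two rules quoted above, assuming only core Perl (the helper `show` is a hypothetical name, used just to print undef visibly): a failed match leaves $1 alone, and a successful match inside an inner block changes $1 only until that block ends. The block rule is consistent with the L1 and L2 lines of the output above, which show the original captures again once the if block has been left.

```perl
use strict;
use warnings;

# Hypothetical helper: print undef explicitly instead of triggering a warning.
sub show { defined $_[0] ? $_[0] : 'undef' }

'outer' =~ /(outer)/;              # successful match: $1 is "outer"
print "start:        ", show($1), "\n";

'xyz' =~ /(abc)/;                  # failed match: $1 is NOT reset
print "after fail:   ", show($1), "\n";

{
    'inner' =~ /(inner)/;          # successful match inside a bare block
    print "inside block: ", show($1), "\n";
}
# Leaving the block restores the value $1 had before it (dynamic scope).
print "after block:  ", show($1), "\n";
```

Run as-is, it prints outer, outer, inner, outer on its four lines.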

But the second 'if' reset the match variables in a failed test.

So... I need a very deep explanation of this mystery, please.

Anyway, I reduced the problem to

$url =~ m{href="(.+)">(.+)</a>} and $1 !~ /video/;
print "$1 $2 \n";
but I don't understand how this works either :-(
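To see what the reduced version actually leaves behind, here is a sketch that runs it against both links. The result is stored in `$keep` (a hypothetical name) so the expression isn't evaluated in void context. When $1 contains 'video', the second match succeeds, and because /video/ has no capture groups, $1 and $2 become undef; otherwise the second match fails and the captures from the first match survive.

```perl
use strict;
use warnings;

for my $url (
    '<a href="/story/43480/">The Bottled Water Lie</a>',
    '<a href="/story/video/43480/">The Bottled Water Lie</a>',
) {
    # Same logic as the one-liner; $keep is a hypothetical name for its result.
    my $keep = ( $url =~ m{href="(.+)">(.+)</a>} and $1 !~ /video/ ) ? 1 : 0;

    # Video link: /video/ SUCCEEDED with no captures, so $1/$2 are undef.
    # Other links: /video/ FAILED, so the first match's captures remain.
    my @shown = map { defined $_ ? $_ : 'undef' } $1, $2;
    print "keep=$keep: @shown\n";
}
```

This should print `keep=1: /story/43480/ The Bottled Water Lie` followed by `keep=0: undef undef`.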

Re: Extraneous behaviour of match variables
by ikegami (Pope) on Nov 03, 2006 at 03:49 UTC

    But the second 'if' reset the match variables in a failed test.

    Are you talking about the blank output for "N"? "N" is reached when the /video/ match succeeds. It prints blanks because that successful match reset the match variables, and since /video/ has no captures, $1 and $2 are now undefined.
    if ( $1 !~ /video/ )
    simply means
    if ( !( $1 =~ /video/ ) )
    The negation occurs *after* the match fails or succeeds.
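A minimal sketch of that point, assuming nothing beyond core Perl: the match inside !~ runs first and updates the match variables, and only its boolean result is negated. Here /video/ succeeds, so $1 is cleared even though the !~ expression is false, and the long-hand !( =~ ) form behaves the same way.

```perl
use strict;
use warnings;

'/story/video/43480/' =~ m{(.+)};          # $1 is now '/story/video/43480/'

my $negated = ( $1 !~ /video/ ) ? 'true' : 'false';
# /video/ matched (successfully, with no capture groups), so $1 is now undef.
print "!~ result: $negated\n";
print "\$1 is ", ( defined $1 ? "'$1'" : 'undef' ), "\n";

# The long-hand form behaves identically.
'/story/video/43480/' =~ m{(.+)};
my $longhand = !( $1 =~ /video/ ) ? 'true' : 'false';
print "!( =~ ) result: $longhand, \$1 is ", ( defined $1 ? "'$1'" : 'undef' ), "\n";
```

Both forms report false, and $1 is undef afterwards in both cases.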

    Update: Maybe the following will make things a little clearer:

    for (qw( video book )) {
        'unchanged' =~ /(.*)/;  # Set $1
        if ( $_ !~ /(video)/ ) {
            print "true -> $1\n";
        } else {
            print "false -> $1\n";
        }
    }

    false -> video
    true -> unchanged

    When the match succeeds and the expression returns false, $1 is set.
    When the match fails and the expression returns true, $1 remains unchanged.

    Update: And finally, a solution to your problem:

    foreach my $link (
        '<a href="/story/43480/">The Bottled Water Lie</a>',
        '<a href="/story/video/43480/">The Bottled Water Lie</a>',
    ) {
        my ($url, $title) = $link =~ m{href="(.+)">(.+)</a>}
            or next;
        $url =~ /video/ and next;
        print("$url: $title\n");
    }

    or

    foreach my $link (
        '<a href="/story/43480/">The Bottled Water Lie</a>',
        '<a href="/story/video/43480/">The Bottled Water Lie</a>',
    ) {
        my ($url, $title) = $link =~ m{href="((?:(?!video).)+)">(.+)</a>}
            or next;
        print("$url: $title\n");
    }

    Replace the print with whatever you want.
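The question originally asked for empty strings rather than skipping video links. A sketch of that variant, using a hypothetical helper `parse_link` that copies the captures into lexicals right away so that later matches cannot clobber them:

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($url, $title), or ('', '') for video links
# and for strings that don't look like a link at all.
sub parse_link {
    my ($link) = @_;
    # Copy the captures into lexicals immediately; $1/$2 are not used later.
    my ($url, $title) = $link =~ m{href="(.+)">(.+)</a>}
        or return ('', '');
    return ('', '') if $url =~ /video/;
    return ($url, $title);
}

foreach my $link (
    '<a href="/story/43480/">The Bottled Water Lie</a>',
    '<a href="/story/video/43480/">The Bottled Water Lie</a>',
) {
    my ($url, $title) = parse_link($link);
    print "url='$url' title='$title'\n";
}
```

This should print `url='/story/43480/' title='The Bottled Water Lie'` and then `url='' title=''`. Returning undef instead of '' would only require changing the two `return ('', '')` lines.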

      Thanks, ikegami, for the illumination.

      One more question: is ((?:(?!video).)+) better than ((?:(?<!video).)+)?

        It might be better to use an HTML parser (or something like HTML::LinkExtor) to simplify what you have to look at.

        (?:(?<!video).)+ is wrong.

        for my $re (
            qr/"(?:(?!video).)+"/,
            qr/"(?:(?<!video).)+"/,
            qr/"(?:.(?<!video))+"/,
        ) {
            print("$re\n");
            for (
                '"...video..."',
                '"...video"',
                '"video..."',
                '"video"',
            ) {
                print("$_: ", /$re/ ? 1 : 0, "\n");
            }
        }

        (?-xism:"(?:(?!video).)+")
        "...video...": 0
        "...video": 0
        "video...": 0
        "video": 0
        (?-xism:"(?:(?<!video).)+")
        "...video...": 0
        "...video": 1   <----- XXX
        "video...": 0
        "video": 1   <----- XXX
        (?-xism:"(?:.(?<!video))+")
        "...video...": 0
        "...video": 0
        "video...": 0
        "video": 0

        (?:(?!video).)+ and (?:.(?<!video))+ should match the same strings. If you want to know which one is faster, benchmark them.
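A sketch of such a benchmark using the core Benchmark module's `cmpthese`. It first checks that the two patterns agree on the sample strings from the test above (plus one video-free string), then times them. The iteration count of 100_000 and the hash keys are arbitrary assumptions, and the timings will vary by machine and Perl version, so no particular winner is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my %re = (
    lookahead  => qr/"(?:(?!video).)+"/,
    lookbehind => qr/"(?:.(?<!video))+"/,
);
my @samples = ( '"...video..."', '"...video"', '"video..."', '"video"', '"...audio..."' );

# Sanity check: both patterns must accept/reject exactly the same samples.
for my $s (@samples) {
    my $ahead  = $s =~ $re{lookahead}  ? 1 : 0;
    my $behind = $s =~ $re{lookbehind} ? 1 : 0;
    die "patterns disagree on $s\n" if $ahead != $behind;
}

# Time each pattern over all samples; the iteration count is arbitrary.
cmpthese( 100_000, {
    lookahead  => sub { my $n = grep { $_ =~ $re{lookahead}  } @samples },
    lookbehind => sub { my $n = grep { $_ =~ $re{lookbehind} } @samples },
} );
```

`cmpthese` prints a rate table comparing the two entries; treat a small difference as noise unless it holds up over repeated runs.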

Node Type: perlquestion [id://581996]
Approved by GrandFather