PerlMonks

Extraneous behaviour of match variables

by explorer (Chaplain)
on Nov 03, 2006 at 02:37 UTC (node #581996)
explorer has asked for the wisdom of the Perl Monks concerning the following question:

I have this problem: I need to capture two strings from a link, but if the first string contains 'video', I need $1 and $2 to return '' (or undef).

foreach my $url (
    '<a href="/story/43480/">The Bottled Water Lie</a>',
    '<a href="/story/video/43480/">The Bottled Water Lie</a>',
) {
    print " $url\n";
    if ( $url =~ m{href="(.+)">(.+)</a>} ) {
        print "O1: $1\n";
        print "O2: $2\n";
        if ( $1 !~ /video/ ) {
            print "Y1: $1\n";
            print "Y2: $2\n";
        }
        else {
            print "N1: $1\n";
            print "N2: $2\n";
        }
        print "F1: $1\n";
        print "F2: $2\n";
    }
    print "L1: $1\n";
    print "L2: $2\n";
}
Output:
 <a href="/story/43480/">The Bottled Water Lie</a>
O1: /story/43480/
O2: The Bottled Water Lie
Y1: /story/43480/
Y2: The Bottled Water Lie
F1: /story/43480/
F2: The Bottled Water Lie
L1: /story/43480/
L2: The Bottled Water Lie
 <a href="/story/video/43480/">The Bottled Water Lie</a>
O1: /story/video/43480/
O2: The Bottled Water Lie
N1:
N2:
F1:
F2:
L1: /story/video/43480/
L2: The Bottled Water Lie
OK, this works: for the second $url nothing is shown... but the else branch (the N1 and N2 lines) shows that the match variables were reset to undef, and the F1 and F2 lines show undefined variables as well.

perlre say:

The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ($+ , $& , $` , $' , and $^N ) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See "Compound Statements" in perlsyn.)

NOTE: failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
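The two rules quoted above can be checked in isolation. A minimal sketch (the sub name scope_demo and the sample strings are made up for illustration) records $1 after a successful match, after a failed match, inside a nested block after a successful match that has no capture groups, and again after that block ends:

```perl
use strict;
use warnings;

# Records $1 at four points to illustrate the two perlre rules quoted above.
sub scope_demo {
    my @seen;

    'outer-value' =~ /(outer-\w+)/;    # successful match: $1 is 'outer-value'
    push @seen, $1;

    'no digits' =~ /(\d+)/;            # failed match: $1 is left alone
    push @seen, $1;

    {
        'inner' =~ /inner/;            # successful match with no groups
        push @seen, defined $1 ? $1 : 'undef';
    }                                  # leaving the block restores the outer match

    push @seen, $1;
    return @seen;
}

print join( ', ', scope_demo() ), "\n";
```

It should print "outer-value, outer-value, undef, outer-value", the same pattern as the N, F and L lines in the output above: a successful but capture-less match clears $1 until the end of its enclosing block.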

But the second 'if' resets the match variables on a failed test.

So... I'd like a detailed explanation of this mystery, please.

Anyway, I reduced the problem to:

$url =~ m{href="(.+)">(.+)</a>} and $1 !~ /video/;
print "$1 $2 \n";

but I don't understand how that works either :-(
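The reduced one-liner only works through a side effect of the second match. A sketch that wraps it in a sub (the name reduced is hypothetical) so both links can be checked, with the two cases spelled out in comments:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the question's one-liner.
sub reduced {
    my ($link) = @_;

    # Same expression as the one-liner. Its value is stored only so the
    # expression is not evaluated in void context; what matters is its side
    # effect on $1 and $2:
    #   - no 'video' in $1: /video/ fails, so $1 and $2 keep the link captures;
    #   - 'video' in $1:    /video/ succeeds, and since it has no groups,
    #                       $1 and $2 become undef.
    my $kept = ( $link =~ m{href="(.+)">(.+)</a>} and $1 !~ /video/ );

    # Copy the captures before anything else can run a match.
    my ( $url, $title ) = ( $1, $2 );
    return ( $url, $title );
}

for my $link (
    '<a href="/story/43480/">The Bottled Water Lie</a>',
    '<a href="/story/video/43480/">The Bottled Water Lie</a>',
) {
    my ( $url, $title ) = reduced($link);
    printf "%s | %s\n", $url // 'undef', $title // 'undef';
}
```

For the first link this prints the URL and title; for the video link, undef for both. If the first match fails, $1 and $2 hold whatever an earlier successful match in scope left behind, which is why relying on this behaviour is fragile.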

Re: Extraneous behaviour of match variables
by ikegami (Pope) on Nov 03, 2006 at 03:49 UTC

    But the second 'if' reset the match variables in a failed test.

    Are you talking about the blank output for "N"? The "N" branch is reached when /video/ matches successfully. It prints blanks because that successful match has no capture groups, so $1 and $2 are cleared (undef).
    if ( $1 !~ /video/ )
    simply means
    if ( !( $1 =~ /video/ ) )
    The negation occurs *after* the match fails or succeeds.

    Update: Maybe the following will make things a little clearer:

    for (qw( video book )) {
        'unchanged' =~ /(.*)/;    # Set $1
        if ( $_ !~ /(video)/ ) {
            print "true -> $1\n";
        }
        else {
            print "false -> $1\n";
        }
    }

    false -> video
    true -> unchanged

    When the match succeeds and the expression returns false, $1 is set.
    When the match fails and the expression returns true, $1 remains unchanged.

    Update: And finally, a solution to your problem:

    foreach my $link (
        '<a href="/story/43480/">The Bottled Water Lie</a>',
        '<a href="/story/video/43480/">The Bottled Water Lie</a>',
    ) {
        my ($url, $title) = $link =~ m{href="(.+)">(.+)</a>}
            or next;
        $url =~ /video/ and next;
        print("$url: $title\n");
    }

    or

    foreach my $link (
        '<a href="/story/43480/">The Bottled Water Lie</a>',
        '<a href="/story/video/43480/">The Bottled Water Lie</a>',
    ) {
        my ($url, $title) = $link =~ m{href="((?:(?!video).)+)">(.+)</a>}
            or next;
        print("$url: $title\n");
    }

    Replace the print with whatever you want.
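The first solution above can also be packaged as a helper that returns the two captures, or an empty list when the URL contains 'video', which is close to what the question originally asked for. A minimal sketch; the name parse_link is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($url, $title), or an empty list when the
# link does not match or the captured URL contains 'video'.
sub parse_link {
    my ($link) = @_;
    # Copy the captures into lexicals at once; later matches cannot touch them.
    my ($url, $title) = $link =~ m{href="(.+)">(.+)</a>}
        or return;
    return if $url =~ /video/;
    return ($url, $title);
}

for my $link (
    '<a href="/story/43480/">The Bottled Water Lie</a>',
    '<a href="/story/video/43480/">The Bottled Water Lie</a>',
) {
    if ( my ($url, $title) = parse_link($link) ) {
        print "$url: $title\n";
    }
    else {
        print "skipped: $link\n";
    }
}
```

Because the captures are copied into lexicals straight away, the later /video/ test cannot disturb them, and the caller never has to look at $1 or $2.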

      Thanks, ikegami, for the illumination.

      One more question: is ((?:(?!video).)+) better than ((?:(?<!video).)+)?

        Better might be to use an HTML parser (or something like HTML::LinkExtor) and simplify what you have to look at.

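        (?:(?<!video).)+ is wrong: the lookbehind is tested before each character is consumed, so a "video" that ends right before the closing quote is never checked.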

        for my $re (
            qr/"(?:(?!video).)+"/,
            qr/"(?:(?<!video).)+"/,
            qr/"(?:.(?<!video))+"/,
        ) {
            print("$re\n");
            for (
                '"...video..."',
                '"...video"',
                '"video..."',
                '"video"',
            ) {
                print("$_: ", /$re/?1:0, "\n");
            }
        }

        (?-xism:"(?:(?!video).)+")
        "...video...": 0
        "...video": 0
        "video...": 0
        "video": 0
        (?-xism:"(?:(?<!video).)+")
        "...video...": 0
        "...video": 1   <----- XXX
        "video...": 0
        "video": 1   <----- XXX
        (?-xism:"(?:.(?<!video))+")
        "...video...": 0
        "...video": 0
        "video...": 0
        "video": 0

        (?:(?!video).)+ and (?:.(?<!video))+ should be equivalent. You can run benchmarks to be sure.
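The claimed equivalence can be spot-checked by running both patterns over the four strings from the thread plus three extra ones ('"vid"', '"...vide"', '"videx"', chosen here only for illustration). This shows agreement on these inputs, not a proof:

```perl
use strict;
use warnings;

# The two patterns said above to be equivalent.
my $ahead  = qr/"(?:(?!video).)+"/;    # test for 'video' before taking each char
my $behind = qr/"(?:.(?<!video))+"/;   # test for 'video' after taking each char

my @inputs = (
    '"...video..."', '"...video"', '"video..."', '"video"',
    '"vid"', '"...vide"', '"videx"',
);

for my $s (@inputs) {
    my $a_hit = $s =~ $ahead  ? 1 : 0;
    my $b_hit = $s =~ $behind ? 1 : 0;
    printf "%-15s ahead=%d behind=%d%s\n",
        $s, $a_hit, $b_hit, $a_hit == $b_hit ? '' : '  <-- differ';
}
```

Both patterns should report 0 for the four strings containing "video" and 1 for the other three, with no "differ" markers.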
