PerlMonks  

Extraneous behaviour of match variables

by explorer (Chaplain)
on Nov 03, 2006 at 02:37 UTC (#581996, perlquestion)
explorer has asked for the wisdom of the Perl Monks concerning the following question:

I have this problem: I need to capture two strings from a link, but if the first string contains 'video', I need $1 and $2 to return '' (or undef).

    foreach my $url (
        '<a href="/story/43480/">The Bottled Water Lie</a>',
        '<a href="/story/video/43480/">The Bottled Water Lie</a>',
    ) {
        print " $url\n";
        if ( $url =~ m{href="(.+)">(.+)</a>} ) {
            print "O1: $1\n";
            print "O2: $2\n";
            if ( $1 !~ /video/ ) {
                print "Y1: $1\n";
                print "Y2: $2\n";
            }
            else {
                print "N1: $1\n";
                print "N2: $2\n";
            }
            print "F1: $1\n";
            print "F2: $2\n";
        }
        print "L1: $1\n";
        print "L2: $2\n";
    }
Output:
    <a href="/story/43480/">The Bottled Water Lie</a>
    O1: /story/43480/
    O2: The Bottled Water Lie
    Y1: /story/43480/
    Y2: The Bottled Water Lie
    F1: /story/43480/
    F2: The Bottled Water Lie
    L1: /story/43480/
    L2: The Bottled Water Lie
    <a href="/story/video/43480/">The Bottled Water Lie</a>
    O1: /story/video/43480/
    O2: The Bottled Water Lie
    N1:
    N2:
    F1:
    F2:
    L1: /story/video/43480/
    L2: The Bottled Water Lie

OK, this works: for the second $url the Y lines print nothing. But inside the else branch (the N1 and N2 lines) the match variables have been reset to undef, and the F1 and F2 lines show undefined variables too.

perlre says:

The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ($+ , $& , $` , $' , and $^N ) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See "Compound Statements" in perlsyn.)

NOTE: failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.

But the second 'if' resets the match variables on a failed test.

So... I need a very deep explanation of this mystery, please.

Anyway, I reduced the problem to

    $url =~ m{href="(.+)">(.+)</a>} and $1 !~ /video/;
    print "$1 $2 \n";

but I don't understand how this works either :-(

Re: Extraneous behaviour of match variables
by ikegami (Pope) on Nov 03, 2006 at 03:49 UTC

    But the second 'if' resets the match variables on a failed test.

    Are you talking about the blank output for "N"? "N" is reached on a successful match. It prints blank because the successful match against /video/ reset the capture variables, and /video/ has no capture groups, so $1 and $2 are left undefined.
    if ( $1 !~ /video/ )
    simply means
    if ( !( $1 =~ /video/ ) )
    The negation occurs *after* the match fails or succeeds.
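
A minimal sketch of that point, assuming nothing beyond core Perl (the helper name capture_after is made up for illustration): a successful match against a pattern with no capture groups leaves $1 undefined, while a failed match leaves the earlier capture alone. The negation only changes the truth value the caller sees.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the truth of the negated match and
# whatever $1 holds afterwards.
sub capture_after {
    my ($text) = @_;
    'unchanged' =~ /(.*)/ or die;      # sets $1 to 'unchanged'
    my $negated = $text !~ /video/;    # same as !( $text =~ /video/ )
    return ( $negated ? 'true' : 'false', defined $1 ? $1 : 'undef' );
}

for my $text (qw( video book )) {
    my ( $result, $one ) = capture_after($text);
    print "$text: expression is $result, \$1 is $one\n";
}
```

For 'video' the match succeeds, so the expression is false and $1 is gone; for 'book' the match fails, so the expression is true and $1 still holds the earlier capture.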

    Update: Maybe the following will make things a little clearer:

        for (qw( video book )) {
            'unchanged' =~ /(.*)/;   # Set $1
            if ( $_ !~ /(video)/ ) {
                print "true -> $1\n";
            }
            else {
                print "false -> $1\n";
            }
        }

        false -> video
        true -> unchanged

    When the match succeeds and the expression returns false, $1 is set.
    When the match fails and the expression returns true, $1 remains unchanged.
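
The other half of the perlre quote, the dynamic scoping, appears to be why the F lines in the original output stay blank while the L lines show the captures again: the inner match lives until the end of its enclosing block, and leaving that block brings back the outer captures. A small sketch of that behaviour (scope_demo is a hypothetical name, and the bare block stands in for the inner 'if'):

```perl
use strict;
use warnings;

# Hypothetical helper: records $1 before, inside, and after a nested block.
sub scope_demo {
    my @seen;
    if ( 'outer' =~ /(\w+)/ ) {
        push @seen, "before: $1";
        {
            'inner' =~ /(\w+)/ or die;   # new captures, scoped to this block
            push @seen, "inside: $1";
        }
        push @seen, "after: $1";         # outer captures are visible again
    }
    return @seen;
}

print "$_\n" for scope_demo();
```

This is one reason copying $1 and $2 into lexical variables straight after the match tends to be safer than relying on them later.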

    Update: And finally, a solution to your problem

        foreach my $link (
            '<a href="/story/43480/">The Bottled Water Lie</a>',
            '<a href="/story/video/43480/">The Bottled Water Lie</a>',
        ) {
            my ($url, $title) = $link =~ m{href="(.+)">(.+)</a>}
                or next;
            $url =~ /video/
                and next;
            print("$url: $title\n");
        }

    or

        foreach my $link (
            '<a href="/story/43480/">The Bottled Water Lie</a>',
            '<a href="/story/video/43480/">The Bottled Water Lie</a>',
        ) {
            my ($url, $title) = $link =~ m{href="((?:(?!video).)+)">(.+)</a>}
                or next;
            print("$url: $title\n");
        }

    Replace the print with whatever you want.

      Thanks, ikegami, for the illumination.

      One more question: is ((?:(?!video).)+) better than ((?:(?<!video).)+)?

        Better might be to use an HTML parser (or something like HTML::LinkExtor) and simplify what you have to look at.

        (?:(?<!video).)+ is wrong.

            for my $re (
                qr/"(?:(?!video).)+"/,
                qr/"(?:(?<!video).)+"/,
                qr/"(?:.(?<!video))+"/,
            ) {
                print("$re\n");
                for (
                    '"...video..."',
                    '"...video"',
                    '"video..."',
                    '"video"',
                ) {
                    print("$_: ", /$re/?1:0, "\n");
                }
            }

            (?-xism:"(?:(?!video).)+")
            "...video...": 0
            "...video": 0
            "video...": 0
            "video": 0
            (?-xism:"(?:(?<!video).)+")
            "...video...": 0
            "...video": 1 <----- XXX
            "video...": 0
            "video": 1 <----- XXX
            (?-xism:"(?:.(?<!video))+")
            "...video...": 0
            "...video": 0
            "video...": 0
            "video": 0

        (?:(?!video).)+ and (?:.(?<!video))+ should be equivalent. You can do benchmarks to be sure.
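
A rough sketch of such a benchmark, using the core Benchmark module. The sample strings, the extra non-video URL, and the helper name count_matches are assumptions made for illustration; inputs this short say little about real pages, so treat the timings as a starting point only.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Sample inputs: the four test strings from above plus one ordinary URL.
my @samples = (
    '"...video..."', '"...video"', '"video..."', '"video"',
    '"/story/43480/"',
);

my %re = (
    lookahead  => qr/"(?:(?!video).)+"/,
    lookbehind => qr/"(?:.(?<!video))+"/,
);

# Hypothetical helper: number of samples the pattern accepts.
sub count_matches {
    my ($re) = @_;
    return scalar grep { /$re/ } @samples;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    lookahead  => sub { count_matches( $re{lookahead} ) },
    lookbehind => sub { count_matches( $re{lookbehind} ) },
} );
```

Both patterns should accept only the non-video URL here, so any difference in the table is about speed, not results.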

Node Type: perlquestion [id://581996]
Approved by GrandFather