Re: Extraneous behaviour of match variables

by ikegami (Pope)
on Nov 03, 2006 at 03:49 UTC


in reply to Extraneous behaviour of match variables

But the second 'if' reset the match variables in a failed test.

Are you talking about the blank output for "N"? "N" is reached on a successful match. It prints blank because the successful match against /video/, which has no captures, cleared $1.
if ( $1 !~ /video/ )
simply means
if ( !( $1 =~ /video/ ) )
The negation occurs *after* the match fails or succeeds.
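
A small sketch of the point above. The helper name capture_after_test and the sample strings are made up for illustration; they are not from the original question. It sets $1, tests it with a capture-less pattern via !~, and then reports what $1 holds afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: set $1, test it with a capture-less pattern,
# then report the result of the test and what $1 holds afterwards.
sub capture_after_test {
    my ($text) = @_;
    $text =~ /(.*)/ or return;          # sets $1 to the whole string
    my $negated = ( $1 !~ /video/ );    # same as !( $1 =~ /video/ )
    my $after   = $1;                   # copy $1 before leaving the sub
    return ( $negated ? 1 : 0, $after );
}

for my $text ( '/story/video/43480/', '/story/43480/' ) {
    my ( $negated, $after ) = capture_after_test($text);
    printf "%s: !~ returned %d, \$1 is now %s\n",
        $text, $negated, defined $after ? "'$after'" : 'undef';
}
```

For the string containing "video", the inner match succeeds, so !~ returns false and $1 ends up undefined. For the other string, the inner match fails and $1 keeps its earlier value.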

Update: Maybe the following will make things a little clearer:

for (qw( video book )) {
    'unchanged' =~ /(.*)/;  # Set $1
    if ( $_ !~ /(video)/ ) {
        print "true -> $1\n";
    } else {
        print "false -> $1\n";
    }
}

false -> video
true -> unchanged

When the match succeeds and the expression returns false, $1 is set.
When the match fails and the expression returns true, $1 remains unchanged.
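
One way to sidestep the issue, sketched here under my own assumptions (the helper name classify and the href pattern are illustrative only), is to copy the capture into a lexical variable right away, so later matches cannot disturb it.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a link without relying on $1
# surviving a second match.
sub classify {
    my ($text) = @_;

    # A match in list context returns its captures, so $url is a copy.
    my ($url) = $text =~ /href="([^"]+)"/
        or return;

    # This match may reset $1, but $url is unaffected.
    my $kind = $url =~ /video/ ? 'video' : 'other';
    return "$kind: $url";
}

print classify('<a href="/story/video/43480/">x</a>'), "\n";
print classify('<a href="/story/43480/">x</a>'),       "\n";
```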

Update: And finally, a solution to your problem

foreach my $link (
    '<a href="/story/43480/">The Bottled Water Lie</a>',
    '<a href="/story/video/43480/">The Bottled Water Lie</a>',
) {
    my ($url, $title) = $link =~ m{href="(.+)">(.+)</a>}
        or next;
    $url =~ /video/
        and next;
    print("$url: $title\n");
}

or

foreach my $link (
    '<a href="/story/43480/">The Bottled Water Lie</a>',
    '<a href="/story/video/43480/">The Bottled Water Lie</a>',
) {
    my ($url, $title) = $link =~ m{href="((?:(?!video).)+)">(.+)</a>}
        or next;
    print("$url: $title\n");
}

Replace the print with whatever you want.
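
For instance, the print could be swapped for collecting the results. This is only a sketch: the sub name non_video_links and the [url, title] pair layout are my own choices, not something the original question asked for.

```perl
use strict;
use warnings;

# Hypothetical helper: return [url, title] pairs for every link
# whose URL does not contain "video".
sub non_video_links {
    my @kept;
    foreach my $link (@_) {
        my ( $url, $title ) = $link =~ m{href="(.+)">(.+)</a>}
            or next;
        $url =~ /video/
            and next;
        push @kept, [ $url, $title ];
    }
    return @kept;
}

my @links = non_video_links(
    '<a href="/story/43480/">The Bottled Water Lie</a>',
    '<a href="/story/video/43480/">The Bottled Water Lie</a>',
);
print "$_->[0]: $_->[1]\n" for @links;
```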


Re^2: Extraneous behaviour of match variables
by explorer (Chaplain) on Nov 03, 2006 at 13:34 UTC

    Thanks, ikegami, for illumination.

    One more question: is ((?:(?!video).)+) better than ((?:(?<!video).)+)?

      Better might be to use an HTML parser (or something like HTML::LinkExtor) and simplify what you have to look at.

      (?:(?<!video).)+ is wrong.

      for my $re (
          qr/"(?:(?!video).)+"/,
          qr/"(?:(?<!video).)+"/,
          qr/"(?:.(?<!video))+"/,
      ) {
          print("$re\n");
          for (
              '"...video..."',
              '"...video"',
              '"video..."',
              '"video"',
          ) {
              print("$_: ", /$re/?1:0, "\n");
          }
      }

      (?-xism:"(?:(?!video).)+")
      "...video...": 0
      "...video": 0
      "video...": 0
      "video": 0
      (?-xism:"(?:(?<!video).)+")
      "...video...": 0
      "...video": 1 <----- XXX
      "video...": 0
      "video": 1 <----- XXX
      (?-xism:"(?:.(?<!video))+")
      "...video...": 0
      "...video": 0
      "video...": 0
      "video": 0

      (?:(?!video).)+ and (?:.(?<!video))+ should be equivalent here. Benchmark them if you want to be sure which one to prefer.
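
      A rough way to check both claims, using the core Benchmark module. The sample strings, the helper patterns_agree, and the iteration count are assumptions of this sketch; timings will vary with the Perl version and the input, so no figures are given.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $lookahead  = qr/"(?:(?!video).)+"/;
my $lookbehind = qr/"(?:.(?<!video))+"/;

# The four strings tested above, plus one without "video".
my @samples = ( '"...video..."', '"...video"', '"video..."', '"video"', '"story"' );

# Hypothetical helper: true when both patterns give the same
# yes/no answer for every string passed in.
sub patterns_agree {
    for my $s (@_) {
        my $ahead  = $s =~ $lookahead  ? 1 : 0;
        my $behind = $s =~ $lookbehind ? 1 : 0;
        return 0 if $ahead != $behind;
    }
    return 1;
}

print patterns_agree(@samples) ? "patterns agree\n" : "patterns differ\n";

# Compare the speed of the two patterns over the same samples.
cmpthese( 100_000, {
    lookahead  => sub { my $n = grep { /$lookahead/ }  @samples },
    lookbehind => sub { my $n = grep { /$lookbehind/ } @samples },
} );
```

      Agreement on a handful of samples is evidence, not proof, so add strings that resemble your real data before relying on either pattern.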
