PerlMonks  

Negative Lookahead Assertion Strangness

by sas (Initiate)
on May 15, 2006 at 16:22 UTC ( [id://549538]=perlquestion )

sas has asked for the wisdom of the Perl Monks concerning the following question:

I wanted a single regex to match lines containing "bar" if and only if they do not contain "foo", and happened upon some strange behavior:
    my $str=" some stuff then foo then bar then more stuff";
    print "string=\"$str\"\n";
    if ($str =~ /(?!.*foo)(^.*bar.*)/)  {print "1 matched \"$1\"\n";}
    if ($str =~ /(?!^.*foo)(^.*bar.*)/) {print "2 matched \"$1\"\n";}
    if ($str =~ /(?!.*foo)(.*bar.*)/)   {print "3 matched \"$1\"\n";}
    if ($str =~ /(?!^.*foo)(.*bar.*)/)  {print "4 matched \"$1\"\n";}

    $str=" some stuff then bar then more stuff";
    print "string=\"$str\"\n";
    if ($str =~ /(?!.*foo)(^.*bar.*)/)  {print "5 matched \"$1\"\n";}
    if ($str =~ /(?!^.*foo)(^.*bar.*)/) {print "6 matched \"$1\"\n";}
    if ($str =~ /(?!.*foo)(.*bar.*)/)   {print "7 matched \"$1\"\n";}
    if ($str =~ /(?!^.*foo)(.*bar.*)/)  {print "8 matched \"$1\"\n";}
For me (Perl v5.8.6, Linux) it produces the following output:
    string=" some stuff then foo then bar then more stuff"
    3 matched "oo then bar then more stuff"
    4 matched "some stuff then foo then bar then more stuff"
    string=" some stuff then bar then more stuff"
    5 matched " some stuff then bar then more stuff"
    6 matched " some stuff then bar then more stuff"
    7 matched " some stuff then bar then more stuff"
    8 matched " some stuff then bar then more stuff"
I don't know a lot about Perl regexes. Can someone explain why 3 and 4 above matched, and why they matched what they did?

Replies are listed 'Best First'.
Re: Negative Lookahead Assertion Strangness
by merlyn (Sage) on May 15, 2006 at 16:34 UTC
    Three says /(?!.*foo)(.*bar.*)/, which means "does there exist the leftmost place in the string where I could look ahead and not match a foo, and yet I can look ahead and match a bar (which will be captured)?" And yes, as soon as we've gone past the "f" of "foo", it qualifies.

    Four says /(?!^.*foo)(.*bar.*)/, which asks "does there exist the leftmost place in the string where we can look ahead and not see the beginning of the string (followed by other stuff), and then some "bar" later (which is captured)?" And yes, as soon as we leave the beginning of the string, we can no longer "look forward to see the beginning of the string". So it matches right after the first char.

    If you are careful about how you read the regex, it'll become clear what it's trying to match and capture. You just have to be careful. :)
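
To see where each of these matches actually starts, one option is to read `$-[0]` (the start offset of the last successful match) right after matching. This is only a minimal sketch using sas's first test string; the helper name `match_start` is made up for illustration.

```perl
use strict;
use warnings;

my $str = " some stuff then foo then bar then more stuff";

# Hypothetical helper: offset at which $re first matches in $s, or undef.
sub match_start {
    my ($re, $s) = @_;
    return $s =~ $re ? $-[0] : undef;
}

# Pattern 3: the first offset with no "foo" ahead is just past the "f".
printf "3 starts at offset %d\n", match_start(qr/(?!.*foo)(.*bar.*)/, $str);

# Pattern 4: the first offset where ^ can no longer match is offset 1.
printf "4 starts at offset %d\n", match_start(qr/(?!^.*foo)(.*bar.*)/, $str);
```

The "f" of "foo" sits at offset 17 in this string, so pattern 3 starts at 18, which is why the capture begins with "oo then".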

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      YAPE::Regex::Explain verbosely telling you what a regex is saying may help if you still can't see why something's matching (or not).

Re: Negative Lookahead Assertion Strangness
by kvale (Monsignor) on May 15, 2006 at 16:37 UTC
    In example 3, the lookahead assertion fails at the beginning of the string (there is a foo ahead), so the regex tries again at the next position in the string, where it fails too, and so on. This continues until the regex reaches position "oo then...", where no foo is found ahead and the match succeeds.

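    In example 4, the ^ inside the lookahead can only match at the very start of the string. At that position .* backtracks until foo matches, so the negative assertion fails there. One character later the ^ can no longer match, which satisfies the negative assertion :). So everything after the first character is printed.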

    Both of these examples show the danger of creating nonlocal assertions. It is easy to do something unexpected when the string tested is some distance from the assertion.
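
The position-by-position retrying can be made visible by testing the bare lookahead at every offset. The sketch below is one way to do that, assuming `pos` plus `\G` to pin each attempt to a given offset; `offsets_where` is a hypothetical helper, not something from the thread.

```perl
use strict;
use warnings;

my $str = " some stuff then foo then bar then more stuff";

# Hypothetical helper: every offset in $s at which the zero-width
# pattern $re succeeds when tried exactly there (\G pins the attempt
# to the offset stored in pos).
sub offsets_where {
    my ($re, $s) = @_;
    my @ok;
    for my $offset (0 .. length($s)) {
        pos($s) = $offset;
        push @ok, $offset if $s =~ /\G$re/g;
    }
    return @ok;
}

my @no_foo_ahead   = offsets_where(qr/(?!.*foo)/,  $str);
my @no_start_ahead = offsets_where(qr/(?!^.*foo)/, $str);

print "(?!.*foo)  first holds at offset $no_foo_ahead[0]\n";
print "(?!^.*foo) first holds at offset $no_start_ahead[0]\n";
```

The second assertion holds everywhere except offset 0, because ^ cannot match anywhere else in a string without the /m flag.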

    -Mark

Re: Negative Lookahead Assertion Strangness
by ikegami (Patriarch) on May 15, 2006 at 17:04 UTC

    Update: I used a form which was needlessly complicated in this particular instance. I struck the related content, even though it's not technically wrong. In exchange, I added the new form. The new content is enclosed by bold parens.

    Things to know:

    • /$re1/ && /$re2/
      can be written as
      /^(?=.*$re1)(?=.*$re2)/s
      or as
      /^(?=.*$re1).*$re2/s

    • (?:(?!$re).)
      is to regexps as
      [^$chars]
      is to characters.

      Keep in mind
      'abc' =~ /[^a]/s matches at pos 1
      'abc' =~ /^[^a]*$/s does not match
      'abc' =~ /(?:(?!ab).)/s matches at pos 1
      'abc' =~ /^(?:(?!ab).)*$/s does not match

    • (
      Similarly
      !/$re1/s && /$re2/s
      can be written as
      /^(?!.*$re1)(?=.*$re2)/s
      or as
      /^(?!.*$re1).*$re2/s
      )
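
The "keep in mind" cases and the first equivalence can be checked with a short script. This is a sketch only: `first_match_offset`, `both_separately` and `both_combined` are hypothetical helper names, and `bar` / `stuff` are sample sub-patterns picked for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: offset of the first match of $re in $s, or -1.
sub first_match_offset {
    my ($s, $re) = @_;
    return $s =~ $re ? $-[0] : -1;
}

# The four "keep in mind" cases; -1 means "does not match".
print first_match_offset('abc', qr/[^a]/s), "\n";
print first_match_offset('abc', qr/^[^a]*$/s), "\n";
print first_match_offset('abc', qr/(?:(?!ab).)/s), "\n";
print first_match_offset('abc', qr/^(?:(?!ab).)*$/s), "\n";

# /$re1/ && /$re2/ versus the single-regex form.
my ($re1, $re2) = (qr/bar/, qr/stuff/);

sub both_separately {
    my ($s) = @_;
    return ($s =~ /$re1/ && $s =~ /$re2/) ? 1 : 0;
}

sub both_combined {
    my ($s) = @_;
    return $s =~ /^(?=.*$re1)(?=.*$re2)/s ? 1 : 0;
}

for my $s ('stuff then bar', 'bar then stuff', 'only bar', 'neither') {
    printf "%-16s separate=%d combined=%d\n",
        $s, both_separately($s), both_combined($s);
}
```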

    Any of the following do what you want

    • /bar/ && !/foo/
    • /bar/ && /^(?:(?!foo).)*$/s
    • /^(?=.*bar)(?=(?:(?!foo).)*$)/s
    • /^(?=.*bar)(?:(?!foo).)*$/s
    • /^(?=(?:(?!foo).)*$).*bar/s
    • ( /^(?=.*bar)(?!.*foo)/ )
    • ( /^(?!.*foo)(?=.*bar)/ )
    • ( /^(?!.*foo).*bar/ )
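
A quick way to convince yourself that these forms agree is to run a few of them over the same inputs. The sketch below does that for four of the forms; the labels, the sample lines and the `all_agree` helper are assumptions added for illustration, and the inputs are single lines without newlines.

```perl
use strict;
use warnings;

# A few of the forms listed above, wrapped as predicates.
my %candidate = (
    'two matches'       => sub { $_[0] =~ /bar/ && $_[0] !~ /foo/ },
    'tempered dot'      => sub { $_[0] =~ /^(?=.*bar)(?:(?!foo).)*$/s },
    'two lookaheads'    => sub { $_[0] =~ /^(?=.*bar)(?!.*foo)/ },
    'anchored negative' => sub { $_[0] =~ /^(?!.*foo).*bar/ },
);

# Sample inputs paired with the expected verdict (1 = should match).
my @cases = (
    [ ' some stuff then foo then bar then more stuff', 0 ],
    [ ' some stuff then bar then more stuff',          1 ],
    [ 'bar then foo',                                  0 ],
    [ 'no match here',                                 0 ],
);

# Hypothetical helper: true when every candidate gives the expected verdict.
sub all_agree {
    for my $case (@cases) {
        my ($line, $want) = @$case;
        for my $name (sort keys %candidate) {
            my $got = $candidate{$name}->($line) ? 1 : 0;
            return 0 if $got != $want;
        }
    }
    return 1;
}

print all_agree() ? "all candidates agree\n" : "mismatch\n";
```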
Re: Negative Lookahead Assertion Strangness
by TedPride (Priest) on May 15, 2006 at 19:12 UTC
    For the sake of simplicity, why not do the following?
    while (<DATA>) {
        print if m/bar/ && !m/foo/;
    }
    __DATA__
    some stuff then foo then bar then more stuff
    oo then bar then more stuff
    some stuff then foo then bar then more stuff
    some stuff then bar then more stuff
Re: Negative Lookahead Assertion Strangness
by MonkE (Hermit) on May 15, 2006 at 17:17 UTC
    This regular expression should do what sas asked: /^(?!.*foo).*bar/
    Update: I see that ikegami has a much more thorough treatment of the issue (above).
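
Since sas's original patterns captured the whole line, here is a sketch of the same regex with a capture group added. The sub name `bar_without_foo` is hypothetical. Note that without the /s flag the dot stops at a newline, so this form is a per-line test.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured text when the line contains
# "bar" and no "foo", otherwise undef.
sub bar_without_foo {
    my ($line) = @_;
    return $line =~ /^(?!.*foo)(.*bar.*)/ ? $1 : undef;
}

for my $line (" some stuff then foo then bar then more stuff",
              " some stuff then bar then more stuff") {
    my $hit = bar_without_foo($line);
    print defined $hit ? "matched \"$hit\"\n" : "no match\n";
}
```

For a multi-line string such as "bar\nfoo" this version still matches the first line, because `.*` never looks past the newline; adding /s, as in ikegami's forms, makes the lookahead cover the whole string.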
