http://www.perlmonks.org?node_id=369307

perlgags78 has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks, I amended my code as follows
if ($line =~ /(?:(?!throw).)/ ) { debug ("Doesn't contain throw"); }
yet it still printed the statement for the line
//throw OtherException
I've put '.*' at the start and end of the regexp, but it still doesn't see the throw in the throw line. Thanks, Mark.

Re: Can't match negated words.
by Abigail-II (Bishop) on Jun 24, 2004 at 12:44 UTC
    If you just want to match a line in which a particular word doesn't appear, !~ does the trick. But if "not this word" is part of a larger regular expression, !~ will not do it. Instead, you basically have to "progress carefully", looking ahead at each step along the way. That is, match a character (any character) only after you've concluded that it doesn't start a forbidden word. If no character in the (sub)string you match starts a forbidden word, no forbidden word can be in the matched string. How do you check? Use negative lookahead:
    /^(?:(?!throw).)*$/s
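A rough sketch of the two approaches side by side, run on a few made-up Java lines. The helper names `lacks_throw_simple` and `lacks_throw_lookahead` are hypothetical, introduced only for this illustration. For a plain whole-line test the two agree; the lookahead form earns its keep when "no throw here" is only one piece of a bigger pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: whole-line test with !~.
# True when the substring "throw" appears nowhere in $line.
sub lacks_throw_simple {
    my ($line) = @_;
    return $line !~ /throw/ ? 1 : 0;
}

# Hypothetical helper: Abigail's pattern. Step through the line one
# character at a time, refusing to consume any character that starts
# "throw"; ^ and $ force the whole line to be consumed that way.
sub lacks_throw_lookahead {
    my ($line) = @_;
    return $line =~ /^(?:(?!throw).)*$/s ? 1 : 0;
}

for my $line ('catch (Exception e) {', '//throw OtherException', 'rethrowIfNeeded();') {
    printf "%-24s simple=%d lookahead=%d\n",
        $line, lacks_throw_simple($line), lacks_throw_lookahead($line);
}
```

Both columns should show 1 only for the first line. Note that `rethrowIfNeeded();` is rejected too, because both tests look for the substring "throw", not the whole word.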

    Abigail

Re: Can't match negated words.
by Fletch (Bishop) on Jun 24, 2004 at 12:26 UTC

    You need to reread perldoc perlre. You've got a negated character class, which looks for one character that's not one of the ones listed. What you really want is !~, which negates the sense of the match.
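To see why the negated character class misbehaves, here is a minimal check against the line from the question. The `[^throw]` pattern is assumed to be what the (now deleted) original post used, going by the replies.

```perl
use strict;
use warnings;

my $line = '//throw OtherException';

# [^throw] is a character class: it matches ONE character that is not
# t, h, r, o or w. The leading '/' already qualifies, so this succeeds.
my $class_match = ($line =~ /[^throw]/) ? 1 : 0;

# !~ negates the whole match: it is true only if "throw" occurs nowhere.
my $no_throw = ($line !~ /throw/) ? 1 : 0;

print "class_match=$class_match no_throw=$no_throw\n";  # class_match=1 no_throw=0

# Capture the character that satisfied the class.
if ($line =~ /([^throw])/) {
    print "matched on: '$1'\n";                          # matched on: '/'
}
```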

Re: Can't match negated words.
by rinceWind (Monsignor) on Jun 24, 2004 at 12:28 UTC
    [^throw] matches a single character other than t, h, r, o or w. I don't think this is what you mean.

    You are better off using !~, as regexp matching is positive matching rather than negative matching.

    --
    I'm Not Just Another Perl Hacker

Re: Can't match negated words.
by kesterkester (Hermit) on Jun 24, 2004 at 12:29 UTC

    Hi Mark--

    The /[^throw]/ regexp isn't doing what you think it's doing: you've created a character class (with the square brackets) containing the letters 't', 'h', 'r', 'o', and 'w', and the ^ negates the class, so the regexp matches any single character that isn't one of those letters. So if the throw line in your Java contains ANY character that isn't t, h, r, o, or w, it'll match. This is why you're getting unexpected behavior.

    You're on the right track with using !~. That should do what you mean.

    Try running "perldoc perlre" on your local system for a good intro to this type of thing.

Re: Can't match negated words.
by perlgags78 (Acolyte) on Jun 24, 2004 at 14:58 UTC
    Hi folks, Thanks for pointing that out, Hugo. I'm currently trying to extend it so that for the line
    /* sdfthrow OtherException
    prints 'Opening extended comment' but does not for the line
    throw new Exception() /* asdfasdf
    Basically, so long as the '/*' hasn't been preceded by a throw, it should print 'Opening extended comment'. I've got the following expression in place.
    if ($line =~ /^.*\/\*((?:(?<!throw)).*$/ ) { debug("Opening extended comment"); }
    Is my understanding of this expression correct? Would folks mind if I explain what I think is going on from left to right?

    /^ matches the start of the $line string

    .* says that the start can be followed by any number of characters

    \/\* matches the '/*' string

    (?:(?<!throw)) means that, so long as /* isn't preceded anywhere in the string by the word throw, $line still matches

    .*$ means that any number of characters can follow the /* up to the end of the line

    My understanding's obviously incorrect cos eh.. it don't work for a lad. Any help would be greatly appreciated, Mark.

      Regexes match from left to right. So if you first say "match /*" and then say "don't match throw", it's *not* going to exclude a "/*" that is preceded by "throw". In fact, you haven't given any requirements for what should precede "/*".

      Ignoring the fact that you don't specify what should happen on a line like this:

      /* throw /*
      you might want to use something like:
      m!^(?:(?!throw).)*/[*]!s
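A quick way to try the suggested pattern on the two lines from the question, plus the ambiguous `/* throw /*` case. The sub name `opens_comment_without_throw` is hypothetical. `[*]` is just a one-character class holding a literal asterisk, and using `!` as the delimiter means the slashes need no escaping.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping Abigail's pattern: from the start of the
# line, consume only characters that do not begin "throw", then require
# a literal "/*".
sub opens_comment_without_throw {
    my ($line) = @_;
    return $line =~ m!^(?:(?!throw).)*/[*]!s ? 1 : 0;
}

for my $line (
    '/* sdfthrow OtherException',
    'throw new Exception() /* asdfasdf',
    '/* throw /*',
) {
    printf "%-34s %s\n", $line,
        opens_comment_without_throw($line) ? 'Opening extended comment' : '-';
}
```

With this pattern the first and third lines report an opening comment and the second does not. For `/* throw /*` the pattern is satisfied by the first `/*`, which may or may not be what you want.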
      although that can probably be optimized (and the more you know about the strings you are going to match against, the more possibilities for optimizing there are).

      Abigail

        Hi Abigail,

        I'd like to match the first '/*' and check that there's no throw declared before it.

        I think it should look something like this?

        m!^/[*].*?(?:(?!throw).)*/[*]!s

        I can't really understand what the '.)*' after the throw means.

        Could you explain it briefly? Thanks,

        Mark.

      Putting dot-star next to an anchor is pointless. Just throw out the anchor and the dot-star. That leaves you with:
      /\/\*((?:(?<!throw))/
      You've got capturing parentheses around non-capturing parentheses, around a negative lookbehind. You only need the parens for the negative lookbehind:
      /\/\*(?<!throw)/
      Ok, now you've matched "/*", and at that point, you're looking back to ensure that what comes before you isn't "throw". It can't be, because it ends in "/*". You can't really check everything up to the "/*" with a negative lookbehind, because negative lookbehinds can't be variable-length, and your line can be. You can do it with negative lookahead:
      if ($line =~ /^(?:(?!throw).)*?\/\*/)
      That will be any number of characters, none of which is the start of "throw", followed by "/*". The *? makes it take the first "/*" rather than the last.
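A small sketch contrasting the lookbehind attempt with the lookahead version, and the non-greedy `*?` with a greedy `*`. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $java = 'throw new Exception() /* asdfasdf';

# Lookbehind after "/*": the five characters just before this point end
# in "/*", so they can never be "throw" and the assertion always holds.
my $lookbehind = ($java =~ /\/\*(?<!throw)/) ? 1 : 0;         # 1 (wrong answer)

# Lookahead from the start: no character before "/*" may begin "throw".
my $lookahead  = ($java =~ /^(?:(?!throw).)*?\/\*/) ? 1 : 0;  # 0

# Greedy vs non-greedy prefix on a line with two comment openers.
my $two = 'a = 1; /* first */ b = 2; /* second';
my ($lazy)   = $two =~ /^((?:(?!throw).)*?)\/\*/;   # stops at the first "/*"
my ($greedy) = $two =~ /^((?:(?!throw).)*)\/\*/;    # runs to the last "/*"

print "lookbehind=$lookbehind lookahead=$lookahead\n";
print "lazy prefix:   '$lazy'\n";
print "greedy prefix: '$greedy'\n";
```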

      Please see this node about YAPE::Regex::Explain for a helpful module.


      We're not really tightening our belts, it just feels that way because we're getting fatter.
Re: Can't match negated words.
by perlgags78 (Acolyte) on Jun 24, 2004 at 13:28 UTC
    Hi folks, I may have screwed my original post up. I wanted to simply reply but instead I seem to have deleted the original message. I amended my code as follows
    if ($line =~ /(?:(?!throw).)/ ) { debug ("Doesn't contain throw"); }
    yet it still printed the statement for the line
    //throw OtherException
    I've put '.*' at the start and end of the reg exp but it still doesn't see the throw in the throw line. Thanks, Mark.

      First, you haven't duplicated Abigail-II's expression correctly; if you do, it will work:

      if ($line =~ /^(?:(?!throw).)*$/s) { debug ("Doesn't contain throw"); }

      An alternative formulation, which I find slightly cleaner, is to recast it as a single negative lookahead:

      if ($line =~ /^(?!.*?throw)/) { ... }
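A sketch checking that the two formulations agree on a few sample lines. In `.*?` the `?` belongs to the `.*` and makes it non-greedy; for a yes/no test like this it changes only how the engine searches, not the result, so a greedy variant is included for comparison. The sub names are hypothetical. One caveat: the single-lookahead version has no `/s`, so `.` will not cross a newline; that is harmless for single lines but matters for multi-line strings.

```perl
use strict;
use warnings;

# Hypothetical helper names for comparing the formulations.
sub abigail_form { my ($s) = @_; return $s =~ /^(?:(?!throw).)*$/s ? 1 : 0 }
sub hugo_form    { my ($s) = @_; return $s =~ /^(?!.*?throw)/     ? 1 : 0 }
sub greedy_form  { my ($s) = @_; return $s =~ /^(?!.*throw)/      ? 1 : 0 }

for my $line ('int x = 0;', '//throw OtherException', 'throw e;') {
    printf "%-24s abigail=%d hugo=%d greedy=%d\n",
        $line, abigail_form($line), hugo_form($line), greedy_form($line);
}

# Without /s, "." stops at a newline, so the lookahead never sees line two.
printf "multi-line: abigail=%d hugo=%d\n",
    abigail_form("ok\nthrow"), hugo_form("ok\nthrow");
```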

      Hugo

        hi Hugo, I see that you've a ? before the throw. What does it do? Is the '?' associated with the .* or with the throw? Also I'm having hassle getting line breaks to appear in my posts, so they kinda look like one-line posts. Any ideas? Thanks, Mark.

Re: Can't match negated words.
by perlgags78 (Acolyte) on Jun 24, 2004 at 13:50 UTC
    Hi folks, I misinterpreted Abigail's original code suggestion, and as soon as I swapped it in it worked fine. Thanks again for that, Abigail. Thanks, Mark.
Re: Can't match negated words.
by perlgags78 (Acolyte) on Jun 24, 2004 at 12:53 UTC
    Hi folks, Thanks for all the replies!! This is great. I'm used to forums that take ages or never get back to you.. Abigail, ye've nailed it. That's exactly what I'm looking for, where "not throw" is part of the expression. I'm not familiar with the look ahead stuff but I've a couple of places with examples and I'll look at them. Incidentally are there any online resources you guys could recommend? Thanks again everyone, Mark.
Re: Can't match negated words.
by perlgags78 (Acolyte) on Jun 24, 2004 at 17:17 UTC
    Can anyone explain what is meant by clustering and capturing?
    Or even the difference between them?
    I'm reading the docs and came across this:
    This is for clustering, not capturing; it groups subexpressions like "()", but doesn't make backreferences as "()" does. So
    @fields = split(/\b(?:a|b|c)\b/)

    is like

    @fields = split(/\b(a|b|c)\b/)

    Thanks,
    Mark.
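To make the quoted perlre example concrete: the difference shows up in what `split` returns, since text matched by capturing parentheses in the separator is included in the output list. The input string is made up.

```perl
use strict;
use warnings;

my $str = 'x a y b z';

# Clustering only: the separators are discarded.
my @clustered = split /\b(?:a|b|c)\b/, $str;
# Capturing: whatever the group matched is returned as an extra field.
my @captured  = split /\b(a|b|c)\b/,   $str;

print join('|', @clustered), "\n";   # x | y | z
print join('|', @captured),  "\n";   # x |a| y |b| z
```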
      Clustering is grouping, like in an algebraic expression. Parentheses limit how far back and forward an alternator (vertical bar) applies:
      /foo|bar/;    # Matches "foo" or "bar"
      /fo(o|b)ar/;  # Matches "fooar" or "fobar"
      Grouping also allows quantifiers to apply to more than one atom:
      /foo{3}/;    # Matches "foooo"
      /(foo){3}/;  # Matches "foofoofoo"
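These claims are easy to check directly. A minimal sketch, anchoring each pattern with `^` and `$` so that only exact matches count (the anchoring is an addition for testing, not part of the examples above).

```perl
use strict;
use warnings;

# Each entry: [pattern, string, expected result (1 = match, 0 = no match)].
my @checks = (
    [ '^fo(o|b)ar$', 'fooar',     1 ],  # the group limits the "|" to o or b
    [ '^fo(o|b)ar$', 'fobar',     1 ],
    [ '^foo{3}$',    'foooo',     1 ],  # {3} applies to the last "o" only
    [ '^foo{3}$',    'foofoofoo', 0 ],
    [ '^(foo){3}$',  'foofoofoo', 1 ],  # {3} applies to the whole group
);

my $failures = 0;
for my $check (@checks) {
    my ($pattern, $string, $want) = @$check;
    my $got = ($string =~ /$pattern/) ? 1 : 0;
    $failures++ if $got != $want;
    printf "%-14s %-10s %d\n", $pattern, $string, $got;
}
print $failures ? "$failures unexpected result(s)\n" : "all as described\n";
```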
      Capturing is storing the parenthesized portion of the match somewhere that you can refer back to it (as $1, or as an element of the list returned by a match, for example). Ordinary parentheses are capturing parentheses. Special parentheses (any that have a ? after the opening paren) are non-capturing, at least as of Perl 5.8; the named captures (?<name>...) added in Perl 5.10 are an exception. All parentheses group their contents.
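A small sketch of the capturing/clustering difference, reusing the foo examples; the input string is invented for illustration.

```perl
use strict;
use warnings;

my $text = 'foofoofoo bar';

# Capturing parentheses: each group's text is saved in $1, $2, ...
# and returned when the match is used in list context.
my @captured = $text =~ /((?:foo){3}) (bar|baz)/;
print "captured: @captured\n";     # captured: foofoofoo bar

# (?:...) only groups: the quantifier and alternation still work,
# but nothing is saved, so a successful list-context match returns (1).
my @clustered = $text =~ /(?:foo){3} (?:bar|baz)/;
print "clustered: @clustered\n";   # clustered: 1
```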
