Backtracking problem with .*(?!bar)

by olivierp (Hermit)
on Sep 24, 2003 at 08:16 UTC (#293797)
olivierp has asked for the wisdom of the Perl Monks concerning the following question:

I have a file of records whose fields are separated by a |.
In order to classify these entries, I first split each line, and match certain fields against a hash of filters.

The filters are loaded from a configuration file at run-time.
I am thus limited to $var[5] =~ /$regex/ (or at least I think so) for my matching.
Is it possible to get something like this to work:

$re[0] = qr (.*(?!system))i;
$re[1] = qr (system)i;
$v[0] = "oneitem";
$v[1] = "anothersystem";
for $a (@v) {
    for $r (@re) {
        print "\nRegex: $r\tValue: $a\n";
        if ($a =~ /$r/) {
            print "$a matches $r\n";
        }
    }
}

As this results in:

Regex: (?-xism:.*(?!system))    Value: oneitem
oneitem matches (?-xism:.*(?!system))

Regex: (?-xism:system)    Value: oneitem

Regex: (?-xism:.*(?!system))    Value: anothersystem
anothersystem matches (?-xism:.*(?!system))

Regex: (?-xism:system)    Value: anothersystem
anothersystem matches (?-xism:system)

Which is not what I want...
Is it possible to construct a regex that "matches if doesn't contain" ?

Re: Backtracking problem with .*(?!bar)
by Abigail-II (Bishop) on Sep 24, 2003 at 09:10 UTC
    Is it possible to construct a regex that "matches if doesn't contain" ?

    Most certainly:

    /^(?:(?!system).)*$/

    Abigail
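To see how this pattern treats the two values from the question, here is a minimal sketch. The names `@values` and `%result` are illustrative choices, not something from the thread.

```perl
use strict;
use warnings;

# Each "." is consumed only if "system" does not start at that position,
# so the whole string can be matched only when "system" never occurs.
my $no_system = qr/^(?:(?!system).)*$/;

my @values = ("oneitem", "anothersystem");
my %result;
for my $value (@values) {
    $result{$value} = ($value =~ $no_system) ? 1 : 0;
    printf "%-14s %s\n", $value,
        $result{$value} ? "does not contain 'system'" : "contains 'system'";
}
```

Because "." does not match a newline by default, a multi-line value would probably need the /s modifier; that is an assumption about the data, which the question describes only as fields split from a line.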

Re: Backtracking problem with .*(?!bar)
by BrowserUk (Pope) on Sep 24, 2003 at 09:19 UTC

    Or somewhat more efficiently,

    print "Doesn't contain 'system'" if $string =~ m[^(?!.*system)];

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.
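Since the filters in the question come from a configuration file, one way to use the anchored lookahead is to build it from a plain word. The sketch below assumes a hypothetical helper named `not_containing`; the `\Q...\E` quoting and the /s modifier are additions of this example, not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "does not contain $word" filter.
# \Q...\E quotes any regex metacharacters in the literal word;
# /s lets "." cross newlines so the whole string is scanned.
sub not_containing {
    my ($word) = @_;
    return qr/^(?!.*\Q$word\E)/s;
}

my $filter = not_containing("system");
for my $value ("oneitem", "anothersystem") {
    printf "%-14s %s\n", $value, ($value =~ $filter ? "passes" : "rejected");
}
```

The pattern text `^(?!.*system)` can also be stored in the configuration file as-is and used in `$var[5] =~ /$regex/`, which keeps the matching code unchanged.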

      Many thanks to both of you !
      This had me flummoxed for quite some time.

      For my understanding:
      Abigail's answer reads
      Beginning of line, followed any number of times by any character at which "system" does not start,
      followed by an end-of-line marker

      and BrowserUK's reads:
      Beginning of line not followed by anything that is followed by "system"


      In both cases, I assume that it is the presence of the begin/end-of-line anchors that stops the regex
      from backtracking and matching anything, as .*(?!system) does.
      Shame an Initiate can't at least cast one vote :(
        In both cases, I assume that it is the presence of the begin/end-of-line anchors that stops the regex from backtracking and matching anything, as .*(?!system) does.
        No, they ask for different things. It's not the anchor that is helping.

        The regex /^.*(?!system)/ says "does there exist some number of characters which is not immediately followed by 'system'?". And yes, there are many such solutions in strings such as "infosystem". The letter "i" is not followed by system. The letters "in" are not followed by system. The letters "inf" are not followed by system, and so on.
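A small sketch of the point above: capturing what `.*` actually consumed shows which of the many possible solutions the engine picks first. The capture group and the variable names are additions for illustration only.

```perl
use strict;
use warnings;

my $string = "infosystem";

# Greedy .* first grabs the whole string. Nothing follows the end of
# the string, so (?!system) succeeds immediately and no backtracking
# is needed.
my ($matched) = $string =~ /^(.*)(?!system)/;
print "'.*' consumed: '$matched'\n";

# Pinning the prefix to "info" removes that freedom: here the
# lookahead is tested right before "system" and the match fails.
my $at_info = ($string =~ /^info(?!system)/) ? 1 : 0;
print "match with prefix pinned to 'info': $at_info\n";
```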

        However, /^(?!.*system)/ says "starting from the beginning of the string, do I fail to match any sequence of characters followed by 'system'?". So, if the inner match fails, the outer match succeeds, and that's exactly what we want. Note that without the anchor, the outer start point could move all the way past the "s" in system, and then the inner match would fail, causing the outer match to succeed. So the anchor is still needed.
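The role of the anchor can be checked directly. This sketch compares the anchored and unanchored forms on a string that does contain "system", and uses the built-in `@-` array to report where the unanchored match started. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "anothersystem";

# Anchored: the lookahead is tested only at position 0, where
# ".*system" can still find "system", so the match fails.
my $anchored = ($string =~ /^(?!.*system)/) ? 1 : 0;

# Unanchored: the engine retries at later start positions until the
# lookahead succeeds. $-[0] is the start offset of that match.
my $unanchored = ($string =~ /(?!.*system)/) ? 1 : 0;
my $start = $-[0];

print "anchored:   $anchored\n";
print "unanchored: $unanchored (match starts at offset $start)\n";
```

The reported offset falls just after the "s" that begins "system", which is the position described in the paragraph above.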

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

Re: Backtracking problem with .*(?!bar)
by vbrtrmn (Pilgrim) on Sep 24, 2003 at 13:34 UTC

    I have found that people don't like to use .*, so I try to use it as infrequently as possible. Maybe one of these would help you out:

    .{0,} - Match 0 or more times.
    .?? - Match 0 or 1 times (as few as possible).

    --
    paul
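For comparison, a short sketch of how these two quantifiers behave on a value from the question. It suggests that `.{0,}` is only another spelling of `.*`, so it would not change the lookahead behaviour discussed earlier in the thread. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "anothersystem";

# .{0,} and .* are the same greedy quantifier, so both consume the
# whole string and the lookahead succeeds at the end either way.
my ($star)  = $string =~ /^(.*)(?!system)/;
my ($brace) = $string =~ /^(.{0,})(?!system)/;
print "star:  '$star'\nbrace: '$brace'\n";

# .?? takes zero characters if it can, and one only when forced to.
my ($lazy)   = $string =~ /^(.??)/;    # nothing forces a character
my ($forced) = $string =~ /^(.??)n/;   # "n" only matches after one character
print "lazy:   '$lazy'\nforced: '$forced'\n";
```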

      Could I get an example from you on when using the ?? quantifier would be appropriate? I find this a strange example to show as an alternative to the infamous dot star.
        Except that .* is better for golf!

Node Type: perlquestion [id://293797]
Approved by jonnyfolk
Front-paged by Enlil