Re: Negating Regexes: Tips, Tools, And Tricks Of The Trade

by jbert (Priest)
on Dec 07, 2006 at 15:20 UTC


in reply to Negating Regexes: Tips, Tools, And Tricks Of The Trade

Is there anyone here with sufficient computer science skillz to answer the question of whether it is possible to write a program which will take an arbitrary regexp and construct another which will act as its negation against all possible input strings?

I would imagine you'd have to restrict the definition of "regular expression" to something a little less rich than the full Perl set (isn't there a compsci definition?).

Presumably if regexps form a Turing-complete language then the answer is no, because this sounds awfully like such a program would violate the Halting Problem (but maybe not - I haven't thought about it in detail).

Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade
by blokhead (Monsignor) on Dec 07, 2006 at 15:43 UTC
    whether it is possible to write a program which will take an arbitrary regexp and construct another which will act as its negation against all possible input strings?
    A program to do it? Sure! An efficient program? No, at least in the classical regex sense. To negate a regex, you convert it to an NFA, the NFA to a DFA, complement the DFA (invert accept/reject states), and convert that back to a regex. This is basic stuff from a first course in CS theory. The problem is that this is really inefficient. The NFA->DFA step introduces an exponential blowup in size. Even the special case of deciding whether the negation of a regex is the empty regex (the regex that accepts nothing) is PSPACE-complete (that means it's bad), let alone trying to compute arbitrary regex negations.
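
    To make the middle step concrete, here's a minimal Perl sketch of the complementation part alone. The DFA here (a hash of transitions, invented for illustration) is assumed to be *complete* -- one transition per state per symbol -- and the expensive parts, the NFA->DFA subset construction and the DFA->regex conversion, are omitted entirely:

        use strict;
        use warnings;

        # DFA over {a,b} accepting strings that end in "ab".
        my %delta = (
            q0 => { a => 'q1', b => 'q0' },
            q1 => { a => 'q1', b => 'q2' },
            q2 => { a => 'q1', b => 'q0' },
        );
        my %accept = ( q2 => 1 );

        # Run the DFA from q0 and test membership in an accepting set.
        sub runs {
            my ($accepting, $string) = @_;
            my $state = 'q0';
            $state = $delta{$state}{$_} for split //, $string;
            return exists $accepting->{$state};
        }

        # Complementing a complete DFA is just inverting the accepting set.
        my %co_accept = map { $_ => 1 } grep { !$accept{$_} } keys %delta;

        for my $s (qw(ab aab ba abb)) {
            printf "%-4s original=%d complement=%d\n", $s,
                runs(\%accept,    $s) ? 1 : 0,
                runs(\%co_accept, $s) ? 1 : 0;
        }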

    That aside, I've been working with someone else on a suite of modules for dealing with regular languages & finite automata that will support negations in exactly this way, if you'd ever want to see how it actually goes. It will eventually allow standard Perl regexes as input as well, but of course it will be very slow for moderately-sized regexes. Even so, I wouldn't recommend such a module for everyday use -- it would be much simpler to rewrite the logic surrounding the regex, or use one of the tricks mentioned above in this thread, like negative lookahead.
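
    As a quick illustration of the lookahead trick: given a Perl pattern $pat (assumed here to contain no anchors), the single regex below matches exactly the strings in which $pat matches nowhere -- each character may be consumed only if $pat does not start at that position:

        use strict;
        use warnings;

        my $pat = qr/foo/;
        # Matches iff $pat fails to match anywhere in the string.
        my $negated = qr/\A(?:(?!$pat).)*\z/s;

        for my $s ('food', 'bar', 'fofum') {
            printf "%-6s %s\n", $s,
                $s =~ $negated ? 'no match for foo' : 'contains foo';
        }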

    I would imagine you'd have to restrict the definition of "regular expression" to something a little less rich than the full perl set (isn't there a compsci definition?).
    Yes, the classical CS definition allows only the "|" (alternation), "*" (repetition), and concatenation operators. No backrefs as in Perl, no lookaheads, and certainly no embedded Perl code ;)
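
    In Perl syntax, a pattern built from that classical core might look like the sketch below (the \A/\z anchors and the non-capturing group are just Perl spellings; the language recognized uses only the three classical operators):

        use strict;
        use warnings;

        # Only alternation, repetition, and concatenation -- a regular
        # language. Backrefs, lookaheads, and (?{...}) fall outside this.
        my $classical = qr/\A(?:a|b)*abb\z/;
        print "'$_': ", /$classical/ ? "match" : "no match", "\n"
            for qw(abb ababb ba);
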
    Presumably if regexps form a turing complete language ...
    The expressibility of classical regexes is as far from Turing-complete as we know how to get ;) Extending them to include backreferences at least gives them the expressibility of NP, but they are still not Turing-complete.
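
    The usual party-trick demonstration of that extra power is the unary composite-number test often credited to Abigail. Divisibility is beyond any classical regex, but one backreference handles it:

        use strict;
        use warnings;

        # N in unary is composite exactly when it splits into two or
        # more copies of some block of two or more 1s.
        for my $n (2 .. 12) {
            my $unary = '1' x $n;
            print "$n: ", $unary =~ /\A(11+?)\1+\z/ ? "composite" : "prime", "\n";
        }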

    blokhead

      Cool. Thanks very much for this. I was picturing some kind of repeating search-and-replace regexp thing, using the string as a tape, emulating a Turing machine. Of course, that's replacement as well, but there are also probably a million other reasons why that wouldn't work.

      Replies like this are one reason Why Perl Monks Works for Me.

        Actually, it's entirely possible that what you describe is Turing-complete. The fact that a regex search and replace is one computing step doesn't place any particular limitation on the expressive power of your scheme. The same is true for an LR parser, which repeatedly uses a finite automaton to find handles. Finite automata can only parse regular languages, yet LR parsers can cope with any language parseable by a deterministic pushdown automaton (the deterministic context-free languages), which is a strictly larger set. (For example, it includes the language of expressions with matching parentheses, which is not a regular language.)
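
        For what it's worth, the picture is easy to play with in Perl itself. Here's a toy rewriting system in that spirit -- binary increment done purely by repeated s///, treating the string as the tape. This particular system is nowhere near Turing-complete, but iterated string rewriting in general (semi-Thue systems, Markov algorithms) is:

            use strict;
            use warnings;

            sub increment {
                my ($s) = @_;
                $s .= '+';                          # '+' marks a pending carry
                1 while $s =~ s/1\+/+0/;            # carry ripples left over 1s
                $s =~ s/0\+/1/ or $s =~ s/\A\+/1/;  # absorb carry, or grow the tape
                return $s;
            }

            my $n = '0';
            print $n = increment($n), "\n" for 1 .. 5;   # 1 10 11 100 101
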
Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade
by geekphilosopher (Friar) on Dec 07, 2006 at 19:17 UTC
    Regular expressions, at least using the computer science definition, are equivalent in expressive power to the regular languages, hence their name. This means they can be defined in terms of a deterministic or non-deterministic finite automaton. Adding a stack would give us a push-down automaton, which can recognize context-free languages. Adding a second stack gives us something equivalent in power to a Turing machine.
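
    To see the first step up that ladder in code: balanced parentheses are the standard example of a context-free but non-regular language. One counter (a degenerate stack, since there's only one bracket type) is all it takes, and it's exactly what a finite automaton lacks:

        use strict;
        use warnings;

        sub balanced {
            my ($s) = @_;
            my $depth = 0;                     # the "stack"
            for my $c (split //, $s) {
                $depth++ if $c eq '(';
                if ($c eq ')') {
                    return 0 if $depth == 0;   # pop on empty stack: reject
                    $depth--;
                }
            }
            return $depth == 0;                # accept iff stack is empty
        }

        print "$_: ", balanced($_) ? "balanced" : "unbalanced", "\n"
            for '(()())', '(()', ')(';
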
      Thanks. I have a limited CS background (some register machines, recursive and primitive recursive functions) and this gives me quite a few pointers for picking up some more.
