whether it is possible to write a program which will take an arbitrary regexp and construct another which will act as its negation against all possible input strings?

A program to do it? Sure! An efficient program? No, at least not for classical regexes. To negate a regex, you convert it to an NFA, convert the NFA to a DFA, complement the DFA (first make sure every state has a transition on every input symbol, then swap accepting and non-accepting states), and convert the result back to a regex. This is basic stuff from a first course in CS theory. The problem is that it is really inefficient: the NFA->DFA step can blow up the number of states exponentially. Even the special case of deciding whether the negation of a regex is the empty regex (the regex that accepts nothing) is PSPACE-complete, which means it is almost certainly intractable, let alone computing arbitrary regex negations.
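
A minimal Python sketch of the NFA->DFA->complement steps, stopping before the conversion back to a regex. The dict-based NFA encoding (no epsilon moves) and all function names are hypothetical, chosen for illustration. The sample NFA recognizes "the n-th symbol from the end is `a`": it has n+1 states, yet the subset construction produces 2^n DFA states, the textbook instance of the exponential blowup.

```python
from collections import deque

# Hypothetical NFA encoding (no epsilon moves, for brevity):
#   delta[state][symbol] -> set of successor states

def determinize(alphabet, delta, start, accepting):
    """Subset construction. The result is a *complete* DFA: a state with
    no NFA successors maps to frozenset(), which acts as a dead state, so
    every DFA state has a transition on every symbol."""
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        S = queue.popleft()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            T = frozenset(q for s in S for q in delta.get(s, {}).get(a, ()))
            dfa[S][a] = T
            if T not in dfa:
                queue.append(T)
    dfa_accepting = {S for S in dfa if S & accepting}
    return dfa, start_set, dfa_accepting

def complement(dfa, accepting):
    """Negation: swap accepting and non-accepting states (DFA must be complete)."""
    return {S for S in dfa if S not in accepting}

def accepts(dfa, start, accepting, word):
    state = start
    for ch in word:
        state = dfa[state][ch]
    return state in accepting

def is_empty(dfa, start, accepting):
    """True if no accepting state is reachable, i.e. the language is empty."""
    seen, queue = {start}, deque([start])
    while queue:
        S = queue.popleft()
        if S in accepting:
            return False
        for T in dfa[S].values():
            if T not in seen:
                seen.add(T)
                queue.append(T)
    return True

def nth_from_end_is_a(n):
    """(n+1)-state NFA over {a, b}: 'the n-th symbol from the end is a'."""
    delta = {0: {"a": {0, 1}, "b": {0}}}
    for i in range(1, n):
        delta[i] = {"a": {i + 1}, "b": {i + 1}}
    return delta, 0, {n}

delta, q0, acc = nth_from_end_is_a(3)
dfa, s0, dfa_acc = determinize("ab", delta, q0, acc)
negated = complement(dfa, dfa_acc)
print(len(dfa))                          # 8 DFA states from a 4-state NFA
print(accepts(dfa, s0, negated, "bab"))  # True: 3rd-from-last symbol is 'b'
```

Running `is_empty` on a complemented DFA is exactly the "is the negation empty?" question, i.e. whether the original regex accepts every string. The final DFA->regex step (e.g. by state elimination) is omitted here; it can also blow up the size of the result.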

That aside, I've been working with someone else on a suite of modules for dealing with regular languages & finite automata that will support negation in exactly this way, if you ever want to see how it actually goes. It will eventually accept standard Perl regexes as input as well, but of course it will be very slow for moderately sized regexes. Even so, I wouldn't recommend such a module for everyday use: it would be much simpler to rewrite the logic surrounding the regex, or to use one of the tricks mentioned above in this thread, like negative lookahead.
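
For the everyday case, a hedged sketch of both simpler routes using Python's `re` module, whose `(?!...)` negative lookahead has the same syntax as Perl's. The target substring `foo` and the function names are illustrative only.

```python
import re

# Matches a whole string only if "foo" occurs nowhere in it: before
# consuming each character, the lookahead (?!foo) checks that "foo"
# does not start at that position. re.DOTALL lets "." consume newlines.
NO_FOO = re.compile(r"(?:(?!foo).)*", re.DOTALL)

def lacks_foo(s: str) -> bool:
    return NO_FOO.fullmatch(s) is not None

# Usually simpler: keep the regex positive and negate in the host language.
def lacks_foo_simple(s: str) -> bool:
    return re.search(r"foo", s) is None

for s in ["bar", "food", "a fo o", "x\nfoo", ""]:
    assert lacks_foo(s) == lacks_foo_simple(s)
```

The lookahead version is handy when the negated pattern must be passed somewhere that only accepts a regex; otherwise the second form is easier to read.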

I would imagine you'd have to restrict the definition of "regular expression" to something a little less rich than the full perl set (isn't there a compsci definition?).

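Yes, the classical CS definition allows only literal symbols (plus the empty string and the empty set) combined with the "|" (alternation), "*" (repetition, a.k.a. Kleene star), and concatenation operators, with parentheses for grouping. No backrefs as in Perl, no lookaheads, and certainly no embedded Perl code ;)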
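
To show how small that classical core is, here is a hypothetical Python sketch of Thompson's construction (a standard textbook regex->NFA method, not necessarily what the modules mentioned above use) for exactly those operators plus parentheses, followed by a whole-string matcher that simulates the NFA. Escapes, character classes, and the empty-set symbol are deliberately left out.

```python
# Hypothetical sketch: classical regexes only (literal characters,
# "|", "*", concatenation, and parentheses for grouping).

def compile_regex(pattern):
    """Thompson's construction: returns (eps, edges, start, accept)."""
    eps, edges = [], []  # eps[s]: epsilon successors; edges[s]: (char, t)
    pos = 0

    def new_state():
        eps.append(set())
        edges.append([])
        return len(eps) - 1

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def parse_alt():            # alt := concat ("|" concat)*
        nonlocal pos
        start, end = parse_concat()
        while peek() == "|":
            pos += 1
            s2, e2 = parse_concat()
            s, e = new_state(), new_state()
            eps[s] |= {start, s2}
            eps[end].add(e)
            eps[e2].add(e)
            start, end = s, e
        return start, end

    def parse_concat():         # concat := star*  (empty means epsilon)
        start = end = new_state()
        while peek() not in (None, "|", ")"):
            s2, e2 = parse_star()
            eps[end].add(s2)
            end = e2
        return start, end

    def parse_star():           # star := atom "*"*
        nonlocal pos
        start, end = parse_atom()
        while peek() == "*":
            pos += 1
            s, e = new_state(), new_state()
            eps[s] |= {start, e}
            eps[end] |= {start, e}
            start, end = s, e
        return start, end

    def parse_atom():           # atom := "(" alt ")" | literal char
        nonlocal pos
        ch = peek()
        if ch == "*":
            raise ValueError("nothing to repeat")
        pos += 1
        if ch == "(":
            start, end = parse_alt()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return start, end
        s, e = new_state(), new_state()
        edges[s].append((ch, e))
        return s, e

    start, accept = parse_alt()
    if pos != len(pattern):
        raise ValueError("unmatched ')'")
    return eps, edges, start, accept

def eps_closure(eps, states):
    stack, seen = list(states), set(states)
    while stack:
        for t in eps[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def matches(pattern, text):
    """Whole-string match by simulating the NFA on a set of states."""
    eps, edges, start, accept = compile_regex(pattern)
    current = eps_closure(eps, {start})
    for ch in text:
        step = {t for s in current for (c, t) in edges[s] if c == ch}
        current = eps_closure(eps, step)
    return accept in current
```

For example, `matches("(a|b)*abb", "babb")` is true and `matches("(a|b)*abb", "bab")` is false. The grammar has no room for backreferences or lookarounds, which is what keeps every such pattern inside the regular languages.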

Presumably if regexps form a Turing-complete language ...

The expressiveness of classical regexes is as far from Turing-complete as we know how to get ;) Extending them with backreferences lets them describe some non-regular languages and makes matching NP-complete in general (when the regex is part of the input), but they are still not Turing-complete.
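
A small illustration, sketched with Python's `re` module (its `\1` backreference behaves like Perl's), of backreferences stepping outside the regular languages. The pattern names are illustrative. `(a*)b\1` matches exactly the strings a^n b a^n, a language the pumping lemma shows is not regular, so no DFA (and hence no classical regex) recognizes it. The second pattern is the well-known unary primality trick. Neither example demonstrates the NP-completeness result itself.

```python
import re

# With a backreference, one pattern describes a non-regular language:
# "(a*)b\1" matches a^n b a^n (the same number of a's on both sides).
SAME_COUNT = re.compile(r"(a*)b\1")

# Unary compositeness: a run of n 1s matches iff it can be split into
# two or more copies of a block of length >= 2, i.e. iff n is composite.
COMPOSITE = re.compile(r"(11+?)\1+")

def is_prime(n: int) -> bool:
    return n >= 2 and COMPOSITE.fullmatch("1" * n) is None

print([n for n in range(20) if is_prime(n)])
```

Because patterns like these have no equivalent DFA, the complement-the-automaton approach only applies to the classical subset.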

blokhead

In reply to Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade by blokhead
in thread Negating Regexes: Tips, Tools, And Tricks Of The Trade by Limbic~Region
