Negating Regexes: Tips, Tools, And Tricks Of The Trade

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
A few nights ago on #perl (freenode), someone asked how to write a regex that matched everything but /$some_regex/. The obvious solution of $thing !~ /$some_regex/; was given. The person then explained that this was for the WWW::Mechanize find_all_links() method. The interface looks like:

$mech->find_all_links(text_regex => qr/download/i);

This can be worked around without needing to negate the regex. Just post-process all links with !~ yourself. Another alternative would be to propose a patch to Andy. In any case, it got me thinking about how to negate regexes. I know that it is rare that you might have to do this, but it seems like it might be useful knowledge, if for no other reason than to understand better how regexes work.
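The post-processing workaround can be sketched as follows. This is a minimal sketch that uses plain strings as stand-ins for link texts; with a real WWW::Mechanize object you would presumably filter on each link's text instead. The names @link_texts, $unwanted, and @kept are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in data (assumption): plain link texts rather than real link objects.
my @link_texts = ('Download now', 'About us', 'download mirror', 'Contact');

# The regex we would have liked to negate.
my $unwanted = qr/download/i;

# Keep every entry the regex does NOT match.
my @kept = grep { $_ !~ $unwanted } @link_texts;

print "$_\n" for @kept;
```

The regex itself stays untouched; only the logic around it is inverted.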

Originally, I was planning on writing a tutorial. Then I remembered that I stink at regexes and realized that even simple regexes can be hard to negate:

/\d/         # contains a digit
/^\D*$/      # doesn't contain a digit

/[abc]/      # contains either the letter a, b, or c
/^[^abc]*$/  # doesn't contain a, b, or c

/foo|bar/    # contains foo or bar
???          # doesn't contain foo or bar

So what tools, tips, and tricks of the trade do you have to share for these rare occasions when you need to write a regex that matches everything another regex doesn't?

Cheers - L~R


Replies are listed 'Best First'.
Re: Negating Regexes: Tips, Tools, And Tricks Of The Trade by Sidhekin (Priest) on Dec 07, 2006 at 13:58 UTC
"So what tools, tips, and tricks of the trade do you have to share for these rare occasions you need to write a regex that matches everything another regex doesn't?" `/^(?!(?s:.*)$regex)/` :-) Caveat: Only useful in a boolean context. `print "Just another Perl ${\(trickster and hacker)},"` The Sidhekin proves Sidhe did it!
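A small sketch of how that one-size-fits-all wrapper might be used, for example to answer the "doesn't contain foo or bar" case from the question. The helper name negate_regex is hypothetical and not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a compiled pattern so that the result matches
# exactly when the original pattern matches nowhere in the string.
sub negate_regex {
    my ($re) = @_;
    return qr/^(?!(?s:.*)$re)/;
}

my $no_foo_bar = negate_regex(qr/foo|bar/);

# Only use the result in boolean context: the match itself is empty.
my @kept = grep { $_ =~ $no_foo_bar } ('food', 'rebar', 'baz', 'quux');

print "@kept\n";
```

Because $re is interpolated as a compiled qr// object, it keeps its own grouping, so the alternation in /foo|bar/ does not leak into the surrounding pattern.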
Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade by ikegami (Patriarch) on Dec 07, 2006 at 16:33 UTC
Very good, but that can't be embedded into another regexp nicely. The equivalent to `/[^$chars]*/` is `/(?:(?!$re).)*/`.

`/[^$chars]*/` matches as many characters as possible, as long as none match the character class. `/(?:(?!$re).)*/` matches as many characters as possible, as long as no subsequence of them matches the regexp.

A "few" example uses: Re^3: reg ex NOT, Re^3: Text::Balanced with nested / custom brackets, Re^2: regex for negating a sequence, Re: Regexp: Match anything except a certain word, Re^2: Regexp: Match anything except a certain word, Re: regex help please, Re: Extraneous behaviour of match variables, Re: How to split into paragraphs?, Re^3: How to split into paragraphs?, Re: text extraction question, etc.

Notes: Keep in mind that both expressions can successfully match 0 characters if not properly anchored. `*` can be replaced with other modifiers, like `+`, `?`, `{2}`, etc.

This should be in the docs since it's a FAQ. I think it would be nice if `/(?:(?!$re).)/` could be shortcutted to `/(?^$re)/`. It would be more readable, and it would help prevent the common misuse of negative lookahead (`/(?!$re.)/`).

Updated: Fixed formatting, added stuff
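To illustrate the embeddable form, here is a minimal sketch with made-up sample strings. It shows the construct both anchored over the whole string and embedded in a larger pattern to capture a prefix; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $re = qr/foo|bar/;

# Whole-string form: true only if $re matches nowhere in the string.
my $lacks = qr/\A(?:(?!$re).)*\z/s;

# Embedded form: capture the longest prefix in which $re never starts.
my ($prefix) = 'alpha foo beta' =~ /\A((?:(?!$re).)*)/s;

print "prefix: '$prefix'\n";
for my $text (qw(baz rebar)) {
    print "$text: ", ($text =~ $lacks ? 'clean' : 'contains foo or bar'), "\n";
}
```

The lookahead is checked before every single character that is consumed, which is what distinguishes this from the misuse where the `.` sits inside the lookahead.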
Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade by Limbic~Region (Chancellor) on Dec 07, 2006 at 16:08 UTC
Sidhekin, Very Nice! I admit that lookaround assertions, non-capturing clusters, and local modifiers are not my strong suit, so I learned a bit from this one-size-fits-all solution. It would be nice if you forgot you knew how to do that though and shared other examples. Cheers - L~R
Re: Negating Regexes: Tips, Tools, And Tricks Of The Trade by imp (Priest) on Dec 07, 2006 at 14:34 UTC
I was experimenting with this and came up with the following.

use strict;
use warnings;

my $pattern = qr/abc/;

# Inside the (??{ }) block, $_ is the string being matched.
my $negative = qr<
    (??{
        /$pattern/ ? qr/\A$/ : qr//;
    })
>x;

my $text = 'abcdef';
if ($text =~ /$negative/) {
    print "matched $text\n";
}

This strategy uses (??{ code }) to evaluate the current string against the regex you wish to negate. If the pattern matches, it returns a regex that can match only an empty string. Otherwise it returns a regex that matches anything. I'm not sure how safe this strategy is though. Can someone who knows more about perl's regex engine comment on whether the above is appropriate?
Re: Negating Regexes: Tips, Tools, And Tricks Of The Trade by diotalevi (Canon) on Dec 07, 2006 at 16:24 UTC
I'd prefer to use this instead of Sidhekin's because it lets your pattern start its match normally and doesn't require retrying at every offset. The following lets the pattern match however it normally would but fails when it succeeds. `/(?(??{...})(?!))/;` In Perl 5.10, this is especially nice [Updated: Oops. Forgot the (?:...|) to have a success branch]: `/(?:...(*COMMIT)(?!)|)/` (originally posted as `/...(*COMMIT)(?!)/;`) ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
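A hedged sketch of the Perl 5.10 form with a concrete pattern filled in. Two things here are assumptions rather than part of the post above: the leading `^`, and the `(?s:.*?)` prefix that lets the inner pattern be found anywhere in the string. The names $inner and $negated are illustrative.

```perl
use strict;
use warnings;
use 5.010;    # backtracking control verbs need Perl 5.10 or later

my $inner = qr/foo|bar/;

# If $inner is found, (*COMMIT) followed by (*FAIL) makes the whole match
# fail outright, so the empty alternative is never tried. If $inner is not
# found, the first branch fails normally and the empty branch succeeds.
my $negated = qr/^(?:(?s:.*?)$inner(*COMMIT)(*FAIL)|)/;

for my $text ('food', 'rebar', 'baz') {
    printf "%-5s %s\n", $text,
        ($text =~ $negated ? 'no foo/bar' : 'contains foo/bar');
}
```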
Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade by demerphq (Chancellor) on Dec 07, 2006 at 17:33 UTC
I'm not so sure that your example works as it stands. You'd need a code block to distinguish between true failure and false failure. In Perl 5.10 you could do it like: `if ( ! /...(*COMMIT:x)(*FAIL)/ && $REGERROR ne 'x' ) { ... }` To be honest I need to think about how $REGERROR will work in the context of complete failure. For instance when failure occurs because of the optimiser. Hmm. --- $world=~s/war/peace/g
Re: Negating Regexes: Tips, Tools, And Tricks Of The Trade by jbert (Priest) on Dec 07, 2006 at 15:20 UTC
Is there anyone here with sufficient computer science skillz to answer the question whether it is possible to write a program which will take an arbitrary regexp and construct another which will act as its negation against all possible input strings? I would imagine you'd have to restrict the definition of "regular expression" to something a little less rich than the full perl set (isn't there a compsci definition?). Presumably if regexps form a Turing-complete language then the answer is no, because this sounds awfully like such a program would violate the Halting Problem (but maybe not - I haven't thought about it in detail).
Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade by blokhead (Monsignor) on Dec 07, 2006 at 15:43 UTC
"whether it is possible to write a program which will take an arbitrary regexp and construct another which will act as its negation against all possible input strings?"

A program to do it? Sure! An efficient program? No, at least in the classical regex sense.

To negate a regex, you convert it to an NFA, convert the NFA to a DFA, complement the DFA (swap accepting and rejecting states), and convert that back to a regex. This is basic stuff from a first course in CS theory. The problem is that this is really inefficient. The NFA->DFA step introduces an exponential blowup in size. Even the special case of deciding whether the negation of a regex is the empty regex (the regex that accepts nothing) is PSPACE-complete (that means it's bad), let alone trying to compute more arbitrary regex negations.

That aside, I've been working with someone else on a suite of modules for dealing with regular languages & finite automata that will support negations in exactly this way, if you'd ever want to see how it actually goes. It will eventually allow standard Perl regexes as input as well, but of course it will be very slow for moderately-sized regexes. Even still, I wouldn't recommend such a module for everyday use -- it would be much simpler to rewrite the logic surrounding the regex, or use one of the tricks mentioned above in this thread, like negative lookahead.

"I would imagine you'd have to restrict the definition of 'regular expression' to something a little less rich than the full perl set (isn't there a compsci definition?)."

Yes, the classical CS definition allows simply the "|" (alternation), "*" (repetition), and concatenation operators. No backrefs as in Perl, no lookaheads, and certainly no embedded Perl code ;)

"Presumably if regexps form a turing complete language ..."

The expressibility of classical regexes is as far from Turing-complete as we know how to get ;) Extending them to include backreferences at least gives them the expressibility of NP, but they are still not Turing-complete.

blokhead
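The complement step of that pipeline is the easy part and can be sketched on a toy automaton. This is only an illustration under stated assumptions: the DFA below is written by hand (no regex-to-NFA or NFA-to-DFA conversion is shown), its alphabet is just {a, b}, and the hash layout and the names complement and accepts are invented for this sketch.

```perl
use strict;
use warnings;

# Toy DFA over the alphabet {a, b} that accepts strings containing "ab".
# The transition table must be total (every state handles every symbol),
# otherwise swapping accepting states does not give the complement.
my %dfa = (
    start  => 0,
    accept => { 2 => 1 },
    delta  => {
        0 => { a => 1, b => 0 },
        1 => { a => 1, b => 2 },
        2 => { a => 2, b => 2 },
    },
);

# Complement: same states and transitions, accepting set inverted.
sub complement {
    my ($d) = @_;
    my %accept = map { $_ => 1 }
                 grep { !$d->{accept}{$_} } keys %{ $d->{delta} };
    return +{ %$d, accept => \%accept };
}

# Run the automaton on a string made of the symbols a and b.
sub accepts {
    my ($d, $string) = @_;
    my $state = $d->{start};
    $state = $d->{delta}{$state}{$_} for split //, $string;
    return $d->{accept}{$state} ? 1 : 0;
}

my $negated = complement(\%dfa);
for my $s ('bab', 'bba', '') {
    printf "'%s': original=%d complement=%d\n",
        $s, accepts(\%dfa, $s), accepts($negated, $s);
}
```

The expensive parts described above, determinising the NFA and turning the complemented DFA back into a regex, are exactly what this sketch leaves out.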
Re^3: Negating Regexes: Tips, Tools, And Tricks Of The Trade by jbert (Priest) on Dec 07, 2006 at 15:53 UTC
Cool. Thanks very much for this. I was picturing some kind of repeating search-and-replace regexp thing, using the string as a tape, emulating a Turing machine. Of course, that's replacement as well but there are also probably a million other reasons why that wouldn't work. Replies like this are one reason Why Perl Monks Works for Me.
Re^4: Negating Regexes: Tips, Tools, And Tricks Of The Trade by Anonymous Monk on Dec 27, 2014 at 03:27 UTC
Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade by geekphilosopher (Friar) on Dec 07, 2006 at 19:17 UTC
Regular expressions, at least using the computer science definition, are equivalent in expressive power to the regular languages, hence their name. This means they can be defined in terms of a deterministic or non-deterministic finite automaton. Adding a stack would give us a push-down automaton, which can recognize context-free languages. Adding a second stack gives us something equivalent in power to a Turing machine.
Re^3: Negating Regexes: Tips, Tools, And Tricks Of The Trade by jbert (Priest) on Dec 07, 2006 at 21:38 UTC
Thanks. I have a limited CS background (some register machines, recursive and primitive recursive functions) and this gives me quite a few pointers to picking up some more.
Re: Negating Regexes: Tips, Tools, And Tricks Of The Trade by geekphilosopher (Friar) on Dec 07, 2006 at 14:40 UTC
I suppose `my $regexp = qr/something_here/; do_stuff() unless $foo =~ $regexp;` is cheating, eh? ;)
Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade by imp (Priest) on Dec 07, 2006 at 14:51 UTC
That approach is valid, but not appropriate for this context because the regex in question is being passed to WWW::Mechanize as follows: `$mech->find_all_links(text_regex => qr/download/i);`
Re: Negating Regexes: Tips, Tools, And Tricks Of The Trade by eyepopslikeamosquito (Archbishop) on Dec 08, 2006 at 12:01 UTC
This is discussed in the Perl Cookbook recipe 6.18 "Expressing AND, OR, and NOT in a Single Pattern".
Re: Negating Regexes: Tips, Tools, And Tricks Of The Trade by Anonymous Monk on Oct 12, 2011 at 20:05 UTC
FYI: My 1999 revised First edition has it as recipe 6.17.
Re: Negating Regexes: Tips, Tools, And Tricks Of The Trade by Sartak (Hermit) on Dec 09, 2006 at 02:08 UTC
It's interesting that you ask this now. Just yesterday I was developing an IRC bot and a question like this came up; basically, how to write the negation of a regex. The way I solved it was to add a new flag (not to Perl, but to the regex the users write), `/r`, that negates the return value of the match. So a user can do `!grep /^ascended$/r` to match all entries that don't match exactly "ascended". I haven't run into any major problems yet. :) I only match in boolean context (and without using any capturing variables). I don't know if it'd work so well if I needed to do anything more complicated.
Re^2: Negating Regexes: Tips, Tools, And Tricks Of The Trade by ikegami (Patriarch) on Dec 09, 2006 at 05:53 UTC
How's `grep /^ascended$/r` different from `grep !/^ascended$/`? Update: Never mind, I understand. The user submits `/^ascended$/r`, which you transform to `!/^ascended$/` to execute. I thought you had patched Perl :)
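One way such a user-level flag could be handled is sketched below. This is a guess at the general idea, not Sartak's actual bot code: the helper name compile_user_pattern, the accepted flag set, and the closure interface are all assumptions. Note that compiling user-supplied patterns carries the usual risks and would need further care in a real bot.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a user-supplied "/pattern/flags" string into a
# matcher closure. An extra "r" flag (a user-level convention here, not a
# Perl match flag) inverts the boolean result.
sub compile_user_pattern {
    my ($spec) = @_;
    my ($body, $flags) = $spec =~ m{\A/(.*)/([a-z]*)\z}s
        or die "Unrecognised pattern: $spec\n";

    # Strip the negation flag and remember whether it was present.
    my $negate = ($flags =~ s/r//g) ? 1 : 0;
    die "Unsupported flags: $flags\n" unless $flags =~ /\A[imsx]*\z/;

    my $re = qr/(?${flags}:$body)/;
    return sub {
        my $matched = $_[0] =~ $re ? 1 : 0;
        return $negate ? !$matched : $matched;
    };
}

my $not_ascended = compile_user_pattern('/^ascended$/r');
my @others = grep { $not_ascended->($_) } ('ascended', 'died', 'quit');

print "@others\n";
```

As in the post above, this only makes sense in boolean context: a negated match has no captures to return.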
