Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

We all know how to match "this or that"-

/this|that/

And we can match "this and not that" simply-

/this/ && !/that/

Once in a while I'm faced with a library that wants me to provide a regexp, and I need it to test a "this and not that" pair of patterns. (If it's my own code, it will accept a coderef like sub {/this/ && !/that/} as a condition, and all is well...) One easy-to-explain workaround is to use a code-conditional pattern:

m:^(?(?{/this/ && !/that/})|(?!)):

or, more verbosely,

m: # use : as delims so / in code won't hurt ^ # Anchor it at start, don't need to backtrack (? # Start a conditional test (?{/this/ && !/that/}) # What to test- # By itself, a code assertion always succeeds, # That's why we wrap the code inside a conditional |(?!) # Simplest "T|F" - match succeed, nonmatch fails ) # End the condition :x

And it works. Still, as I was settling down one evening a few nights ago, I got to thinking of alternatives. We can use look-ahead assertions:

/(?=.*this)(?!.*that)/s

which is fairly easy for me to read. It doesn't need an anchor, the sub-patterns will cause a match or fail at the first position- in fact the anchor slows it down since it is a useless extra operation in this regexp. The s modifier at the end is so the .* won't stop at a line break.

Then I had a thought going back to alternation: /this|that/, and a class I had on logic. Another way of saying "this and that" is "not(not this or not that)". Which leads to

/(?!(?!.*this)|(?=.*that))/s

There might be a better way of expressing it, but my time is limited...

Speaking of time, I tossed them all into a benchmarking script, and ran them on the machine I'm typing from which runs perl 5.12.1, and am a little surprised by the results.

#!/usr/bin/env perl use Benchmark 'cmpthese'; my %ways = ( '3calls' => sub {/a/ && !/b/ && /c/}, code => sub {m:^(?(?{/a/ && !/b/ && /c/})|(?!)):}, pos_look => sub {/(?=.*a)(?!.*b)(?=.*c)/s}, neg_look => sub {/(?!(?!.*a)|(?=.*b)|(?!.*c))/s}, neg_anch => sub {/^(?!(?!.*a)|(?=.*b)|(?!.*c))/s} ); # Minimal check that the regexps do what they're supposed to while (my ($way, $sub)=each %ways) { die "$way failed to match\n" unless ($_ = 'cat') && &$sub; die "$way had a false positive\n" if ($_= 'cab') && &$sub; } print "For matching:\n"; $_="find your cat"; cmpthese(-1, \%ways); print "For non-matching:\n"; $_="don't find a cab"; cmpthese(-1, \%ways); __END__ Rate code neg_look neg_anch pos_anch pos_look 3calls code 674093/s -- -52% -53% -63% -63% -87% neg_look 1417081/s 110% -- -1% -21% -22% -73% neg_anch 1429846/s 112% 1% -- -21% -21% -73% pos_anch 1798791/s 167% 27% 26% -- -1% -66% pos_look 1810887/s 169% 28% 27% 1% -- -66% 3calls 5338256/s 692% 277% 273% 197% 195% -- For non-matching: Rate code neg_anch neg_look pos_anch pos_look 3calls code 674093/s -- -55% -57% -63% -65% -88% neg_anch 1495898/s 122% -- -5% -17% -22% -73% neg_look 1570933/s 133% 5% -- -13% -18% -72% pos_anch 1806441/s 168% 21% 15% -- -5% -67% pos_look 1906963/s 183% 27% 21% 6% -- -65% 3calls 5522908/s 719% 269% 252% 206% 190% --

What surprises me is both how much faster the three separate matches 3calls is than any single-matching alternative, and how much slower the "code" regexp is. The quick /a/ && !/b/ is due to the optimizing that the simpler patterns can undergo, but I guess that's dwarfed by the overhead of setting up an eval-inside-a-regexp.

And you, dear readers, have you any other favorite ways of saying "match this and not that" in a single regexp?


In reply to Matching and nonmatching multiple regexps at once by Yary

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2024-03-29 14:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found