Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number

regex for negating a sequence

by spurperl (Priest)
on Oct 03, 2006 at 17:56 UTC ( #576137=perlquestion: print w/replies, xml ) Need Help??
spurperl has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

How can the following be implemented as a regex ?

\x26\x67 followed by 8 bytes in which the sequence \x26\x67 does not appear.


Replies are listed 'Best First'.
Re: regex for negating a sequence
by davidrw (Prior) on Oct 03, 2006 at 18:10 UTC
    i think this will work for you:
    Based upon this test code to try to match ab followed by 4 bytes w/no ab sequence:
    perl -le 'print for map { /ab(?!ab)(.(?!ab)){2}.{2}/s ? $& : () } @AR +GV' abhjabdf abasdasd abdasdadsab ababblah OUTPUTS: abasda abdasd abblah
    Update: Here's a commented version:
    / \x26\x67 # Starting sequence (let's call it AB) (?!\x26\x67) # Exclude double starting seqeuence, e.g. ABAB (?: # non-capture .(?!\x26\x67) # Any character that's not followed by the sequenc +e ){6} # Need N-2 of these .{2} # And the last two characters can be anything. # note we know they're not AB because otherwise the # (N-2)th item in the previous part wouldn't have mat +ched /sx # Enable comments, and make . mean everything
    Update: Here's a generic version:
    my $marker = '&g'; my $N = 8; my $n1 = $N-2; /$marker(?!$marker)(?:.(?!$marker)){$n1}.{2}/s
    Update: (As pointed out by diotalevi) Added the /s modifier in case the data stream can have a newline in it.

      Your regexp can be simplified to the following:


      If you'd rather have simplicity at the cost of an extra check, use the following:


      is the equivalent of
      but it (negatively) matches entire regexps instead of a choice of characters.

        Another alternative:
        I've interpreted "does not appear" differently than you when the eighth char is the \x26 of a \x26\x67 sequence.
        nice. My first attempt was trying to do it like that, but i started with something like:
        which doesn't work for the case of consectutive sequences.. i just didn't think of putting the look-ahead behind the dot :)

      Your . is excluding the \n character. Use the /s flag in some form if you really mean "any character."

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: regex for negating a sequence
by pizza_milkshake (Monk) on Oct 03, 2006 at 20:11 UTC
    whoops, it's been a while
    #!/usr/bin/perl -wl # ex: set ts=2 et: use strict; use warnings qw(all); $_ = "\x26\x67\x26\x67"."ABCDEFGHIJKLMN"; print $1 while /(\x26\x67(?!\x26\x67).{8})/g;
Re: regex for negating a sequence
by aquarium (Curate) on Oct 03, 2006 at 23:21 UTC
    might be faster to just loop (5 times) through the next two characters in the string and test for equality. lookahead/lookbehind assertions are rather expensive if the string being matched is long
    $matched = undef; for($index=0,$index<11;$index+=2) { if(substr($test_string;$index,2) eq "ab") { $matched = 1; last; } }
    untested code
    the hardest line to type correctly is: stty erase ^H
      lookahead/lookbehind assertions are rather expensive if the string being matched is long

      demerphq might be too shy to toot his own horn, but a couple of weeks he landed an awesome patch to the development version of Perl that teaches the optimiser to look inside those assertions and use the information to improve the matching strategy used. Which equates to some massive speedups in some cases, perhaps the cases to which you were alluding.

      Let's just hope that Perl 5.10 sees the light of day soon. There's lots of goodness in there waiting to be unleashed.

      • another intruder with the mooring in the heart of the Perl

Re: regex for negating a sequence
by monarch (Priest) on Oct 04, 2006 at 15:49 UTC
    I know this is outside the scope of the original question but this is a good opportunity to try your hand at some testing! Sometimes a regexp might look like it will do the job but how can you be SURE it does what you expect?

    Try the following code:

    You can see that I've tried to cover all the cases I can think of in your requirements. Do they look right? Do the test situations describe all the match/nomatch situations you're expecting?

    The test script produces the following output:

    C:\temp>perl -w regexp.t 1..14 ok 1 - lead bytes followed by 8 clear bytes ok 2 - lead bytes followed by only 7 clear bytes ok 3 - lead bytes followed by 9 clear bytes ok 4 - lead bytes with 2 bad bytes ok 5 - lead bytes with 2 bad bytes ok 6 - lead bytes with 2 bad bytes ok 7 - lead bytes with 2 bad bytes ok 8 - lead bytes with 1 bad byte inside the eight byte limit ok 9 - lead bytes with 1 bad bytes ok 10 - lead bytes with 2 sets of bad bytes ok 11 - lead bytes with 2 sets of bad bytes ok 12 - lead bytes with 2 sets of bad bytes ok 13 - lead bytes with 2 sets of bad bytes ok 14 - lead bytes with 2 sets of bad bytes

    You can plug in anybody's suggestion into the script and verify that it does, indeed, do what you want. If you get the hang of a little testing now you'll find it easier for other things too! Good luck!

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://576137]
Approved by davidrw
Front-paged by diotalevi
[ambrus]: "you are far more likely to be squished between two other dudes" => wait, really? aren't most planes, weighted by people traveling on it, so narrow that there are only two or three seats on each side of the one corridor, so only one third of the people ge
[ambrus]: t squished between two other people?
holli is about to take the first hit since a month and this weed is strong shit. Should my head explode, someone call 911.
[ambrus]: And even when I traveled on a big plane for a long flight, it had 3 seats, then corridor, then 4 seats, then corridor, then 3 seats, so less than half of the people sit between two other dudes.
[1nickt]: you are missing an entire element, which is that your base ticket price only gets you a middle seat. If you want to be window or aisle you must pay, or gamble that you'll be assigned there at ...
[1nickt]: ... flihght time (hoping that not enough others have paid and there are some "good" seats to be allotted.
[1nickt]: And this is in addition to paying extra if you want to be in the first 12 rows, or in the emergency exit row.
choroba was once seeted besides a guy that was so fat he was almost squished between the guy

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (12)
As of 2017-12-18 13:42 GMT
Find Nodes?
    Voting Booth?
    What programming language do you hate the most?

    Results (486 votes). Check out past polls.