http://www.perlmonks.org?node_id=1018492


in reply to A regex that only matches at offset that are multiples of a given N?

I'd put . in place of ???? and then * after the first parentheses...

use v5.12; my @should_match = ( q[foo], q[WXYZfoo], q[WXYZWXYZfoo], q[WXYZ WXYZfoo], ); my @should_not_match = ( q[ foo], q[ABCfoo], q[VWXYZfoo], q[WXYZWXYZWXYZAfoo], ); my $regexp = qr{^(?:.{4})*foo}; say /$regexp/ ? "ok" : "not ok" for @should_match; say /$regexp/ ? "not ok" : "ok" for @should_not_match;
package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name

Replies are listed 'Best First'.
Re^2: A regex that only matches at offset that are multiples of a given N?
by BrowserUk (Patriarch) on Feb 13, 2013 at 16:24 UTC
    wish it were so simple:

    $a = join'',map{ ('a'..'z')[rand 26] } 1 .. 1000;; print "$-[0]: $1" while $a =~ m[(?:.{4})*(?=(aa..))]g;; 0: aawx 404: aawx 405: aadz 481: aadz print "$-[0]: $1" while $a =~ m[(?:.{4})*(?=(gg..))]g;; 0: gghn 208: gghn 211: ggyj 955: ggyj

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Ah, you're using /g... in that case I'd just...

      use v5.12; my $a = join '', map{ ('a'..'j')[rand 10] } 1 .. 1000; while ($a =~ m[(aa..)]g) { next if pos($a) % 4; say "Match at ", pos($a), ": ", $1; }

      It's possible that (?(cond)yes-expr|no-expr) might be able to do what you want, but I've not had much luck with that.

      Update: with the hints about pos % 4 below, I've managed to get (?(cond)yes-expr|no-expr) to work. Not sure how well it goes performance-wise in practice, but:

      use v5.12; my $a = join '', map{ ('a'..'j')[rand 10] } 1 .. 1000;; while ($a =~ m[(?(?{ pos() % 4 })(*F)|(aa..))]g) { say "Match at ", pos($a), ": ", $1; }
      package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
        Ah, you're using /g...

        Yes. Sorry about failing to mention that. It was late and I was tired {excuse, excuse, excuse...}

        in that case I'd just..

        That is essentially what I doing (except with index) in Re: Store larg hashes more efficiently (10e6 md5s in 260MB at 4µs per lookup), because I couldn't work out how to get a regex to do it for me. Hence the question.

        I'd rather push the check inside the regex engine if I could; and it seems like it should be a reasonable ask; but it has me totally stumped.

        Even johngg's solution is flawed. :(


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Update: with the hints about pos % 4 below, I've managed to get (?(cond)yes-expr|no-expr) to work. Not sure how well it goes performance-wise in practice,

        Hm. I cannot make it work for my application; but on the basis of the failure I think my previous conclusion that putting code in regexes is always going to be dog slow:

        C:\test>junk39 -N=1 #! perl -slw use strict; use Digest::MD5 qw[ md5 ]; use Benchmark qw[ cmpthese ]; our $data = pack '(Va16)*', map{ $_, md5( $_ ) } 1 .. 1000; ## 1000 data items our $N //= -1; cmpthese $N, { a => q[ my $c = 0; for my $i ( 1 .. 2000 ) { ## half should pass; half fail. my $iBin = pack( 'V', $i ); my $md5 = md5( $i ); my $p = 0; while( $p = 1+index $data, $iBin, $p ) { next if ( $p - 1 ) % 20; ++$c, last if substr( $data, $p+3, 16 ) eq $md5; } } print "a: $c" if $N == 1; ], b => q[ my $c = 0; for my $i ( 1 .. 2000 ) { ## Finds 10?? my $iBin = pack( 'V', $i ); my $md5 = md5( $i ); while( $data =~ m{\G(?:.{20})*?(?=\Q$iBin\E(.{16}))}g ) { pos( $data ) += 20; ++$c, last if $1 eq $md5; } } print "b: $c" if $N == 1; ], c => q[ use re 'eval'; my $c = 0; for my $i ( 1 .. 2000 ) { ## finds none?? my $iBin = pack( 'V', $i ); my $md5 = md5( $i ); while( $data =~ m[(?(?{ pos( $data ) % 20 })(*F)|\Q%iBin\E +(.{16}))]g ) { ++$c, last if $1 eq $md5; } } print "c: $c" if $N == 1; ], }; __END__ C:\test>junk39 -N=1 a: 1000 b: 10 c: 0 Rate c a b c 0.111/s -- -99% -100% a 21.3/s 19117% -- -68% b 66.7/s 60113% 213% --

        I also can't get smls' version to work for this either?


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      Why have you removed the ^ anchor?
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Because with it, it only matches the first match; not any second or subsequent properly aligned matches?:

        print "$-[0]: $1" while $a =~ m[(?:.{4})*(?=(aa..))]g;; 0: aawx 404: aawx 405: aadz 481: aadz print "$-[0]: $1" while $a =~ m[^(?:.{4})*(?=(aa..))]g;; 0: aawx print "$-[0]: $1" while $a =~ m[(?:.{4})*(?=(gg..))]g;; 0: gghn 208: gghn 211: ggyj 955: ggyj print "$-[0]: $1" while $a =~ m[^(?:.{4})*(?=(gg..))]g;; 0: gghn

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      You're just missing a leading \G