http://www.perlmonks.org?node_id=1018580


in reply to Re^2: A regex that only matches at offset that are multiples of a given N?
in thread A regex that only matches at offset that are multiples of a given N?

Ah, you're using /g... in that case I'd just...

use v5.12; my $a = join '', map{ ('a'..'j')[rand 10] } 1 .. 1000; while ($a =~ m[(aa..)]g) { next if pos($a) % 4; say "Match at ", pos($a), ": ", $1; }

It's possible that (?(cond)yes-expr|no-expr) might be able to do what you want, but I've not had much luck with that.

Update: with the hints about pos % 4 below, I've managed to get (?(cond)yes-expr|no-expr) to work. Not sure how well it goes performance-wise in practice, but:

use v5.12; my $a = join '', map{ ('a'..'j')[rand 10] } 1 .. 1000;; while ($a =~ m[(?(?{ pos() % 4 })(*F)|(aa..))]g) { say "Match at ", pos($a), ": ", $1; }
package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name

Replies are listed 'Best First'.
Re^4: A regex that only matches at offset that are multiples of a given N?
by BrowserUk (Patriarch) on Feb 13, 2013 at 17:18 UTC
    Ah, you're using /g...

    Yes. Sorry about failing to mention that. It was late and I was tired {excuse, excuse, excuse...}

    in that case I'd just..

    That is essentially what I doing (except with index) in Re: Store larg hashes more efficiently (10e6 md5s in 260MB at 4µs per lookup), because I couldn't work out how to get a regex to do it for me. Hence the question.

    I'd rather push the check inside the regex engine if I could; and it seems like it should be a reasonable ask; but it has me totally stumped.

    Even johngg's solution is flawed. :(


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^4: A regex that only matches at offset that are multiples of a given N?
by BrowserUk (Patriarch) on Feb 16, 2013 at 00:29 UTC
    Update: with the hints about pos % 4 below, I've managed to get (?(cond)yes-expr|no-expr) to work. Not sure how well it goes performance-wise in practice,

    Hm. I cannot make it work for my application; but on the basis of the failure I think my previous conclusion that putting code in regexes is always going to be dog slow:

    C:\test>junk39 -N=1 #! perl -slw use strict; use Digest::MD5 qw[ md5 ]; use Benchmark qw[ cmpthese ]; our $data = pack '(Va16)*', map{ $_, md5( $_ ) } 1 .. 1000; ## 1000 data items our $N //= -1; cmpthese $N, { a => q[ my $c = 0; for my $i ( 1 .. 2000 ) { ## half should pass; half fail. my $iBin = pack( 'V', $i ); my $md5 = md5( $i ); my $p = 0; while( $p = 1+index $data, $iBin, $p ) { next if ( $p - 1 ) % 20; ++$c, last if substr( $data, $p+3, 16 ) eq $md5; } } print "a: $c" if $N == 1; ], b => q[ my $c = 0; for my $i ( 1 .. 2000 ) { ## Finds 10?? my $iBin = pack( 'V', $i ); my $md5 = md5( $i ); while( $data =~ m{\G(?:.{20})*?(?=\Q$iBin\E(.{16}))}g ) { pos( $data ) += 20; ++$c, last if $1 eq $md5; } } print "b: $c" if $N == 1; ], c => q[ use re 'eval'; my $c = 0; for my $i ( 1 .. 2000 ) { ## finds none?? my $iBin = pack( 'V', $i ); my $md5 = md5( $i ); while( $data =~ m[(?(?{ pos( $data ) % 20 })(*F)|\Q%iBin\E +(.{16}))]g ) { ++$c, last if $1 eq $md5; } } print "c: $c" if $N == 1; ], }; __END__ C:\test>junk39 -N=1 a: 1000 b: 10 c: 0 Rate c a b c 0.111/s -- -99% -100% a 21.3/s 19117% -- -68% b 66.7/s 60113% 213% --

    I also can't get smls' version to work for this either?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.