Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: A regex that only matches at offset that are multiples of a given N?

by tobyink (Abbot)
on Feb 13, 2013 at 07:52 UTC ( #1018492=note: print w/ replies, xml ) Need Help??


in reply to A regex that only matches at offset that are multiples of a given N?

I'd put . in place of ???? and then * after the first parentheses...

use v5.12; my @should_match = ( q[foo], q[WXYZfoo], q[WXYZWXYZfoo], q[WXYZ WXYZfoo], ); my @should_not_match = ( q[ foo], q[ABCfoo], q[VWXYZfoo], q[WXYZWXYZWXYZAfoo], ); my $regexp = qr{^(?:.{4})*foo}; say /$regexp/ ? "ok" : "not ok" for @should_match; say /$regexp/ ? "not ok" : "ok" for @should_not_match;
package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name


Comment on Re: A regex that only matches at offset that are multiples of a given N?
Select or Download Code
Re^2: A regex that only matches at offset that are multiples of a given N?
by BrowserUk (Pope) on Feb 13, 2013 at 16:24 UTC
    wish it were so simple:

    $a = join'',map{ ('a'..'z')[rand 26] } 1 .. 1000;; print "$-[0]: $1" while $a =~ m[(?:.{4})*(?=(aa..))]g;; 0: aawx 404: aawx 405: aadz 481: aadz print "$-[0]: $1" while $a =~ m[(?:.{4})*(?=(gg..))]g;; 0: gghn 208: gghn 211: ggyj 955: ggyj

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Why have you removed the ^ anchor?
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Because with it, it only matches the first match; not any second or subsequent properly aligned matches?:

        print "$-[0]: $1" while $a =~ m[(?:.{4})*(?=(aa..))]g;; 0: aawx 404: aawx 405: aadz 481: aadz print "$-[0]: $1" while $a =~ m[^(?:.{4})*(?=(aa..))]g;; 0: aawx print "$-[0]: $1" while $a =~ m[(?:.{4})*(?=(gg..))]g;; 0: gghn 208: gghn 211: ggyj 955: ggyj print "$-[0]: $1" while $a =~ m[^(?:.{4})*(?=(gg..))]g;; 0: gghn

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

      Ah, you're using /g... in that case I'd just...

      use v5.12; my $a = join '', map{ ('a'..'j')[rand 10] } 1 .. 1000; while ($a =~ m[(aa..)]g) { next if pos($a) % 4; say "Match at ", pos($a), ": ", $1; }

      It's possible that (?(cond)yes-expr|no-expr) might be able to do what you want, but I've not had much luck with that.

      Update: with the hints about pos % 4 below, I've managed to get (?(cond)yes-expr|no-expr) to work. Not sure how well it goes performance-wise in practice, but:

      use v5.12; my $a = join '', map{ ('a'..'j')[rand 10] } 1 .. 1000;; while ($a =~ m[(?(?{ pos() % 4 })(*F)|(aa..))]g) { say "Match at ", pos($a), ": ", $1; }
      package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
        Ah, you're using /g...

        Yes. Sorry about failing to mention that. It was late and I was tired {excuse, excuse, excuse...}

        in that case I'd just..

        That is essentially what I doing (except with index) in Re: Store larg hashes more efficiently (10e6 md5s in 260MB at 4s per lookup), because I couldn't work out how to get a regex to do it for me. Hence the question.

        I'd rather push the check inside the regex engine if I could; and it seems like it should be a reasonable ask; but it has me totally stumped.

        Even johngg's solution is flawed. :(


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Update: with the hints about pos % 4 below, I've managed to get (?(cond)yes-expr|no-expr) to work. Not sure how well it goes performance-wise in practice,

        Hm. I cannot make it work for my application; but on the basis of the failure I think my previous conclusion that putting code in regexes is always going to be dog slow:

        C:\test>junk39 -N=1 #! perl -slw use strict; use Digest::MD5 qw[ md5 ]; use Benchmark qw[ cmpthese ]; our $data = pack '(Va16)*', map{ $_, md5( $_ ) } 1 .. 1000; ## 1000 data items our $N //= -1; cmpthese $N, { a => q[ my $c = 0; for my $i ( 1 .. 2000 ) { ## half should pass; half fail. my $iBin = pack( 'V', $i ); my $md5 = md5( $i ); my $p = 0; while( $p = 1+index $data, $iBin, $p ) { next if ( $p - 1 ) % 20; ++$c, last if substr( $data, $p+3, 16 ) eq $md5; } } print "a: $c" if $N == 1; ], b => q[ my $c = 0; for my $i ( 1 .. 2000 ) { ## Finds 10?? my $iBin = pack( 'V', $i ); my $md5 = md5( $i ); while( $data =~ m{\G(?:.{20})*?(?=\Q$iBin\E(.{16}))}g ) { pos( $data ) += 20; ++$c, last if $1 eq $md5; } } print "b: $c" if $N == 1; ], c => q[ use re 'eval'; my $c = 0; for my $i ( 1 .. 2000 ) { ## finds none?? my $iBin = pack( 'V', $i ); my $md5 = md5( $i ); while( $data =~ m[(?(?{ pos( $data ) % 20 })(*F)|\Q%iBin\E +(.{16}))]g ) { ++$c, last if $1 eq $md5; } } print "c: $c" if $N == 1; ], }; __END__ C:\test>junk39 -N=1 a: 1000 b: 10 c: 0 Rate c a b c 0.111/s -- -99% -100% a 21.3/s 19117% -- -68% b 66.7/s 60113% 213% --

        I also can't get smls' version to work for this either?


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      You're just missing a leading \G

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1018492]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (6)
As of 2015-07-07 06:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (87 votes), past polls