http://www.perlmonks.org?node_id=1018517


in reply to A regex that only matches at offset that are multiples of a given N?

Similar to tobyink's solution, but doing a global match in case there are multiple "fred"s at your interval offsets. It seems to work, except that it matches twice at each point, probably for reasons similar to those explored here.

```
]$ perl -Mstrict -Mwarnings -E '
$_ = q{abcdefredfghfredijklmnopfredqrs};
for my $n ( 4, 5 )
{
    say qq{\$n = $n};
    say qq{Matched $1 at position @{ [ pos( $_ ) ] }}
        while m{\G(?:.{$n})*?(?=(fred.*))}g;
}'
$n = 4
Matched fredijklmnopfredqrs at position 12
Matched fredijklmnopfredqrs at position 12
Matched fredqrs at position 24
Matched fredqrs at position 24
$n = 5
Matched fredfghfredijklmnopfredqrs at position 5
Matched fredfghfredijklmnopfredqrs at position 5
$
```
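Recording where each match starts and ends suggests why every hit shows up twice. In the sketch below, match_log() is a hypothetical helper and the capture is trimmed to just "fred". The first match consumes the non-matching 4-character chunks and stops where "fred" begins. The next //g attempt then succeeds straight away with zero repetitions, which is a zero-length match at the same offset. Perl will not accept a second zero-length match at that position, so the lazy group is forced to consume further chunks, and the same thing happens at the next hit.

```perl
use strict;
use warnings;

# match_log() is a hypothetical helper: it runs the \G pattern from above
# and records [start, end, capture] for every //g match.
sub match_log {
    my ( $str, $n ) = @_;
    my @log;
    while ( $str =~ m{\G(?:.{$n})*?(?=(fred))}g ) {
        # $-[0] is where the whole match began, $+[0] where it ended (== pos)
        push @log, [ $-[0], $+[0], $1 ];
    }
    return @log;
}

for my $m ( match_log( 'abcdefredfghfredijklmnopfredqrs', 4 ) ) {
    printf "start %2d  end %2d  length %2d  %s\n",
        $m->[0], $m->[1], $m->[1] - $m->[0], $m->[2];
}
# start  0  end 12  length 12  fred
# start 12  end 12  length  0  fred
# start 12  end 24  length 12  fred
# start 24  end 24  length  0  fred
```

Every second line is an empty match, which is the duplicate.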

I hope this is useful.

Cheers,

JohnGG

Re^2: A regex that only matches at offset that are multiples of a given N? (Update:almost perfect!)
by BrowserUk (Patriarch) on Feb 13, 2013 at 16:39 UTC

    Ah! Almost(*) perfect. (I never have wrapped my brain around \G :( )

```
print "$-[0]: $1" while $a =~ m[\G(?:.{4})*?(?=(aa..))]g;;
0: aawx
404: aawx

print "$-[0]: $1" while $a =~ m[\G(?:.{4})*?(?=(gg..))]g;;
0: gghn
208: gghn
```

    (*) I wasn't seeing the double matching, but now I am. Then I thought moving the \G would fix it, but it doesn't. :(
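    If the duplicate at each offset is the zero-length match that follows each non-empty one, then relocating \G may not be what is needed. A minimal sketch of another option, which is not from this thread: make each match consume the aligned chunk where the key starts, so no match is ever empty and //g resumes on the next chunk boundary. aligned_hits() is a hypothetical helper; the key is treated as a literal string via \Q...\E, and /s is assumed so that . also matches newlines.

```perl
use strict;
use warnings;

# aligned_hits() is a hypothetical helper, not from the thread. Each match
# consumes the aligned chunk where $key starts, so no match is zero-length
# and every //g iteration resumes on the next chunk boundary.
sub aligned_hits {
    my ( $str, $key, $n ) = @_;
    my @hits;
    # \G(?:.{$n})*?   skip whole n-char chunks, starting where the last match ended
    # (?=(\Q$key\E))  the key (taken literally) must begin at this chunk boundary
    # (.{1,$n})       consume that chunk (or whatever is left at the end)
    # /s              let . match newlines as well
    while ( $str =~ m{\G(?:.{$n})*?(?=(\Q$key\E))(.{1,$n})}gs ) {
        push @hits, [ $-[2], $1 ];    # $-[2]: offset where the chunk begins
    }
    return @hits;
}

for my $hit ( aligned_hits( 'abcdefredfghfredijklmnopfredqrs', 'fred', 4 ) ) {
    print "$hit->[0]: $hit->[1]\n";
}
```

    With the sample string and a width of 4, this should report offsets 12 and 24, once each, and the loop body only runs for aligned hits.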


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Maybe you could avoid the double matching by manually incrementing pos() after each successful match, either inside the body of the while loop or at the end of the regex using (?{pos() += 4})?

      (Just an idea, haven't tested it.)
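      A sketch of the loop-body variant of this idea, assuming the key is a literal string and the chunk width is $n; hits_bumping_pos() is a hypothetical name, and the (?{...}) variant is not covered here. After each hit, pos() is moved one chunk past the key, so the next \G starts on the following boundary and the zero-length repeat does not occur.

```perl
use strict;
use warnings;

# hits_bumping_pos() is a hypothetical helper. It keeps the zero-length
# pattern but, inside the loop body, moves pos() one chunk past each hit,
# so the next \G starts on the following chunk boundary.
sub hits_bumping_pos {
    my ( $str, $key, $n ) = @_;
    my @hits;
    while ( $str =~ m{\G(?:.{$n})*?(?=\Q$key\E)}gs ) {
        my $at = pos $str;               # end of match == aligned offset of the key
        push @hits, $at;
        last if $at + $n > length $str;  # no further aligned chunk exists
        pos($str) = $at + $n;            # skip the chunk just reported
    }
    return @hits;
}

print join( ' ', hits_bumping_pos( 'abcdefredfghfredijklmnopfredqrs', 'fred', 4 ) ), "\n";
```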

        That's a possible approach, but it is effectively similar to simply ignoring non-aligned matches (the next if pos() % 4; stuff above). The real saving would only come from having the regex engine call back out to Perl for aligned matches only.
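        For comparison, the "ignore non-aligned matches" approach might look like the sketch below; hits_filtered() is a hypothetical name and the key is treated as a literal. The bare lookahead stops at every occurrence of the key, aligned or not, and the filtering happens in Perl code, which is the per-occurrence cost described above.

```perl
use strict;
use warnings;

# hits_filtered() is a hypothetical helper illustrating the "ignore
# non-aligned matches" approach: a bare lookahead finds every occurrence of
# the key (even overlapping ones), and the loop body discards unaligned ones.
sub hits_filtered {
    my ( $str, $key, $n ) = @_;
    my @hits;
    while ( $str =~ m{(?=\Q$key\E)}g ) {
        next if $-[0] % $n;    # $-[0] == pos here, since the match is zero-length
        push @hits, $-[0];
    }
    return @hits;
}

print join( ' ', hits_filtered( 'abcdefredfghfredijklmnopfredqrs', 'fred', 4 ) ), "\n";
```

        Here the loop body also runs for the unaligned "fred" at offset 5 and throws it away.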

        It seems like it ought to be possible to reposition the \G somewhere after the (?:.{$n}) to avoid the duplicates, but I've tried various places, including:

```
#! perl -slw
use strict;

my $s = join '', map { ('a'..'z')[ rand 26 ] } 1 .. 1000;

our $N //= 4;

for my $key ( 'aa' .. 'zz' ) {
    print "$-[1] : $1" while $s =~ m[(?:.{$N})*(?=($key\G..))]g;
}
```

        And I cannot make it work. (The above silently matches nothing; v-e-r-y s-l-o-w-l-y!).

        And that is pretty much the effect I seem to get every time I try to use \G. I conclude that I have just never managed to acquire a good mental model of what it actually does, despite years of trying on and off.
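        One way to build that mental model is to probe \G directly. The sketch below (all function names are hypothetical) rests on the documented rule that \G matches only at pos() of the target string: where the previous //g match on it ended, or offset 0 if there was none. In the attempt above, \G sits after $key has already consumed at least two characters, while pos($s) is still unset, so the assertion can never be true. Because the pattern no longer starts with \G, the engine presumably also retries from every starting offset and backtracks through (?:.{$N})* each time, which would explain the slowness.

```perl
use strict;
use warnings;

# 1. Leading \G: each //g match must start exactly where the previous one
#    ended, so scanning stops at the first chunk that is not "ab".
sub chain_of_ab {
    my ($s) = @_;
    my @got;
    push @got, $1 while $s =~ /\G(ab)/g;
    return join ',', @got;
}

# 2. pos() can be set by hand; a leading \G then anchors the match there.
sub match_at {
    my ( $s, $offset ) = @_;
    pos($s) = $offset;
    return $s =~ /\G(..)/g ? $1 : '';
}

# 3. \G placed after "aa": the match position there is at least 2, but
#    pos($s) is unset (so \G means offset 0), hence it can never succeed.
sub g_after_key {
    my ($s) = @_;
    return $s =~ m[(?:.{2})*(?=(aa\G..))]g ? $1 : '';
}

print chain_of_ab('ababXab'), "\n";                              # ab,ab
print match_at( 'abcdef', 2 ), "\n";                             # cd
print g_after_key('xxaabb') eq '' ? "no match\n" : "matched\n";  # no match
```

        Note that the trailing "ab" in 'ababXab' is never found: once a \G-anchored match fails at the current position, the scan does not skip ahead.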

