PerlMonks  

How to match more than 32766 times in regex?

by rsFalse (Pilgrim)
on Dec 01, 2015 at 18:21 UTC (#1149052, perlquestion)
rsFalse has asked for the wisdom of the Perl Monks concerning the following question:

upd: Thanks for answers below.
upd: Full problem was posted by me later :/ in this reply - Re^5: How to match more than 32766 times in regex?

Replies are listed 'Best First'.
Match twice
by choroba (Bishop) on Dec 01, 2015 at 18:23 UTC
    Update: The whole question is in the title. So is the answer.
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: How to match more than 32766 times in regex?
by BrowserUk (Pope) on Dec 01, 2015 at 18:24 UTC

    Before I've even read your question; my suggestion is that you take up Python.

    Going through a bunch of known limitations, and raising questions about them as if you've newly discovered them, is a sad strategy.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I've read about the limitation, but is there a way to compose a regex that gets around it without a big time penalty? I tried something like /$regex*$regex*$regex*/ to match up to 96000 times, but the regex takes a long time to finish.

        To make a regexp faster, search from the start or the end of the string, using ^ or $ to anchor it to that point. But can you explain what you want to do? I am sure there is a better way.


        As for code, look at the repetition operator x 3, which repeats the pattern string 3 times. We then use qr to compile it into a regular expression, match against it, capture the results in @R, and print them. Because the capture group itself is duplicated, each copy captures separately. Hope this gives you some ideas.

        $ perl -e '$s="(\\d\\w)" x 3; $X="a1b2c3d4e5"; $m=qr/$s/; @R=$X=~$m; print join(";",@R)."\n"'
        1b;2c;3d

        Another way could be divide and conquer: match a piece, then use $' (the rest of the string that has not matched yet) for the next iteration, at the cost of a performance penalty. Another idea is to use index.
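The iterate-over-the-remainder idea can be sketched without $' by letting the regex engine remember its own position. This is only an illustration under assumed data: the sample string and the (\w\d) pattern are made up here, not the OP's real input. \G anchors each attempt where the previous match ended, and /g in scalar context advances pos(), so no quantifier is involved and the 32766 limit never comes into play.

```perl
use strict;
use warnings;

my $str   = 'a1' x 40_000;    # sample data, assumed for illustration
my $count = 0;

# Each iteration matches one unit starting exactly where the last one ended.
# /c keeps pos() intact when the match finally fails.
while ($str =~ /\G(\w\d)/gc) {
    $count++;
}

print "$count matches, stopped at offset ", pos($str), "\n";
```

Because the string is never copied or modified, this avoids the cost that comes with $' on large inputs.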

        if I wanted match up to 96000 times,

        What's your application?


        "... if I wanted match up to 96000 times ..."

        You're fired.

Re: How to match more than 32766 times in regex?
by Anonymous Monk on Dec 01, 2015 at 18:54 UTC
    (shrug) Admittedly at this point BrowserUK's suggestion makes sense to me. But, anyway... use a non-backtracking engine. Or change REG_INFTY value in regcomp.h and recompile perl (I have no idea whether it will work or not).
      use strict;
      use warnings;

      my $X = "a1b2c3d4e5";    # or use File::Slurp
      my $s = "(\\w\\d)";      # my pattern match $s
      my $m = qr/$s/;          # compiled to a regular expression $m
      my $counter = 0;
      while ($X =~ s/$m//) {
          ++$counter;
          next unless $counter > 32766;    # wait for it...
          print "this is the $counter iteration, got $1 \n";
      }
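If only the number of matches is needed, a non-destructive variant of the loop above might look like the following. It is a sketch with assumed sample data; it uses the list-assignment-in-scalar-context idiom to count /g matches, so the string is left untouched and no {n} quantifier is needed.

```perl
use strict;
use warnings;

my $X = 'a1b2c3d4e5' x 10_000;    # sample data, assumed for illustration

# In list context /g returns every match; assigning that list to ()
# and reading the assignment in scalar context yields the match count.
my $count = () = $X =~ /\w\d/g;

print "matched $count times\n";
```

This builds a temporary list of all matches, so for very large inputs a while loop with /g may be lighter on memory.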

        No need to go to those lengths:

        $s = '0123456789' x 100000;;
        ( $m ) = $s =~ m[((?:(?:0123456789){32000}){3})];;
        print length $m;;
        960000

        But for any given application there's almost certainly a better way of tackling the problem.
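The nested-quantifier trick shown above can be generalized to an arbitrary repeat count. The helper below is hypothetical (the name repeat_exactly and its default chunk size of 32000 are assumptions for illustration): it splits the count into full chunks plus a remainder so that no single quantifier exceeds the chunk size.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex matching $atom exactly $n times,
# keeping every individual {..} quantifier at or below $chunk.
sub repeat_exactly {
    my ($atom, $n, $chunk) = @_;
    $chunk //= 32_000;
    my $full = int($n / $chunk);    # number of complete chunks
    my $rest = $n % $chunk;         # leftover repetitions
    die "count too large for two levels of nesting\n" if $full > $chunk;

    my $re = '';
    $re .= sprintf('(?:(?:%s){%d}){%d}', $atom, $chunk, $full) if $full;
    $re .= sprintf('(?:%s){%d}', $atom, $rest) if $rest;
    return qr/$re/;
}

my $s  = '0123456789' x 100_000;
my $re = repeat_exactly('0123456789', 96_000);
my ($m) = $s =~ /^($re)/;
print length($m), "\n";
```

The atom is interpolated as-is, so it must already be a valid regex fragment. Whether this is fast enough depends on the atom; a pattern that can backtrack heavily will still be slow.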


        Hmmm, I thought the OP had problems with 'complex regex recursion limit exceeded'. If he just wanted to match something like (\w\d){32767}, sure.

Node Type: perlquestion [id://1149052]
Approved by Corion