PerlMonks  

How to match more than 32766 times in regex?

by rsFalse (Chaplain)
on Dec 01, 2015 at 18:21 UTC (id://1149052)

rsFalse has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality.

Match twice
by choroba (Cardinal) on Dec 01, 2015 at 18:23 UTC
    Update: The whole question is in the title. So is the answer.
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: How to match more than 32766 times in regex?
by BrowserUk (Patriarch) on Dec 01, 2015 at 18:24 UTC

    Before I've even read your question, my suggestion is that you take up Python.

    Going through a bunch of known limitations, and raising questions about them as if you've newly discovered them, is a sad strategy.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I've read about the limitation, but is there a way to compose a regex without a big time penalty? I tried something like /$regex*$regex*$regex*/ if I wanted match up to 96000 times, but the regex takes a long time to finish.

        To make a regexp faster, anchor it to the start or end of the string with ^ or $. But can you explain what you want to do? I am sure there is a better way.


        As for code, look at the repetition operator x 3, which concatenates the pattern string 3 times. Then we use qr to compile it into a regular expression, match it against $X, and capture the results in @R, which we then print. Duplicating the expression this way captures it 3 times. Hope this gives you ideas.

        $ perl -e '$s="(\\d\\w)" x 3; $X="a1b2c3d4e5"; $m=qr/$s/; @R=$X=~$m; print join(";",@R)."\n"'
        1b;2c;3d
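If the goal is simply to collect every occurrence of a pattern, a match with the /g modifier in list context returns all captures from all successive matches without any {n,m} quantifier, so the 32766 quantifier limit does not come into play. A minimal sketch; the sample strings are made up for illustration:

```perl
use strict;
use warnings;

my $X = "a1b2c3d4e5";

# In list context, m//g returns the captures of every successive match,
# instead of a fixed number of copies of the group as with "x 3" above.
my @R = $X =~ /(\d\w)/g;
print join(";", @R), "\n";    # 1b;2c;3d;4e

# Same technique on a much longer string: no {n,m} quantifier is involved,
# so the number of matches is not capped by the quantifier limit.
my $big   = "a1" x 100_000;
my @pairs = $big =~ /(\w\d)/g;
print scalar(@pairs), "\n";   # 100000
```

Unlike the "x 3" one-liner, the number of captures here follows the data rather than the pattern.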

        Another way could be divide and conquer: match a chunk, then continue on $' (the rest of the string after the match) in the next iteration, paying the performance penalty that $' incurs. Another idea is using index.
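Copying the rest of the string through $' on every iteration can be avoided: the \G anchor together with /gc resumes each match at pos(), where the previous one ended, so a run of repetitions can be consumed one unit at a time with no quantifier at all. For a fixed substring, index() does the same job without the regex engine. A sketch, where count_run and count_substr are hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical helper: count consecutive copies of $unit (a qr//) starting
# at offset 0 of $str. Each /\G.../gc match resumes at pos(), so no {n,m}
# quantifier (and hence no repetition limit) is involved. /c keeps pos()
# at the end of the run when the final attempt fails.
sub count_run {
    my ($str, $unit) = @_;
    my $n = 0;
    pos($str) = 0;
    $n++ while $str =~ /\G$unit/gc;
    return ($n, pos($str));    # number of units, offset where the run ends
}

# Hypothetical helper: count non-overlapping occurrences of a fixed string.
sub count_substr {
    my ($hay, $needle) = @_;
    die "empty needle\n" unless length $needle;
    my ($count, $from) = (0, 0);
    while ((my $i = index($hay, $needle, $from)) >= 0) {
        $count++;
        $from = $i + length $needle;
    }
    return $count;
}

my $s = ("ab" x 100_000) . "x";
my ($units, $end) = count_run($s, qr/ab/);
print "$units units, run ends at offset $end\n";   # 100000 units, run ends at offset 200000
print count_substr($s, "ab"), "\n";                # 100000
```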

        if I wanted match up to 96000 times,

        What's your application?


        "... if I wanted match up to 96000 times ..."

        You're fired.

Re: How to match more than 32766 times in regex?
by Anonymous Monk on Dec 01, 2015 at 18:54 UTC
    (shrug) Admittedly, at this point BrowserUk's suggestion makes sense to me. But anyway... use a non-backtracking engine, or change the REG_INFTY value in regcomp.h and recompile perl (I have no idea whether that will work).
      use strict;
      use warnings;

      my $X = "a1b2c3d4e5";    # or use File::Slurp
      my $s = "(\\w\\d)";      # my pattern match $s
      my $m = qr/$s/;          # compiled to a regular expression $m
      my $counter = 0;
      while ($X =~ s/$m//) {
          ++$counter;
          next unless $counter > 32766;    # wait for it...
          print "this is the $counter iteration, got $1 \n";
      }

        No need to go to those lengths:

        $s = '0123456789' x 100000;;
        ( $m ) = $s =~ m[((?:(?:0123456789){32000}){3})];;
        print length $m;;
        960000

        But for any given application there's almost certainly a better way of tackling the problem.


        Hmmm, I thought the OP had problems with 'complex regex recursion limit exceeded'. If he just wanted to match something like (\w\d){32767}, sure.
Re: How to match more than 32766 times in regex?
by rsFalse (Chaplain) on Nov 01, 2018 at 11:32 UTC
    perlre: "This is usually 32766 on the most common platforms"

    What do you think about expanding the perlre section on quantifiers with a suggestion for how to conveniently overcome the '+' and '*' limitation, i.e. how to write an equivalent regex that matches {0,infty} (no upper bound)? The regex should be readable and not slow.
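One workaround, the same trick as the {32000}{3} one-liner earlier in this thread, is to nest a bounded quantifier inside an outer star: each outer iteration can consume up to 32000 units, so the outer star only needs a handful of iterations. The helper name unbounded_repeat and the chunk size 32000 are assumptions for illustration; any chunk size below the build's quantifier limit should work. Because the two quantifiers can split a long run in many ways, a match that ultimately fails may backtrack heavily, so this sketch suits anchored, mostly succeeding patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex equivalent to (?:$unit)* whose total
# repetition count is not capped by a single quantifier's limit.
# The inner {1,32000} stays below the usual 32766 maximum; the outer *
# then only iterates about (run length / 32000) times.
sub unbounded_repeat {
    my ($unit) = @_;
    return qr/(?:(?:$unit){1,32000})*/;
}

my $re = unbounded_repeat(qr/ab/);
my $s  = "ab" x 100_000;

my ($m) = $s =~ /\A($re)/;
print length($m), "\n";    # 200000
```

If later parts of a pattern never need to take units back from the run, wrapping it in an atomic group, (?>$re), keeps a failing match from backtracking into it.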
      Feel free to send a perlbug with the patch to the documentation.


Node Type: perlquestion [id://1149052]
Approved by Corion