PerlMonks  

Re^3: 5.10.0 regex slowdown

by shmem (Canon)
on Feb 21, 2008 at 12:55 UTC (#669264)


in reply to Re^2: 5.10.0 regex slowdown
in thread 5.10.0 regex slowdown

There seems to be precious little further info on that var

Well, it's in the perlvar manual page in my install ;-)

But it looks like one has to use English to see its default value:

    qwurx [shmem] ~/perlmonks > perl5.10.0 -le 'print ${^RE_TRIE_MAXBUF}'

    qwurx [shmem] ~/perlmonks > perl5.10.0 -MEnglish -le 'print ${^RE_TRIE_MAXBUF}'
    65536

But yes, the number 2**16 rings a bell. It looks like the TRIE-optimization has that limit (of tokens? objects? branches?). Running with -D512 reveals

13104 probes:

    EXECUTING...
    Trying 13104 probes with perl 5.010000 at 669148.pl line 18.
    1203590260 at 669148.pl line 20.
    Compiling REx "TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGTGTTTAAC|CCTCA"...
    Final program:
        1: TRIEC-EXACT[ACGT] (65526)
           <TATGTTTCGT>
           <CCGCTTTTTA>
           <CGAAGATTTC>
           ...
           <AACAGTGAGG>
           <GAAACTCGCG>
           <GAGAGATGGA>
    65526: END (0)
    stclass AHOCORASICKC-EXACT[ACGT] minlen 10
    Compiling REx "((?-xism:TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGTGTTT"...
    Final program:
        1: OPEN1 (3)
        3:   TRIEC-EXACT[ACGT] (65529)
             <TATGTTTCGT>
             <CCGCTTTTTA>
             <CGAAGATTTC>
             ...
             <AACAGTGAGG>
             <GAAACTCGCG>
             <GAGAGATGGA>
    65529: CLOSE1 (65531)
    65531: END (0)
    stclass AHOCORASICKC-EXACT[ACGT] minlen 10
    Matching REx "((?-xism:TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGTGTTT"... against "ACTCGAATTCCGAATAGATAGAAGTCTGCTGATAATATCGCGCCGGTTCTGATGCGCCTC"...
    Matching stclass AHOCORASICKC-EXACT[ACGT] against "ACTCGAATTCCGAATAGATAGAAGTCTGCTGATAATATCGCGCCGGTTCTGATGCGCCTC"... (1000000 chars)
       0 <> <ACTCGAATTC>     | Charid: 2 CP: 41 State: 1, word=0 - legal
       1 <A> <CTCGAATTCC>    | Charid: 4 CP: 43 State: 52, word=0 - legal
       2 <AC> <TCGAATTCCG>   | Charid: 1 CP: 54 State: ce, word=0 - legal
       3 <ACT> <CGAATTCCGA>  | Charid: 4 CP: 43 State: 1a2, word=0 - legal
       4 <ACTC> <GAATTCCGAA> | Charid: 3 CP: 47 State: 1a3, word=0 - legal

13105 probes:

    EXECUTING...
    Trying 13105 probes with perl 5.010000 at 669148.pl line 18.
    1203590218 at 669148.pl line 20.
    Compiling REx "TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGTGTTTAAC|CCTCA"...
    Final program:
        1: TRIEC-EXACT[ACGT] (65531)
           <TATGTTTCGT>
           <CCGCTTTTTA>
           <CGAAGATTTC>
           ...
           <GAAACTCGCG>
           <GAGAGATGGA>
           <CGCCGAGGAT>
    65531: END (0)
    stclass AHOCORASICKC-EXACT[ACGT] minlen 10
    Compiling REx "((?-xism:TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGTGTTT"...
    Final program:
        1: OPEN1 (3)
        3:   BRANCHJ (11)
        5:     EXACT <TATGTTTCGT> (9)
        9:     LONGJMP (104850)
       11:   BRANCHJ (19)
       13:     EXACT <CCGCTTTTTA> (17)
       17:     LONGJMP (104850)
       19:   BRANCHJ (27)
       21:     EXACT <CGAAGATTTC> (25)
       25:     LONGJMP (104850)
       ...

The 65531: END (0) looks (though I don't know at all what it means) just too close to 2**16 = 65536 ...

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}


Re^4: 5.10.0 regex slowdown
by BrowserUk (Pope) on Feb 21, 2008 at 15:00 UTC
      Apologies, please reap: error in testing.
Re^4: 5.10.0 regex slowdown
by demerphq (Chancellor) on Feb 22, 2008 at 00:56 UTC

    See those BRANCHJ EXACT LONGJMP tuples? They are confusing the optimiser, so that it doesn't recognize this as a trie'able sequence. I'll have to think about what I can do about that.

    I doubt I'll have time soon, though.

    Offline ysth mentioned a cool suggestion:

    [2008-02-22 01:45] <ysth1> I ended up dividing my list into three different regexes and used them like (?:(??{$rxa})|(??{$rxb})|(??{$rxc}))

    Which isn't ideal, but better than no trie at all. Sorry about this. :-(

    ---
    $world=~s/war/peace/g
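
A minimal sketch of the workaround ysth describes above: split the word list into three alternations, compile each separately, and recombine them with (??{...}). The probe list, the helper names chunk_regexes and find_probe, and the target string are made up for illustration; only the shape of the combined pattern comes from the thread. Whether each chunk actually stays small enough to be trie-optimised depends on the size of the real list, so treat this as a starting point rather than a confirmed fix.

```perl
use strict;
use warnings;

# Hypothetical probe list; the thread used thousands of 10-character DNA strings.
my @probes = qw(TATGTTTCGT CCGCTTTTTA CGAAGATTTC GAACGACGGC AACAGTGAGG GAGAGATGGA);

# Compile the list as up to $parts separate alternations (helper name is made up).
sub chunk_regexes {
    my ($parts, @list) = @_;
    my $size = int((@list + $parts - 1) / $parts);    # ceiling division
    my @rx;
    while (my @chunk = splice @list, 0, $size) {
        my $alt = join '|', map { quotemeta } @chunk;
        push @rx, qr/$alt/;
    }
    return @rx;
}

# Package variables, so the (??{ ... }) blocks do not rely on closure behaviour.
our ($rxa, $rxb, $rxc) = chunk_regexes(3, @probes);

# Each (??{ ... }) returns a compiled regex that is matched at that position.
my $combined = qr/(?:(??{$rxa})|(??{$rxb})|(??{$rxc}))/;

# Return the first probe found in $text and its offset, or an empty list.
sub find_probe {
    my ($text) = @_;
    return unless $text =~ $combined;
    return (substr($text, $-[0], $+[0] - $-[0]), $-[0]);
}

my ($hit, $offset) = find_probe('ACTCGAATTCGAACGACGGCTT');
print "matched $hit at offset $offset\n" if defined $hit;
```

The three sub-patterns are held in package variables rather than lexicals because code blocks inside regexes had closure quirks in older perls; with a recent perl, lexicals should work as well.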

        Ah, yes. For sure. I didn't need that much anyway. :-) I'm pretty sure you'll find that most of that is going to be *match* debug output, which is not what I wanted; I just wanted a snippet of the compiled pattern dump. Sorry about that.

        ---
        $world=~s/war/peace/g
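
For anyone wanting only the compiled-program dump demerphq asks for, without the match trace: the re pragma (5.10 and later) can restrict debugging output to the DUMP category, and unlike -D512 it does not need a perl built with -DDEBUGGING. A rough sketch follows; the five-probe list and the helper name alternation are stand-ins, not the script from the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the thread's probe list (the real one had ~13,000 entries).
my @probes = qw(TATGTTTCGT CCGCTTTTTA CGAAGATTTC GAACGACGGC TGTGTTTAAC);

# Join literal strings into one alternation (helper name is made up).
sub alternation {
    return join '|', map { quotemeta } @_;
}

my $alt = alternation(@probes);

{
    # Lexically scoped: only patterns compiled inside this block are dumped,
    # and only the DUMP category (the final program) is enabled.
    # The dump is written to STDERR.
    use re qw(Debug DUMP);
    my $re = qr/($alt)/;
}
```

Running it with the full probe list and redirecting STDERR to a file should show whether the program starts with TRIEC-EXACT or with the BRANCHJ/EXACT/LONGJMP sequence seen above.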
