Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re^3: 5.10.0 regex slowdown

by shmem (Canon)
on Feb 21, 2008 at 12:55 UTC ( #669264=note: print w/ replies, xml ) Need Help??


in reply to Re^2: 5.10.0 regex slowdown
in thread 5.10.0 regex slowdown

There seems to be precious little further info on that var

Well, it's in in the perlvar manual page in my install ;-)

But it looks like one has to use English to see its default value:

qwurx [shmem] ~/perlmonks > perl5.10.0 -le 'print ${^RE_TRIE_MAXBUF}' qwurx [shmem] ~/perlmonks > perl5.10.0 -MEnglish -le 'print ${^RE_TRI +E_MAXBUF}' 65536

But yes, the number 2**16 rings a bell. It looks like the TRIE-optimization has that limit (of tokens? objects? branches?). Running with -D512 reveals

13104 probes:

EXECUTING... Trying 13104 probes with perl 5.010000 at 669148.pl line 18. 1203590260 at 669148.pl line 20. Compiling REx "TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGTGTTTAAC| +CCTCA"... Final program: 1: TRIEC-EXACT[ACGT] (65526) <TATGTTTCGT> <CCGCTTTTTA> <CGAAGATTTC> ... <AACAGTGAGG> <GAAACTCGCG> <GAGAGATGGA> 65526: END (0) stclass AHOCORASICKC-EXACT[ACGT] minlen 10 Compiling REx "((?-xism:TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TG +TGTTT"... Final program: 1: OPEN1 (3) 3: TRIEC-EXACT[ACGT] (65529) <TATGTTTCGT> <CCGCTTTTTA> <CGAAGATTTC> ... <AACAGTGAGG> <GAAACTCGCG> <GAGAGATGGA> 65529: CLOSE1 (65531) 65531: END (0) stclass AHOCORASICKC-EXACT[ACGT] minlen 10 Matching REx "((?-xism:TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGT +GTTT"... against "ACTCGAATTCCGAATAGATAGAAGTCTGCTGATAATATCGCGCCGGT TCTGATGCGCCTC"... Matching stclass AHOCORASICKC-EXACT[ACGT] against "ACTCGAATTCCGAATAGAT +AGAAGTCTGCTGATAATATCGCGCCGGTTCTGATGCGCCTC"... (1000000 chars) 0 <> <ACTCGAATTC> | Charid: 2 CP: 41 State: 1, word=0 +- legal 1 <A> <CTCGAATTCC> | Charid: 4 CP: 43 State: 52, word=0 +- legal 2 <AC> <TCGAATTCCG> | Charid: 1 CP: 54 State: ce, word=0 +- legal 3 <ACT> <CGAATTCCGA> | Charid: 4 CP: 43 State: 1a2, word=0 +- legal 4 <ACTC> <GAATTCCGAA> | Charid: 3 CP: 47 State: 1a3, word=0 +- legal

13105 probes:

EXECUTING... Trying 13105 probes with perl 5.010000 at 669148.pl line 18. 1203590218 at 669148.pl line 20. Compiling REx "TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TGTGTTTAAC| +CCTCA"... Final program: 1: TRIEC-EXACT[ACGT] (65531) <TATGTTTCGT> <CCGCTTTTTA> <CGAAGATTTC> ... <GAAACTCGCG> <GAGAGATGGA> <CGCCGAGGAT> 65531: END (0) stclass AHOCORASICKC-EXACT[ACGT] minlen 10 Compiling REx "((?-xism:TATGTTTCGT|CCGCTTTTTA|CGAAGATTTC|GAACGACGGC|TG +TGTTT"... Final program: 1: OPEN1 (3) 3: BRANCHJ (11) 5: EXACT <TATGTTTCGT> (9) 9: LONGJMP (104850) 11: BRANCHJ (19) 13: EXACT <CCGCTTTTTA> (17) 17: LONGJMP (104850) 19: BRANCHJ (27) 21: EXACT <CGAAGATTTC> (25) 25: LONGJMP (104850) ...

The 65531: END (0) looks - though I don't know at all what it means - just too close to 2**16 ...

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}


Comment on Re^3: 5.10.0 regex slowdown
Select or Download Code
Re^4: 5.10.0 regex slowdown
by BrowserUk (Pope) on Feb 21, 2008 at 15:00 UTC
      Apologies please reap error in testing.
Re^4: 5.10.0 regex slowdown
by demerphq (Chancellor) on Feb 22, 2008 at 00:56 UTC

    See those BRANCHJ EXACT LONGJMP tuples? They are confusing the optimiser so that it doesnt recognize this as a trie'able sequence. Ill have to think about what i can do about that.

    I doubt ill have time soon tho.

    Offline ysth mentioned a cool suggestion:

    [2008-02-22 01:45] <ysth1> I ended up dividing my list into three diff +erent regexes and used them like (?:(??{$rxa})|(??{$rxb})|(??{$rxc}))

    Which isnt ideal but better than no trie at all. Sorry about this. :-(

    ---
    $world=~s/war/peace/g

        Ah, yes. For sure. I didnt need that much anyway. :-) Im pretty sure youll find that most of that is going to be *match* debug output, which is not what i wanted, i just wanted a snippet of the compiled pattern dump. Sorry about that.

        ---
        $world=~s/war/peace/g

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://669264]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (6)
As of 2015-07-04 22:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (60 votes), past polls