
Re^2: Multiple Regex's on a Big Sequence - Benchmark

by hv (Parson)
on Aug 17, 2006 at 11:21 UTC (#567889)


in reply to Re: Multiple Regex's on a Big Sequence - Benchmark
in thread Multiple Regex's on a Big Sequence

For the cases where you compare multiple regexps against your target string, it may save time if you also study($sequence) before starting the matches.

This will do a scan of the sequence to allow subsequent matches to use the Boyer-Moore algorithm - it builds a linked list of the locations of each different character in the sequence, and then takes advantage of the frequency data to pick the rarest character for which to walk the list.
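To make the idea concrete, here is a rough sketch of the "position list plus rarest character" technique in plain Perl. It is only an illustration of the principle, not what perl does internally: the real work happens in C inside the interpreter, and the names `index_positions` and `find_literal` are made up for this example. It also assumes a non-empty literal search string rather than a full regexp, and it measures rarity by counting characters in the target itself.

```perl
use strict;
use warnings;

# Record every position at which each character occurs in the target.
sub index_positions {
    my ($text) = @_;
    my %positions;
    for my $i (0 .. length($text) - 1) {
        push @{ $positions{ substr($text, $i, 1) } }, $i;
    }
    return \%positions;
}

# Find all occurrences of a literal, non-empty $needle, testing only the
# places where its rarest character (rarest in the target) appears.
sub find_literal {
    my ($text, $positions, $needle) = @_;
    my ($best_offset, $best_list);
    for my $offset (0 .. length($needle) - 1) {
        my $list = $positions->{ substr($needle, $offset, 1) } || [];
        if (!defined $best_list || @$list < @$best_list) {
            ($best_offset, $best_list) = ($offset, $list);
        }
    }
    my @hits;
    for my $pos (@$best_list) {
        my $start = $pos - $best_offset;    # where the needle would begin
        next if $start < 0;
        push @hits, $start
            if substr($text, $start, length $needle) eq $needle;
    }
    return @hits;
}

my $text      = 'ACGTTTACGNACGT';
my $positions = index_positions($text);
my @hits      = find_literal($text, $positions, 'CGN');
print "@hits\n";
```

Here `N` occurs only once in the target, so only one candidate position is checked. With a search string made purely of A, C, G and T, every list is about a quarter of the sequence, which is why the saving shrinks on a small, evenly used alphabet.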

Because the main benefit of this approach is about rarity, it may not be a big win for a case like this where the string uses only a 4-character alphabet, and (presumably) uses each character roughly 1/4 of the time; I'd be interested to see how it affects the benchmarks.
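Something along these lines could be used to measure it. This is a sketch only: the random sequence and the three patterns are invented stand-ins, not the data or regexps from the thread, and `count_matches` is a made-up helper. The `study` call is made inside the timed code so that its cost is counted as well. Note also that, according to perlfunc, `study` has been a no-op since Perl 5.16, so any difference would only show up on older perls.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a random sequence over a 4-letter alphabet.
my @alphabet = qw(A C G T);
my $sequence = join '', map { $alphabet[ rand @alphabet ] } 1 .. 100_000;

# Hypothetical patterns, standing in for the ones used in the thread.
my @patterns = map { qr/$_/ } qw(GATTACA TTAGGG ACGTACGT);

# Two separate copies, so that only one of them is ever studied.
my $plain   = $sequence;
my $studied = $sequence;

# Count the matches of every pattern against the referenced string,
# optionally calling study on it first.
sub count_matches {
    my ($seq_ref, $do_study) = @_;
    study $$seq_ref if $do_study;
    my $total = 0;
    for my $re (@patterns) {
        $total++ while $$seq_ref =~ /$re/g;
    }
    return $total;
}

# Run each variant for at least one CPU second and compare the rates.
cmpthese(-1, {
    without_study => sub { count_matches(\$plain,   0) },
    with_study    => sub { count_matches(\$studied, 1) },
});
```

Both variants should report the same match count, so only the rate column is of interest.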

Hugo


Re^3: Multiple Regex's on a Big Sequence - Benchmark
by bernanke01 (Beadle) on Aug 18, 2006 at 02:02 UTC
    Great idea; I'll add it to the next round of benchmarks.
