
Re^2: Multiple Regex's on a Big Sequence - Benchmark

by hv (Parson)
on Aug 17, 2006 at 11:21 UTC (#567889)

in reply to Re: Multiple Regex's on a Big Sequence - Benchmark
in thread Multiple Regex's on a Big Sequence

For the cases where you match multiple regexps against the same target string, it may save time to call study($sequence) before starting the matches.

This scans the sequence once so that subsequent matches can search for a pattern's fixed substrings more cheaply, as an alternative to the regex engine's usual Boyer-Moore search. It builds a linked list of the locations of each different character in the sequence, and then uses character-frequency data to pick the rarest character in the search string and walk only that character's list.

Because the main benefit of this approach depends on some characters being rare, it may not be a big win for a case like this, where the string uses only a 4-character alphabet and (presumably) uses each character roughly 1/4 of the time; I'd be interested to see how it affects the benchmarks.
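
Something along these lines might serve as a starting point for that comparison. It is only a sketch: the random sequence, the three patterns and the count_matches helper are made up for illustration and are not taken from the benchmark in the thread. Note also that on Perl 5.16 and later study() is a no-op, so a difference can only show up on older perls.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: a random sequence over a 4-letter alphabet
# and a few sample patterns (assumptions, not from the thread).
my @alphabet = qw(A C G T);
my $sequence = join '', map { $alphabet[rand @alphabet] } 1 .. 100_000;
my @patterns = map { qr/$_/ } qw(GATTACA TTAGGG CACGTG);

# Count all matches of every pattern, optionally studying the string first.
sub count_matches {
    my ($seq, $use_study) = @_;
    study($seq) if $use_study;    # one scan up front, reused by each match
    my $total = 0;
    for my $re (@patterns) {
        $total++ while $seq =~ /$re/g;
    }
    return $total;
}

# Run each variant for about one CPU second and compare the rates.
cmpthese(-1, {
    plain   => sub { count_matches($sequence, 0) },
    studied => sub { count_matches($sequence, 1) },
});
```

Each call works on its own copy of the sequence, so the studied and unstudied runs do not influence each other; the copy costs the same in both variants.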


Re^3: Multiple Regex's on a Big Sequence - Benchmark
by bernanke01 (Beadle) on Aug 18, 2006 at 02:02 UTC
    Great idea, I'll add it to the next round of Benchmarks.
