Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re^4: suffix array efficiency

by oiskuu (Friar)
on Jan 20, 2014 at 19:50 UTC ( #1071374=note: print w/ replies, xml ) Need Help??


in reply to Re^3: suffix array efficiency
in thread suffix array efficiency

The degenerate (1-letter) case:

20000 1 -5 kennethk 0.154/s RobertCraven 7.55/s xxx 23.0/s hdb 28.5/s
The 4-letter case packs only 8 bits of data into 32-bit values and therefore performs poorly. With genomic data one would of course pack differently.

Much more importantly: is there no XS module for constructing suffix arrays? Seems like a very worthy problem.


Comment on Re^4: suffix array efficiency
Download Code
Re^5: suffix array efficiency
by kennethk (Monsignor) on Jan 20, 2014 at 20:16 UTC
    I'm not a bioinformatics guy, but if massive runs are actually a significant concern in this system, it's pretty easy to fix the worst-case behavior by adaptively boosting the window, so that worst case regresses to hdb's solution:
    sub kennethk { my $string = shift; my $off = 0; my $step = 10; my @indices = sort {$off = 0; until (substr($string, $a+$off, $step) cmp sub +str($string, $b+$off, $step)){ $off += $step; $step *= 2; } + } 0 .. length($string) - 1; $_++ for @indices; return @indices; }
    which gives a degenerate-case comparison of
    RobertCraven 2.90/s kennethk 12.8/s xxx 13.1/s hdb 14.2/s
    and comparable behavior in the rest of the benchmarks.

    I also note your computer is twice as fast as mine. Wanna trade?


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1071374]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2014-11-26 06:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred Perl binaries come from:














    Results (163 votes), past polls