Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: suffix array efficiency

by johngg (Canon)
on Jan 20, 2014 at 22:57 UTC ( [id://1071392]=note: print w/replies, xml ) Need Help??


in reply to suffix array efficiency

Adapting code from this thread gives this ST sort.

use strict; use warnings; use 5.010; my @words = qw{ hypersensitivities zoology chlorofluorocarbons reinterpretations mississipi }; foreach my $word ( @words ) { say qq{\n** $word **}; printf qq{%2d - %s\n}, $_, substr $word, $_ - 1 for sortPosns( $word ); } sub sortPosns { return map { $_->[ 0 ] } sort { $a->[ 1 ] cmp $b->[ 1 ] } map { [ $_, substr $_[ 0 ], $_ - 1 ] } 1 .. length $_[ 0 ]; }

The output.

** hypersensitivities ** 7 - ensitivities 4 - ersensitivities 17 - es 1 - hypersensitivities 16 - ies 14 - ities 10 - itivities 12 - ivities 8 - nsitivities 3 - persensitivities 5 - rsensitivities 18 - s 6 - sensitivities 9 - sitivities 15 - ties 11 - tivities 13 - vities 2 - ypersensitivities ** zoology ** 6 - gy 4 - logy 5 - ogy 3 - ology 2 - oology 7 - y 1 - zoology ** chlorofluorocarbons ** 14 - arbons 16 - bons 13 - carbons 1 - chlorofluorocarbons 7 - fluorocarbons 2 - hlorofluorocarbons 3 - lorofluorocarbons 8 - luorocarbons 18 - ns 12 - ocarbons 6 - ofluorocarbons 17 - ons 10 - orocarbons 4 - orofluorocarbons 15 - rbons 11 - rocarbons 5 - rofluorocarbons 19 - s 9 - uorocarbons ** reinterpretations ** 12 - ations 2 - einterpretations 6 - erpretations 10 - etations 3 - interpretations 14 - ions 16 - ns 4 - nterpretations 15 - ons 8 - pretations 1 - reinterpretations 9 - retations 7 - rpretations 17 - s 11 - tations 5 - terpretations 13 - tions ** mississipi ** 10 - i 8 - ipi 5 - issipi 2 - ississipi 1 - mississipi 9 - pi 7 - sipi 4 - sissipi 6 - ssipi 3 - ssissipi

Weirdly satisfying throwing odd words at it! I've no idea how it might perform in the benchmarks but I thought I'd post in case it is of interest.

Cheers,

JohnGG

Replies are listed 'Best First'.
Re^2: suffix array efficiency
by kennethk (Abbot) on Jan 21, 2014 at 15:20 UTC
    Popped it in the benchmark, just for fun. Suffers from Out of memory! at large strings (like RobertCraven's original), but fastest of the full string comparisons:
    20000 26 20 Rate hdb RobertCraven sortPosns kennethk + xxx hdb 2.73/s -- -15% -41% -55% + -84% RobertCraven 3.20/s 17% -- -31% -48% + -81% sortPosns 4.61/s 69% 44% -- -24% + -73% kennethk 6.11/s 124% 91% 32% -- + -64% xxx 17.1/s 527% 435% 271% 180% + -- 20000 4 20 Rate hdb RobertCraven sortPosns xxx + kennethk hdb 2.95/s -- -10% -36% -43% + -52% RobertCraven 3.29/s 11% -- -29% -36% + -46% sortPosns 4.63/s 57% 41% -- -10% + -25% xxx 5.15/s 74% 57% 11% -- + -16% kennethk 6.13/s 108% 87% 33% 19% + --

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1071392]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2024-04-18 22:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found