Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re^2: Finding repeat sequences. (Results:Part 2. The winner)

by BrowserUk (Pope)
on Jun 19, 2013 at 18:06 UTC ( #1039816=note: print w/ replies, xml ) Need Help??


in reply to Re: Finding repeat sequences. (Results:Part 1)
in thread Finding repeat sequences.

After exclusions on functionality, the three left standing -- DamianConway, tobyink & AnomalousMonk -- went forward to performance testing where the latter quickly fell by the wayside.

Of the remaining two, tobyink's solution is hands down winner with a cumulative 66 seconds version DamianConway's 1670 seconds:

... my @bases = ( 'fred', join( '', 'a'..'z' ), unpack( 'b*', pack 'Q*', 0 .. 99 ), ); my %res; for my $base ( @bases ) { for my $reps ( 1, 10, 100, 1000 ) { my $str = $base x $reps . substr( $base, 0, rand( length $base + ) ); for my $test ( keys %tests ) { my $start = time; my $res; eval { local $SIG{ALRM} = sub { die "timeout\n" }; alarm $T * $reps; $res = $tests{ $test }->( \$str ) // 'none found'; alarm 0; }; if( $@ eq "timeout\n" ) { delete $tests{ $test }; warn "$test timed out [@{[ $T * $reps ]}]; excluded\n" +; next; } my $stop = time; warn "$test: Found '$res' instead of '$base'\n" unless $re +s eq $base; $res{ $test }{ length( $base ) }{ $reps } = $stop - $start +; $res{ $test }{ length( $base ) }{ all } += $stop - $start; $res{ $test }{ all } += $stop - $start; printf "b:%5u in s:%10u %10s :: %f s\n", length( $base ), length( $str ), $test, $stop - $start ; } } } pp \%res; __END__ [18:23:33.12] C:\test>1039630-b.pl14 -SHOW=0 -T=10 tye found 'fredfrefred'; excluded from further consideration svc4 found 'redfrefredf'; excluded from further consideration hdb found 'none found'; excluded from further consideration choroba found 'fredfrefredfre'; excluded from further consideration Eily found 'fredfrefred'; excluded from further consideration Partisipants in performance tests: anomalous tobyink damianc anomalous timed out [10]; excluded { anomalous => { 4 => { 1 => "0.000113964080810547", 10 => "0.000288963317871094 +", 100 => "0.00277900695800781", 1000 => "0.0275580883026123", all => "0.0307400226593018", }, 26 => { 1 => "0.0011751651763916", 10 => "0.00826501846313477", +100 => "0.118575096130371", 1000 => "4.02753186225891", all => "4.15554714202881", }, all => "4.18628716468811", }, damianc => { 4 => { 1 => "0.000234842300415039", 10 => "0.000370979309082031", + 100 => "0.000273942947387695", 1000 => "0.000956058502197266", all => "0.00183582305908203", }, 26 => { 1 => "0.000357866287231445", 10 => "0.000438928604125977" +, 100 => "0.00239300727844238", 1000 => "0.0216219425201416", all => "0.0248117446899414", }, 6400 => { 1 => "1.69514513015747", 10 => "12.9105410575867", 100 +=> "135.586049079895", 1000 => "1520.64571499825", all => "1670.83745026588", }, all => "1670.86409783363", }, tobyink => { 4 => { 1 => "0.000294923782348633", 10 => "0.000179052352905273", + 100 => "5.60283660888672e-005", 1000 => "5.10215759277344e-005", all => "0.000581026077270508", }, 26 => { 1 => "7.60555267333984e-005", 10 => "0.000123977661132813 +", 100 => "0.00015711784362793", 1000 => "0.000526189804077148", all => "0.000883340835571289", }, 6400 => { 1 => "0.071134090423584", 10 => "0.232213973999023", 10 +0 => "4.46547484397888", 1000 => "61.8524870872498", all => "66.6213099956512", }, all => "66.6227743625641", }, }

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Comment on Re^2: Finding repeat sequences. (Results:Part 2. The winner)
Download Code
Re^3: Finding repeat sequences. (Results:Part 2. The winner)
by hdb (Parson) on Jun 19, 2013 at 19:07 UTC

    Congrats to tobyink!

    Thanks to your efforts, BrowserUK, I realized my mistakes in code and thinking. Should you ever get around to re-run your tests, here is a corrected version, don't know whether it is faster then tobyink's.

    hdb => sub { my $input = shift; my $length = length $$input; my $i = 0; my $possible; my $j; while( 1 ) { $possible = substr $$input, 0, ++$i; $possible = substr $$input, 0, $i=$j if( ($j = index $$input, $p +ossible, $i) > 0 ); return $possible if substr( $$input, $i ) eq substr($$input, 0, +$length - $i); } },

      Update: (I sent this to hdb as a /msg; and then decided it belongs in here): That is blinding! I've been looking for a solution that avoided being O(length string). Now I can stop looking. Thankyou!

      Nice one++ Thank you :) (0.228s -v- 70.6s)

      Partisipants in performance tests: hdb tobyink b: 4 in s: 7 hdb :: 0.000154 s b: 4 in s: 7 tobyink :: 0.000169 s b: 4 in s: 43 hdb :: 0.000098 s b: 4 in s: 43 tobyink :: 0.000333 s b: 4 in s: 402 hdb :: 0.000156 s b: 4 in s: 402 tobyink :: 0.000190 s b: 4 in s: 4003 hdb :: 0.000250 s b: 4 in s: 4003 tobyink :: 0.000233 s b: 26 in s: 44 hdb :: 0.000126 s b: 26 in s: 44 tobyink :: 0.000104 s b: 26 in s: 274 hdb :: 0.000040 s b: 26 in s: 274 tobyink :: 0.000106 s b: 26 in s: 2614 hdb :: 0.000031 s b: 26 in s: 2614 tobyink :: 0.000086 s b: 26 in s: 26008 hdb :: 0.000190 s b: 26 in s: 26008 tobyink :: 0.000492 s b: 6400 in s: 8044 hdb :: 0.000483 s b: 6400 in s: 8044 tobyink :: 0.052703 s b: 6400 in s: 69681 hdb :: 0.001241 s b: 6400 in s: 69681 tobyink :: 0.240158 s b: 6400 in s: 641930 hdb :: 0.017612 s b: 6400 in s: 641930 tobyink :: 4.962130 s b: 6400 in s: 6404352 hdb :: 0.208475 s b: 6400 in s: 6404352 tobyink :: 65.351273 s { hdb => { 26 => { 1 => "0.000125885009765625", 10 => "4.00543212890625e-005", 100 => "3.09944152832031e-005", 1000 => "0.000190019607543945", all => "0.000386953353881836", }, 4 => { 1 => "0.000154018402099609", 10 => "9.79900360107422e-005", 100 => "0.000156164169311523", 1000 => "0.000249862670898438", all => "0.000658035278320313", }, 6400 => { 1 => "0.000483036041259766", 10 => "0.00124096870422363", 100 => "0.0176119804382324", 1000 => "0.208475112915039", all => "0.227811098098755", }, all => "0.228856086730957", }, tobyink => { 26 => { 1 => "0.000103950500488281", 10 => "0.000106096267700195", 100 => "8.60691070556641e-005", 1000 => "0.000491857528686523", all => "0.000787973403930664", }, 4 => { 1 => "0.000169038772583008", 10 => "0.000333070755004883", 100 => "0.000190019607543945", 1000 => "0.000233173370361328", all => "0.000925302505493164", }, 6400 => { 1 => "0.0527029037475586", 10 => "0.240158081054688", 100 => "4.96213006973267", 1000 => "65.3512728214264", all => "70.6062638759613", }, all => "70.6079771518707", }, }

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Here are the credits:

        • tobyink for the basic algorithm and the boldness to step away from regexes.
        • sundialsvc4 for the final check whether or not the solution is found,
        • Eily to spot it in and extract it from sundialsvc4's writeup,
        • BrowserUK pointing out that the first version didn't work,
        • and to the Monastery for not expelling people loving to post code instead of lectures.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1039816]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (10)
As of 2014-07-31 10:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (248 votes), past polls