Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re^2: Finding repeat sequences. (Results:Part 2. The winner)

by BrowserUk (Pope)
on Jun 19, 2013 at 18:06 UTC ( #1039816=note: print w/ replies, xml ) Need Help??


in reply to Re: Finding repeat sequences. (Results:Part 1)
in thread Finding repeat sequences.

After exclusions on functionality, the three left standing -- DamianConway, tobyink & AnomalousMonk -- went forward to performance testing where the latter quickly fell by the wayside.

Of the remaining two, tobyink's solution is hands down winner with a cumulative 66 seconds version DamianConway's 1670 seconds:

... my @bases = ( 'fred', join( '', 'a'..'z' ), unpack( 'b*', pack 'Q*', 0 .. 99 ), ); my %res; for my $base ( @bases ) { for my $reps ( 1, 10, 100, 1000 ) { my $str = $base x $reps . substr( $base, 0, rand( length $base + ) ); for my $test ( keys %tests ) { my $start = time; my $res; eval { local $SIG{ALRM} = sub { die "timeout\n" }; alarm $T * $reps; $res = $tests{ $test }->( \$str ) // 'none found'; alarm 0; }; if( $@ eq "timeout\n" ) { delete $tests{ $test }; warn "$test timed out [@{[ $T * $reps ]}]; excluded\n" +; next; } my $stop = time; warn "$test: Found '$res' instead of '$base'\n" unless $re +s eq $base; $res{ $test }{ length( $base ) }{ $reps } = $stop - $start +; $res{ $test }{ length( $base ) }{ all } += $stop - $start; $res{ $test }{ all } += $stop - $start; printf "b:%5u in s:%10u %10s :: %f s\n", length( $base ), length( $str ), $test, $stop - $start ; } } } pp \%res; __END__ [18:23:33.12] C:\test>1039630-b.pl14 -SHOW=0 -T=10 tye found 'fredfrefred'; excluded from further consideration svc4 found 'redfrefredf'; excluded from further consideration hdb found 'none found'; excluded from further consideration choroba found 'fredfrefredfre'; excluded from further consideration Eily found 'fredfrefred'; excluded from further consideration Partisipants in performance tests: anomalous tobyink damianc anomalous timed out [10]; excluded { anomalous => { 4 => { 1 => "0.000113964080810547", 10 => "0.000288963317871094 +", 100 => "0.00277900695800781", 1000 => "0.0275580883026123", all => "0.0307400226593018", }, 26 => { 1 => "0.0011751651763916", 10 => "0.00826501846313477", +100 => "0.118575096130371", 1000 => "4.02753186225891", all => "4.15554714202881", }, all => "4.18628716468811", }, damianc => { 4 => { 1 => "0.000234842300415039", 10 => "0.000370979309082031", + 100 => "0.000273942947387695", 1000 => "0.000956058502197266", all => "0.00183582305908203", }, 26 => { 1 => "0.000357866287231445", 10 => "0.000438928604125977" +, 100 => "0.00239300727844238", 1000 => "0.0216219425201416", all => "0.0248117446899414", }, 6400 => { 1 => "1.69514513015747", 10 => "12.9105410575867", 100 +=> "135.586049079895", 1000 => "1520.64571499825", all => "1670.83745026588", }, all => "1670.86409783363", }, tobyink => { 4 => { 1 => "0.000294923782348633", 10 => "0.000179052352905273", + 100 => "5.60283660888672e-005", 1000 => "5.10215759277344e-005", all => "0.000581026077270508", }, 26 => { 1 => "7.60555267333984e-005", 10 => "0.000123977661132813 +", 100 => "0.00015711784362793", 1000 => "0.000526189804077148", all => "0.000883340835571289", }, 6400 => { 1 => "0.071134090423584", 10 => "0.232213973999023", 10 +0 => "4.46547484397888", 1000 => "61.8524870872498", all => "66.6213099956512", }, all => "66.6227743625641", }, }

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Comment on Re^2: Finding repeat sequences. (Results:Part 2. The winner)
Download Code
Re^3: Finding repeat sequences. (Results:Part 2. The winner)
by hdb (Prior) on Jun 19, 2013 at 19:07 UTC

    Congrats to tobyink!

    Thanks to your efforts, BrowserUK, I realized my mistakes in code and thinking. Should you ever get around to re-run your tests, here is a corrected version, don't know whether it is faster then tobyink's.

    hdb => sub { my $input = shift; my $length = length $$input; my $i = 0; my $possible; my $j; while( 1 ) { $possible = substr $$input, 0, ++$i; $possible = substr $$input, 0, $i=$j if( ($j = index $$input, $p +ossible, $i) > 0 ); return $possible if substr( $$input, $i ) eq substr($$input, 0, +$length - $i); } },

      Update: (I sent this to hdb as a /msg; and then decided it belongs in here): That is blinding! I've been looking for a solution that avoided being O(length string). Now I can stop looking. Thankyou!

      Nice one++ Thank you :) (0.228s -v- 70.6s)

      Partisipants in performance tests: hdb tobyink b: 4 in s: 7 hdb :: 0.000154 s b: 4 in s: 7 tobyink :: 0.000169 s b: 4 in s: 43 hdb :: 0.000098 s b: 4 in s: 43 tobyink :: 0.000333 s b: 4 in s: 402 hdb :: 0.000156 s b: 4 in s: 402 tobyink :: 0.000190 s b: 4 in s: 4003 hdb :: 0.000250 s b: 4 in s: 4003 tobyink :: 0.000233 s b: 26 in s: 44 hdb :: 0.000126 s b: 26 in s: 44 tobyink :: 0.000104 s b: 26 in s: 274 hdb :: 0.000040 s b: 26 in s: 274 tobyink :: 0.000106 s b: 26 in s: 2614 hdb :: 0.000031 s b: 26 in s: 2614 tobyink :: 0.000086 s b: 26 in s: 26008 hdb :: 0.000190 s b: 26 in s: 26008 tobyink :: 0.000492 s b: 6400 in s: 8044 hdb :: 0.000483 s b: 6400 in s: 8044 tobyink :: 0.052703 s b: 6400 in s: 69681 hdb :: 0.001241 s b: 6400 in s: 69681 tobyink :: 0.240158 s b: 6400 in s: 641930 hdb :: 0.017612 s b: 6400 in s: 641930 tobyink :: 4.962130 s b: 6400 in s: 6404352 hdb :: 0.208475 s b: 6400 in s: 6404352 tobyink :: 65.351273 s { hdb => { 26 => { 1 => "0.000125885009765625", 10 => "4.00543212890625e-005", 100 => "3.09944152832031e-005", 1000 => "0.000190019607543945", all => "0.000386953353881836", }, 4 => { 1 => "0.000154018402099609", 10 => "9.79900360107422e-005", 100 => "0.000156164169311523", 1000 => "0.000249862670898438", all => "0.000658035278320313", }, 6400 => { 1 => "0.000483036041259766", 10 => "0.00124096870422363", 100 => "0.0176119804382324", 1000 => "0.208475112915039", all => "0.227811098098755", }, all => "0.228856086730957", }, tobyink => { 26 => { 1 => "0.000103950500488281", 10 => "0.000106096267700195", 100 => "8.60691070556641e-005", 1000 => "0.000491857528686523", all => "0.000787973403930664", }, 4 => { 1 => "0.000169038772583008", 10 => "0.000333070755004883", 100 => "0.000190019607543945", 1000 => "0.000233173370361328", all => "0.000925302505493164", }, 6400 => { 1 => "0.0527029037475586", 10 => "0.240158081054688", 100 => "4.96213006973267", 1000 => "65.3512728214264", all => "70.6062638759613", }, all => "70.6079771518707", }, }

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Here are the credits:

        • tobyink for the basic algorithm and the boldness to step away from regexes.
        • sundialsvc4 for the final check whether or not the solution is found,
        • Eily to spot it in and extract it from sundialsvc4's writeup,
        • BrowserUK pointing out that the first version didn't work,
        • and to the Monastery for not expelling people loving to post code instead of lectures.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1039816]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (7)
As of 2015-07-07 02:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (86 votes), past polls