Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re: Finding repeat sequences. (Results:Part 1)

by BrowserUk (Pope)
on Jun 19, 2013 at 16:17 UTC ( #1039799=note: print w/ replies, xml ) Need Help??


in reply to Finding repeat sequences.

Below are the results from my first pass -- verifying basic functionality -- of (my adaptions of) the 8 solutions from tye, choroba, DamianConway, tobyink, AnomalousMonk, hdb, Eily, sundialsvc4 in this thread:

C:\test>1039630-b.pl14 -SHOW=0 Looking for 'fredfre' in 'fredfrefredfr' tye found 'fredfrefred'; excluded from further consideration svc4 found 'redfrefredf'; excluded from further consideration Looking for 'fredfre' in 'fredfrefredf' Looking for 'fredfre' in 'fredfrefred' hdb found 'none found'; excluded from further consideration Looking for 'fredfre' in 'fredfrefre' Looking for 'fredfre' in 'fredfrefr' Looking for 'fredfre' in 'fredfref' Looking for 'fredfre' in 'fredfre' Looking for 'fredfre' in 'fredfrefredfrefredfr' choroba found 'fredfrefredfre'; excluded from further consideration Looking for 'fredfre' in 'fredfrefredfrefredf' Looking for 'fredfre' in 'fredfrefredfrefred' Looking for 'fredfre' in 'fredfrefredfrefre' Looking for 'fredfre' in 'fredfrefredfrefr' Looking for 'fredfre' in 'fredfrefredfref' Looking for 'fredfre' in 'fredfrefredfre' Eily found 'fredfrefred'; excluded from further consideration Partisipants in performance tests: anomalous tobyink damianc

If authors want to correct (my adaptions of) their solutions that's great, (but please don't moan at me If I screwed the pouch adapting them to subroutines :).

Here's the test harness:

#! perl -slw use strict; use 5.014; use Time::HiRes qw[ time ]; my %tests = ( tye => sub { state $re = qr[^((.*?).*?)(?=.)\1*\2$]; my $r = shift; $$r =~ $re and return $1; return; }, choroba => sub { state $re = qr[^((.*).*)\1*\2$]; my $r = shift; $$r =~ $re and return $1; return; }, damianc => sub { state $re = qr[ ^(.+?)(?{$^N}) ## $1 as $^R (?| \1+$() ## Exact rep, no $2 | \1*(.+)$ ## Partial rep as $2 ## Check its a proper prefix (??{ $^N eq substr( $^R, 0, length($^N)) ? '' : '(?! +)' }) ) ]x; my $r = shift; $$r =~ $re and return $1; return; }, tobyink => sub { my $input = shift; my $length = length $$input; for my $i ( 1 .. $length ) { my $possible = substr( $$input, 0, $i ); my $repeated = $possible x ( 1 + int( $length / $i ) ); return $possible if $$input eq substr( $repeated, 0, $leng +th ); } return; }, hdb => sub { my $input = shift; my $length = length $$input; my $i = 0; my $possible; while( 1 ) { $possible = substr $$input, 0, $i+1; # increase length by +1 $i = index $$input, $possible, $i+1; # find next occurence + of candidate return if $i < 0; # if not found return full st +ring => no repetition $possible = substr $$input, 0, $i; # this is the minimum + length candidate return $possible if $$input eq substr( $possible x ( 1 + i +nt( $length / $i ) ), 0, $length ); # success } }, Eily => sub { my $input = shift; local $_ = reverse $$input; /^(.*)(.+?\1)\2*$/ and return reverse $2; return; }, anomalous => sub { state $re = qr[ \A (.+?) \1* (.*) (?(?{ 0 != index $1, $2 }) ( +*FAIL)) \z ]xms; my $r = shift; $$r =~ $re and return $1; return }, svc4 => sub { my $search = shift; my $tail_length = 1; my $tail_step = int( length( $$search ) / 2 ); while ($tail_step > 0) { $tail_length += $tail_step while substr( $$search, 0, $tail_length + $tail_step ) eq substr( $$search, -( $tail_length + $tail_step ) +, ( $tail_length + $tail_step ) ) ; $tail_step = int( $tail_step / 2 ); } my $body_length = length( $$search ) - $tail_length; my $longest = $body_length; my $n = $body_length - 1; while( $n > 1 ) { if( ( $body_length % $n ) == 0 ) { if( substr( $$search, 0, $n ) eq substr( $$search, $n, + $n ) ) { $longest = $n; last; } } $n--; } return substr( $$search, 1, $longest ); }, ); our $SHOW //= 1; ## basic functionality my $base = 'fredfre'; for my $rep ( 2 .. 3 ) { my $full = $base x $rep; for my $x ( 1 .. length( $base ) ) { my $str = substr( $full, 0, -$x ); ## make partial print "Looking for '$base' in '$str'"; for my $test ( keys %tests ) { my $res; eval { local $SIG{ALRM} = sub { die "timeout" }; alarm 10; $res = $tests{ $test }->( \$str ) // 'none found'; alarm 0; }; $SHOW and printf "%10s -> %s\n", $test, $res // $@; if( $res ne $base and not $base eq $str or $@ eq 'timeout' ) { delete $tests{ $test }; printf "\n%10s found '%s'; excluded from further consi +deration\n\n", $test, $res // $@ ; } } } } print "Partisipants in performance tests: @{[ keys %tests ]}"; exit;

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Comment on Re: Finding repeat sequences. (Results:Part 1)
Select or Download Code
Re^2: Finding repeat sequences. (Results:Part 2. The winner)
by BrowserUk (Pope) on Jun 19, 2013 at 18:06 UTC

    After exclusions on functionality, the three left standing -- DamianConway, tobyink & AnomalousMonk -- went forward to performance testing where the latter quickly fell by the wayside.

    Of the remaining two, tobyink's solution is hands down winner with a cumulative 66 seconds version DamianConway's 1670 seconds:

    ... my @bases = ( 'fred', join( '', 'a'..'z' ), unpack( 'b*', pack 'Q*', 0 .. 99 ), ); my %res; for my $base ( @bases ) { for my $reps ( 1, 10, 100, 1000 ) { my $str = $base x $reps . substr( $base, 0, rand( length $base + ) ); for my $test ( keys %tests ) { my $start = time; my $res; eval { local $SIG{ALRM} = sub { die "timeout\n" }; alarm $T * $reps; $res = $tests{ $test }->( \$str ) // 'none found'; alarm 0; }; if( $@ eq "timeout\n" ) { delete $tests{ $test }; warn "$test timed out [@{[ $T * $reps ]}]; excluded\n" +; next; } my $stop = time; warn "$test: Found '$res' instead of '$base'\n" unless $re +s eq $base; $res{ $test }{ length( $base ) }{ $reps } = $stop - $start +; $res{ $test }{ length( $base ) }{ all } += $stop - $start; $res{ $test }{ all } += $stop - $start; printf "b:%5u in s:%10u %10s :: %f s\n", length( $base ), length( $str ), $test, $stop - $start ; } } } pp \%res; __END__ [18:23:33.12] C:\test>1039630-b.pl14 -SHOW=0 -T=10 tye found 'fredfrefred'; excluded from further consideration svc4 found 'redfrefredf'; excluded from further consideration hdb found 'none found'; excluded from further consideration choroba found 'fredfrefredfre'; excluded from further consideration Eily found 'fredfrefred'; excluded from further consideration Partisipants in performance tests: anomalous tobyink damianc anomalous timed out [10]; excluded { anomalous => { 4 => { 1 => "0.000113964080810547", 10 => "0.000288963317871094 +", 100 => "0.00277900695800781", 1000 => "0.0275580883026123", all => "0.0307400226593018", }, 26 => { 1 => "0.0011751651763916", 10 => "0.00826501846313477", +100 => "0.118575096130371", 1000 => "4.02753186225891", all => "4.15554714202881", }, all => "4.18628716468811", }, damianc => { 4 => { 1 => "0.000234842300415039", 10 => "0.000370979309082031", + 100 => "0.000273942947387695", 1000 => "0.000956058502197266", all => "0.00183582305908203", }, 26 => { 1 => "0.000357866287231445", 10 => "0.000438928604125977" +, 100 => "0.00239300727844238", 1000 => "0.0216219425201416", all => "0.0248117446899414", }, 6400 => { 1 => "1.69514513015747", 10 => "12.9105410575867", 100 +=> "135.586049079895", 1000 => "1520.64571499825", all => "1670.83745026588", }, all => "1670.86409783363", }, tobyink => { 4 => { 1 => "0.000294923782348633", 10 => "0.000179052352905273", + 100 => "5.60283660888672e-005", 1000 => "5.10215759277344e-005", all => "0.000581026077270508", }, 26 => { 1 => "7.60555267333984e-005", 10 => "0.000123977661132813 +", 100 => "0.00015711784362793", 1000 => "0.000526189804077148", all => "0.000883340835571289", }, 6400 => { 1 => "0.071134090423584", 10 => "0.232213973999023", 10 +0 => "4.46547484397888", 1000 => "61.8524870872498", all => "66.6213099956512", }, all => "66.6227743625641", }, }

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Congrats to tobyink!

      Thanks to your efforts, BrowserUK, I realized my mistakes in code and thinking. Should you ever get around to re-run your tests, here is a corrected version, don't know whether it is faster then tobyink's.

      hdb => sub { my $input = shift; my $length = length $$input; my $i = 0; my $possible; my $j; while( 1 ) { $possible = substr $$input, 0, ++$i; $possible = substr $$input, 0, $i=$j if( ($j = index $$input, $p +ossible, $i) > 0 ); return $possible if substr( $$input, $i ) eq substr($$input, 0, +$length - $i); } },

        Update: (I sent this to hdb as a /msg; and then decided it belongs in here): That is blinding! I've been looking for a solution that avoided being O(length string). Now I can stop looking. Thankyou!

        Nice one++ Thank you :) (0.228s -v- 70.6s)

        Partisipants in performance tests: hdb tobyink b: 4 in s: 7 hdb :: 0.000154 s b: 4 in s: 7 tobyink :: 0.000169 s b: 4 in s: 43 hdb :: 0.000098 s b: 4 in s: 43 tobyink :: 0.000333 s b: 4 in s: 402 hdb :: 0.000156 s b: 4 in s: 402 tobyink :: 0.000190 s b: 4 in s: 4003 hdb :: 0.000250 s b: 4 in s: 4003 tobyink :: 0.000233 s b: 26 in s: 44 hdb :: 0.000126 s b: 26 in s: 44 tobyink :: 0.000104 s b: 26 in s: 274 hdb :: 0.000040 s b: 26 in s: 274 tobyink :: 0.000106 s b: 26 in s: 2614 hdb :: 0.000031 s b: 26 in s: 2614 tobyink :: 0.000086 s b: 26 in s: 26008 hdb :: 0.000190 s b: 26 in s: 26008 tobyink :: 0.000492 s b: 6400 in s: 8044 hdb :: 0.000483 s b: 6400 in s: 8044 tobyink :: 0.052703 s b: 6400 in s: 69681 hdb :: 0.001241 s b: 6400 in s: 69681 tobyink :: 0.240158 s b: 6400 in s: 641930 hdb :: 0.017612 s b: 6400 in s: 641930 tobyink :: 4.962130 s b: 6400 in s: 6404352 hdb :: 0.208475 s b: 6400 in s: 6404352 tobyink :: 65.351273 s { hdb => { 26 => { 1 => "0.000125885009765625", 10 => "4.00543212890625e-005", 100 => "3.09944152832031e-005", 1000 => "0.000190019607543945", all => "0.000386953353881836", }, 4 => { 1 => "0.000154018402099609", 10 => "9.79900360107422e-005", 100 => "0.000156164169311523", 1000 => "0.000249862670898438", all => "0.000658035278320313", }, 6400 => { 1 => "0.000483036041259766", 10 => "0.00124096870422363", 100 => "0.0176119804382324", 1000 => "0.208475112915039", all => "0.227811098098755", }, all => "0.228856086730957", }, tobyink => { 26 => { 1 => "0.000103950500488281", 10 => "0.000106096267700195", 100 => "8.60691070556641e-005", 1000 => "0.000491857528686523", all => "0.000787973403930664", }, 4 => { 1 => "0.000169038772583008", 10 => "0.000333070755004883", 100 => "0.000190019607543945", 1000 => "0.000233173370361328", all => "0.000925302505493164", }, 6400 => { 1 => "0.0527029037475586", 10 => "0.240158081054688", 100 => "4.96213006973267", 1000 => "65.3512728214264", all => "70.6062638759613", }, all => "70.6079771518707", }, }

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Finding repeat sequences. (definitions; part $n)
by tye (Cardinal) on Jun 20, 2013 at 00:40 UTC
    Looking for 'fredfre' in 'fredfrefredfr' tye found 'fredfrefred'; excluded from further consideration

    Looks like you changed your definition of your desired results, so it was silly to use a "solution" tailored to a different definition.

    This particular example matches my original speculation of what made sense, which means you might want to try my original solution. Though, I didn't wade through everything trying to find the various redefinitions of what was desired (besides the ones given in reply to my nodes) and then sort them chronologically in order to figure out what the new requirements actually are.

    - tye        

      There is no new requirement. Re-read the OP.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1039799]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (16)
As of 2014-07-30 20:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (240 votes), past polls