Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^5: Finding repeat sequences.

by hdb (Parson)
on Jun 20, 2013 at 09:00 UTC ( #1039922=note: print w/ replies, xml ) Need Help??


in reply to Re^4: Finding repeat sequences.
in thread Finding repeat sequences.

I fully agree with your arguments and admit that I got carried away in my strive to make the algorithm as short as possible. In particular your comment on a small change could break it, is very valid.

For something more maintainable, it should be written like this which does exactly the same thing (with the exception of the test in the while loop which is redundant).

hdb => sub { my $input = shift; my $length = length $$input; my $i = 0; while( $i < $length ) { # redundant, but to be on + the safe side $i++; # length of next candidat +e my $possible = substr $$input, 0, $i; # next candidate my $j = index $$input, $possible, $i; # try to find it to the r +ight if( $j > 0 ) { # if found, skip ahead $possible = substr $$input, 0, $j; # the candidate has to be + this long at least $i = $j; # store its length } # check if we have a solution already if( substr( $$input, $i ) eq substr($$input, 0, $length - $i) ) +{ return $possible; # success ! } } # will never reach here return ''; },

And an explanation why the skip ahead is allowed should probably also be added...


Comment on Re^5: Finding repeat sequences.
Download Code
Re^6: Finding repeat sequences.
by BrowserUk (Pope) on Jun 21, 2013 at 16:26 UTC

    Thought you might like to see my refactoring of your algorithm.

    Same performance, but somewhat clearer implementation:

    hdbm => sub { my $r = shift; my $i = 1; until( substr( $$r, $i ) eq substr( $$r, 0, -$i ) ) { my $j = index( $$r, substr( $$r, 0, $i ), $i+1 ); $i = $j > 0 ? $j : ++$i; } return substr( $$r, 0, $i ) },

    Specifically, it avoids the unfounded "infinite loop" charge without needing to introduce a redundant test.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Much clearer!

      I like especially the fact that you do not need the length of the input. Another improvement could be that you use the fact that if the call to index fails once it will always fail going forward, like this:

      hdb => sub { my $r = shift; my $i = 1; my $j = 1; until( substr( $$r, $i ) eq substr( $$r, 0, -$i ) ) { if( $j > 0 ) { $j = index( $$r, substr( $$r, 0, $i ), $i ); $i = $j > 0 ? $j : ++$i; } else { $i++; } } return substr( $$r, 0, $i ) },

      avoiding the cost of index and substr paying with one if.

      And Damian's curse hit! Because it has to be $i and not $i+1 in the call to index. I think that's what he meant with off-by-1 error.

        And Damian's curse hit! Because it has to be $i and not $i+1 in the call to index. I think that's what he meant with off-by-1 error.

        Sorry, but you are wrong. You are relying on your eyes rather than tests.

        Here is your original labelled 'hdb', the above version labelled 'hdb2' and my refactor labelled 'buks' going through the test harness:

        [17:49:26.08] C:\test>1039630-b.pl14 -SHOW=0 -T=10 Looking for 'fredfre' in 'fredfrefredfr' Looking for 'fredfre' in 'fredfrefredf' Looking for 'fredfre' in 'fredfrefred' Looking for 'fredfre' in 'fredfrefre' Looking for 'fredfre' in 'fredfrefr' Looking for 'fredfre' in 'fredfref' Looking for 'fredfre' in 'fredfre' Looking for 'fredfre' in 'fredfrefredfrefredfr' Looking for 'fredfre' in 'fredfrefredfrefredf' Looking for 'fredfre' in 'fredfrefredfrefred' Looking for 'fredfre' in 'fredfrefredfrefre' Looking for 'fredfre' in 'fredfrefredfrefr' Looking for 'fredfre' in 'fredfrefredfref' Looking for 'fredfre' in 'fredfrefredfre' Participants in performance tests: hdb buks hdb2 b: 4 in s: 43 hdb :: 0.000070 s b: 4 in s: 43 buks :: 0.000033 s b: 4 in s: 43 hdb2 :: 0.000102 s b: 4 in s: 401 hdb :: 0.000388 s b: 4 in s: 401 buks :: 0.000099 s b: 4 in s: 401 hdb2 :: 0.000104 s b: 4 in s: 4002 hdb :: 0.000080 s b: 4 in s: 4002 buks :: 0.000126 s b: 4 in s: 4002 hdb2 :: 0.000125 s b: 26 in s: 280 hdb :: 0.000026 s b: 26 in s: 280 buks :: 0.000029 s b: 26 in s: 280 hdb2 :: 0.000138 s b: 26 in s: 2621 hdb :: 0.000121 s b: 26 in s: 2621 buks :: 0.000080 s b: 26 in s: 2621 hdb2 :: 0.000093 s b: 26 in s: 26007 hdb :: 0.000209 s b: 26 in s: 26007 buks :: 0.000239 s b: 26 in s: 26007 hdb2 :: 0.000243 s b:640000 in s: 6775097 hdb :: 0.253329 s b:640000 in s: 6775097 buks :: 0.381257 s hdb2 timed out [100]; excluded b:640000 in s: 64243613 hdb :: 2.816375 s b:640000 in s: 64243613 buks :: 2.833577 s b:640000 in s: 640543222 hdb :: 28.591489 s b:640000 in s: 640543222 buks :: 28.325861 s { 26 => { 10 => { buks => "2.88486480712891e-005", hdb => "2.59876251220703e-005", hdb2 => "0.000138044357299805", }, 100 => { buks => "8.0108642578125e-005", hdb => "0.000120878219604492", hdb2 => "9.29832458496094e-005", }, 1000 => { buks => "0.000238895416259766", hdb => "0.00020909309387207", hdb2 => "0.000242948532104492", }, all => { buks => "0.00034785270690918", hdb => "0.000355958938598633", hdb2 => "0.000473976135253906", }, }, 4 => { 10 => { buks => "3.29017639160156e-005", hdb => "7.00950622558594e-005", hdb2 => "0.000102043151855469", }, 100 => { buks => "9.918212890625e-005", hdb => "0.000388145446777344", hdb2 => "0.000104188919067383", }, 1000 => { buks => "0.000126123428344727", hdb => "7.98702239990234e-005", hdb2 => "0.000124931335449219", }, all => { buks => "0.000258207321166992", hdb => "0.000538110733032227", hdb2 => "0.00033116340637207", }, }, 640000 => { 10 => { buks => "0.381257057189941", hdb => "0.253329038619995" +}, 100 => { buks => "2.83357691764832", hdb => "2.81637501716614" }, 1000 => { buks => "28.3258609771729", hdb => "28.5914890766144" }, all => { buks => "31.5406949520111", hdb => "31.6611931324005" }, }, all => { buks => "31.5413010120392", hdb => "31.6620872020721", hdb2 => "0.000805139541625977", }, }

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1039922]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (11)
As of 2014-07-31 09:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (248 votes), past polls