http://www.perlmonks.org?node_id=1040200


in reply to Re^5: Finding repeat sequences.
in thread Finding repeat sequences.

Thought you might like to see my refactoring of your algorithm.

Same performance, but somewhat clearer implementation:

hdbm => sub { my $r = shift; my $i = 1; until( substr( $$r, $i ) eq substr( $$r, 0, -$i ) ) { my $j = index( $$r, substr( $$r, 0, $i ), $i+1 ); $i = $j > 0 ? $j : ++$i; } return substr( $$r, 0, $i ) },

Specifically, it avoids the unfounded "infinite loop" charge without needing to introduce a redundant test.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^7: Finding repeat sequences.
by hdb (Monsignor) on Jun 21, 2013 at 17:21 UTC

    Much clearer!

    I like especially the fact that you do not need the length of the input. Another improvement could be that you use the fact that if the call to index fails once it will always fail going forward, like this:

    hdb => sub { my $r = shift; my $i = 1; my $j = 1; until( substr( $$r, $i ) eq substr( $$r, 0, -$i ) ) { if( $j > 0 ) { $j = index( $$r, substr( $$r, 0, $i ), $i ); $i = $j > 0 ? $j : ++$i; } else { $i++; } } return substr( $$r, 0, $i ) },

    avoiding the cost of index and substr paying with one if.

    And Damian's curse hit! Because it has to be $i and not $i+1 in the call to index. I think that's what he meant with off-by-1 error.

      And Damian's curse hit! Because it has to be $i and not $i+1 in the call to index. I think that's what he meant with off-by-1 error.

      Sorry, but you are wrong. You are relying on your eyes rather than tests.

      Here is your original labelled 'hdb', the above version labelled 'hdb2' and my refactor labelled 'buks' going through the test harness:

      [17:49:26.08] C:\test>1039630-b.pl14 -SHOW=0 -T=10 Looking for 'fredfre' in 'fredfrefredfr' Looking for 'fredfre' in 'fredfrefredf' Looking for 'fredfre' in 'fredfrefred' Looking for 'fredfre' in 'fredfrefre' Looking for 'fredfre' in 'fredfrefr' Looking for 'fredfre' in 'fredfref' Looking for 'fredfre' in 'fredfre' Looking for 'fredfre' in 'fredfrefredfrefredfr' Looking for 'fredfre' in 'fredfrefredfrefredf' Looking for 'fredfre' in 'fredfrefredfrefred' Looking for 'fredfre' in 'fredfrefredfrefre' Looking for 'fredfre' in 'fredfrefredfrefr' Looking for 'fredfre' in 'fredfrefredfref' Looking for 'fredfre' in 'fredfrefredfre' Participants in performance tests: hdb buks hdb2 b: 4 in s: 43 hdb :: 0.000070 s b: 4 in s: 43 buks :: 0.000033 s b: 4 in s: 43 hdb2 :: 0.000102 s b: 4 in s: 401 hdb :: 0.000388 s b: 4 in s: 401 buks :: 0.000099 s b: 4 in s: 401 hdb2 :: 0.000104 s b: 4 in s: 4002 hdb :: 0.000080 s b: 4 in s: 4002 buks :: 0.000126 s b: 4 in s: 4002 hdb2 :: 0.000125 s b: 26 in s: 280 hdb :: 0.000026 s b: 26 in s: 280 buks :: 0.000029 s b: 26 in s: 280 hdb2 :: 0.000138 s b: 26 in s: 2621 hdb :: 0.000121 s b: 26 in s: 2621 buks :: 0.000080 s b: 26 in s: 2621 hdb2 :: 0.000093 s b: 26 in s: 26007 hdb :: 0.000209 s b: 26 in s: 26007 buks :: 0.000239 s b: 26 in s: 26007 hdb2 :: 0.000243 s b:640000 in s: 6775097 hdb :: 0.253329 s b:640000 in s: 6775097 buks :: 0.381257 s hdb2 timed out [100]; excluded b:640000 in s: 64243613 hdb :: 2.816375 s b:640000 in s: 64243613 buks :: 2.833577 s b:640000 in s: 640543222 hdb :: 28.591489 s b:640000 in s: 640543222 buks :: 28.325861 s { 26 => { 10 => { buks => "2.88486480712891e-005", hdb => "2.59876251220703e-005", hdb2 => "0.000138044357299805", }, 100 => { buks => "8.0108642578125e-005", hdb => "0.000120878219604492", hdb2 => "9.29832458496094e-005", }, 1000 => { buks => "0.000238895416259766", hdb => "0.00020909309387207", hdb2 => "0.000242948532104492", }, all => { buks => "0.00034785270690918", hdb => "0.000355958938598633", hdb2 => "0.000473976135253906", }, }, 4 => { 10 => { buks => "3.29017639160156e-005", hdb => "7.00950622558594e-005", hdb2 => "0.000102043151855469", }, 100 => { buks => "9.918212890625e-005", hdb => "0.000388145446777344", hdb2 => "0.000104188919067383", }, 1000 => { buks => "0.000126123428344727", hdb => "7.98702239990234e-005", hdb2 => "0.000124931335449219", }, all => { buks => "0.000258207321166992", hdb => "0.000538110733032227", hdb2 => "0.00033116340637207", }, }, 640000 => { 10 => { buks => "0.381257057189941", hdb => "0.253329038619995" +}, 100 => { buks => "2.83357691764832", hdb => "2.81637501716614" }, 1000 => { buks => "28.3258609771729", hdb => "28.5914890766144" }, all => { buks => "31.5406949520111", hdb => "31.6611931324005" }, }, all => { buks => "31.5413010120392", hdb => "31.6620872020721", hdb2 => "0.000805139541625977", }, }

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Sir, you are correct. I was thinking of a string of 'a's but that one case is already covered by the initial until condition.

        Interesting that my last modification timed out. Did you run it with $i or $i+1?