PerlMonks  

Re: Finding repeat sequences.

by rjt (Deacon)
on Jun 18, 2013 at 20:23 UTC ( #1039654=note )


in reply to Finding repeat sequences.

This smells an awful lot to me like the Longest Repeated Substring Problem, maybe with a bit of a twist. Have you looked at SuffixTree?

use SuffixTree;
my $stree = create_tree('abcdabcdabceabcdabcdabceab');
print_tree($stree);
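For anyone who wants to see what the Longest Repeated Substring Problem asks for before reaching for a suffix tree, here is a minimal pure-Perl sketch. It is not the SuffixTree module's method: it sorts all suffixes and compares neighbours, which is fine for short strings but copies every suffix, so it will not scale to very long input. The sub name longest_repeated_substring is made up for illustration.

```perl
use strict;
use warnings;

# Longest substring that occurs at least twice (occurrences may overlap).
# Naive approach: sort all suffixes, then compare adjacent ones.
sub longest_repeated_substring {
    my ($s) = @_;
    my @suffixes = map { substr( $s, $_ ) } 0 .. length($s) - 1;
    @suffixes = sort @suffixes;

    my $best = '';
    for my $i ( 1 .. $#suffixes ) {
        my ( $x, $y ) = @suffixes[ $i - 1, $i ];

        # Length of the common prefix of two adjacent suffixes.
        my $n = 0;
        $n++ while $n < length($x)
            && $n < length($y)
            && substr( $x, $n, 1 ) eq substr( $y, $n, 1 );

        $best = substr( $x, 0, $n ) if $n > length $best;
    }
    return $best;
}

print longest_repeated_substring('abcdabcdabceabcdabcdabceab'), "\n";
```

Note that this answers the "longest substring with at least one repeat" reading only; the other readings need a different approach.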

What is not clear to me from your description is whether you are looking for the longest substring with at least one repeat, or whether you are looking for the arbitrary length substring with the highest repeat count, or whether you are looking for the substring which, along with its (adjacent?) repeats comprises the longest length, or something else. Can you provide some more information and examples?

A Super Search revealed:

Replies are listed 'Best First'.
Re^2: Finding repeat sequences.
by BrowserUk (Pope) on Jun 18, 2013 at 20:39 UTC
    What is not clear to me from your description is whether you are looking for the longest substring with at least one repeat, or whether you are looking for the arbitrary length substring with the highest repeat count, or whether you are looking for the substring which, along with its (adjacent?) repeats comprises the longest length, or something else. Can you provide some more information and examples?

    I thought (believe) I have described the problem exactly. Constructing examples is hard -- I have a program running (for 4+ hours now) generating controlled random strings and trying to find exceptional cases.

    I'll try the description (unsatisfactory) again.

    The complete string will consist of, and only of, one or more repetitions of a substring, but the last repetition may be truncated. In code:

    my $substring = getsubstring();
    my $string    = $substring x ( 1 + int( rand $N ) );    # one or more repetitions
    my $cut       = int( rand length $substring );          # characters to drop from the end
    substr( $string, -$cut ) = '' if $cut and length $string > length $substring;

    That is, all these are valid strings and all have 'fred' as their substring:

    fredf
    fredfr
    fredfre
    fredfred
    fredfredf
    fredfredfr
    fredfredfre
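One way to read the description above is "find the smallest period of the string". Under that assumption, a minimal sketch using the KMP failure table runs in linear time and needs no tree at all. The name shortest_unit is hypothetical, and a string with no repetition simply comes back whole. The table holds one integer per character, which may still be heavy in Perl for very long strings.

```perl
use strict;
use warnings;

# Hypothetical helper: return the shortest unit whose repetition, with the
# final copy possibly truncated, reproduces $s.
sub shortest_unit {
    my ($s) = @_;
    my $n = length $s;
    return '' if $n == 0;

    # $fail[$i] = length of the longest proper prefix of $s that is also
    # a suffix of the first $i + 1 characters (KMP failure table).
    my @fail = (0) x $n;
    my $k    = 0;
    for my $i ( 1 .. $n - 1 ) {
        my $c = substr( $s, $i, 1 );
        $k = $fail[ $k - 1 ] while $k > 0 && $c ne substr( $s, $k, 1 );
        $k++ if $c eq substr( $s, $k, 1 );
        $fail[$i] = $k;
    }

    # The smallest period is the length minus the longest border.
    return substr( $s, 0, $n - $fail[-1] );
}

print "$_ => ", shortest_unit($_), "\n"
    for qw( fredf fredfr fredfre fredfred fredfredf fredfredfr fredfredfre );
```

Each of the seven strings listed above should map back to 'fred' under this reading.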

    With regard to suffix trees, I feel I would probably need a prefix tree (Trie) instead, but these strings can be very long and every implementation of Trie I've seen would not handle them.


      I thought (believe) I have described the problem exactly.

      Oh, I do not question that. What is in question is my own ability to comprehend. :-) Thanks for clarifying. I have a better idea now.

      This problem oozes recursion and backtracking (which is what most of the regex solutions are trying to accomplish). As much as I like punctuation by the kilo, I might try a more explicit recursive sub solution as a first cut, if for no other reason than to sprinkle some debug and trace the algorithm and data rep. Come up with some firm test cases based on that, then optimize.
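As a rough illustration of the "explicit recursive sub with some debug" idea, here is a minimal sketch. It is only one guess at what such a first cut could look like, not the approach anyone in the thread posted. The sub name find_unit and its $trace flag are invented for the example. It recurses once per candidate unit length, so it is meant for tracing small test cases; under "use warnings" Perl will complain about deep recursion once the depth passes 100.

```perl
use strict;
use warnings;

# Hypothetical first cut: try unit lengths 1, 2, 3, ... recursively and
# return the first unit that rebuilds the string. Pass a true $trace to
# watch each candidate on STDERR.
sub find_unit {
    my ( $s, $len, $trace ) = @_;
    $len ||= 1;
    return $s if $len >= length $s;    # nothing shorter worked

    my $unit = substr( $s, 0, $len );

    # Repeat the candidate past the end, then cut back to the string's length.
    my $rebuilt = substr( $unit x ( 1 + int( length($s) / $len ) ), 0, length $s );
    warn "trying '$unit'\n" if $trace;

    return $unit if $rebuilt eq $s;
    return find_unit( $s, $len + 1, $trace );    # next candidate length
}

print find_unit( 'fredfredfr', 1, 1 ), "\n";
```

Once the traced output agrees with hand-built test cases, the recursion can be replaced by a loop or a faster method.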

      Edit: Aha, it looks like Mr. Conway hit it on the head!
