in reply to Re: Finding repeat sequences.
in thread Finding repeat sequences.
What is not clear to me from your description is whether you are looking for the longest substring with at least one repeat, or whether you are looking for the arbitrary length substring with the highest repeat count, or whether you are looking for the substring which, along with its (adjacent?) repeats comprises the longest length, or something else. Can you provide some more information and examples?
I thought (and still believe) that I had described the problem exactly. Constructing examples is hard; I have had a program running for 4+ hours now, generating controlled random strings and trying to find exceptional cases.
I'll try the (unsatisfactory) description again.
The complete string will consist of, and only of, one or more repetitions of a substring, but the last repetition may be truncated. In code:
    my $substring = getsubstring();
    my $string = $substring x int( rand $N );
    substr( $string, -int( rand length( $substring ) ) ) = ''
        if length $string > length $substring;
That is, all these are valid strings and all have 'fred' as their substring:
    fredf
    fredfr
    fredfre
    fredfred
    fredfredf
    fredfredfr
    fredfredfre
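To make the definition concrete, here is a minimal sketch of a validity check. The name `is_repetition_of` is hypothetical and not part of the original code, and it assumes that a valid string contains at least one complete copy of the substring.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): true if $string is
# one or more copies of $substring, the last of which may be truncated.
# Assumption: at least one complete copy of $substring must be present.
sub is_repetition_of {
    my ( $string, $substring ) = @_;
    my $len = length $substring;
    return 0 if $len == 0 || length($string) < $len;

    # Build just enough copies to cover $string, then compare prefixes.
    my $copies = int( ( length($string) + $len - 1 ) / $len );
    return substr( $substring x $copies, 0, length $string ) eq $string ? 1 : 0;
}

# Check the example strings, plus one deliberately invalid string.
print "$_: ", ( is_repetition_of( $_, 'fred' ) ? 'valid' : 'invalid' ), "\n"
    for qw( fredf fredfr fredfre fredfred fredfredf fredfredfr fredfredfre fredx );
```

This only verifies a candidate substring that is already known; it does not find one.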
With regard to suffix trees, I feel I would probably need a prefix tree (trie) instead, but these strings can be very long, and no trie implementation I've seen would handle them.
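As a rough sketch of one alternative that builds no tree at all, the shortest repeating unit can be derived from the prefix function used in Knuth-Morris-Pratt matching. This is a swapped-in technique, not something proposed in the thread, and the name `shortest_repeat` is hypothetical. It assumes the whole string fits in memory, and the `@fail` array costs one Perl scalar per character, which may itself be a problem for very long strings.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the shortest prefix of $string that,
# repeated and possibly truncated at the end, reproduces $string.
sub shortest_repeat {
    my ($string) = @_;
    my $n = length $string;
    return '' if $n == 0;

    # $fail[$i] = length of the longest proper prefix of the first
    # $i + 1 characters that is also a suffix of them (prefix function).
    my @fail = (0);
    my $k    = 0;
    for my $i ( 1 .. $n - 1 ) {
        my $c = substr $string, $i, 1;
        $k = $fail[ $k - 1 ] while $k > 0 && $c ne substr( $string, $k, 1 );
        $k++ if $c eq substr( $string, $k, 1 );
        $fail[$i] = $k;
    }

    # The smallest period is the length minus the longest border.
    return substr $string, 0, $n - $fail[-1];
}

print "$_ => ", shortest_repeat($_), "\n"
    for qw( fredf fredfr fredfre fredfred fredfredf fredfredfr fredfredfre );
```

Note that this returns the shortest unit, so a string built from 'abab' would be reported as repetitions of 'ab'; a string with no repetition at all is returned whole.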
Re^3: Finding repeat sequences.
by rjt (Curate) on Jun 18, 2013 at 22:31 UTC