P is for Practical | |
PerlMonks |
string matching problemby baxy77bax (Deacon) |
on Feb 09, 2012 at 15:45 UTC ( [id://952760]=perlquestion: print w/replies, xml ) | Need Help?? |
baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks I am turning to you after a long quest. I have been searching for the answer to a problem that does not allow me to sleep, eat or think. In past few months I have been struggling with a problem that has now turned to my obsession. Although many have tried to convince me that this is an NP problem ergo, has no practical solution, I have this gut feeling that if i somehow break the problem into small fragments i'll be able to solve it. Now before i discourage you from even attempting to read the post i'll explain the problem I am given a long string over the 3 character alphabet. This string is usually 5-10 bilion characters long. What I am trying to figure out is how long is the shortest unique substring of that string on some position S(x) given that i allow k number of mismatches. So let say that I am looking at the position S(10) now this is the shortest unique substring on S(10), however the shortest unique substring on the same position allowing a single mismatch is : The way you get those results traditionally is by choping your initial string into n-mers and comparing them to every position in your string. and you repeat that by increasing the n-mer length until the only hit left is a self hit. Then you know that you found your answer My approach was based on "divide and conquer" strategy. I chopped my strings into n-mers (n is a small number) to reduce the search space and hashed them. then i compared all-vs-all n-mers to get their relative distances, therefore producing a complex but still manageable graph. and now im stuck. there is no way to reconstruct the alignments from the info i produced in any reasonable amount of time. Does anyone has any pointers or suggestions. I'm not looking for a complexity reduction solution but anything that can make these kind of queries, implementation vise fast. (and memory efficient ) Thank you baxy
Back to
Seekers of Perl Wisdom
|
|