Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re: Fast Identification Of String Difference

by Limbic~Region (Chancellor)
on Jan 17, 2011 at 14:41 UTC ( #882670=note: print w/ replies, xml ) Need Help??


in reply to Fast Identification Of String Difference

neversaint,
Since you appear to have nucleic acid strings, perhaps you could leverage the minimal alphabet used to pre-compute differences. Assuming each position can only be represented by 5 characters (A, C, T, G and -) then taking 4 characters at a time, there can only be 625 possible strings. Comparing two such sub strings means there are only 390,625 possible pairs - surely small enough to fit into memory.

I do realize that you have to keep track of which "quad" you are comparing so that your position comes out right but this is hardly significant. Even comparing 5 characters at a time is less than 10 million parings so depending on your memory constraints you may not be limited to quads. Of course, if your strings can have more than 5 characters at each position you will have to do fewer at a time.

If you need an implementation of this, let me know but I think it should be fairly obvious.

Cheers - L~R


Comment on Re: Fast Identification Of String Difference

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://882670]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (11)
As of 2014-12-18 07:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (44 votes), past polls