Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: Fast Identification Of String Difference

by Limbic~Region (Chancellor)
on Jan 17, 2011 at 14:41 UTC ( #882670=note: print w/ replies, xml ) Need Help??


in reply to Fast Identification Of String Difference

neversaint,
Since you appear to have nucleic acid strings, perhaps you could leverage the minimal alphabet used to pre-compute differences. Assuming each position can only be represented by 5 characters (A, C, T, G and -) then taking 4 characters at a time, there can only be 625 possible strings. Comparing two such sub strings means there are only 390,625 possible pairs - surely small enough to fit into memory.

I do realize that you have to keep track of which "quad" you are comparing so that your position comes out right but this is hardly significant. Even comparing 5 characters at a time is less than 10 million parings so depending on your memory constraints you may not be limited to quads. Of course, if your strings can have more than 5 characters at each position you will have to do fewer at a time.

If you need an implementation of this, let me know but I think it should be fairly obvious.

Cheers - L~R


Comment on Re: Fast Identification Of String Difference

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://882670]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (15)
As of 2015-07-07 20:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (93 votes), past polls