Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Re: Fast Identification Of String Difference

by Limbic~Region (Chancellor)
on Jan 17, 2011 at 14:41 UTC ( #882670=note: print w/ replies, xml ) Need Help??

in reply to Fast Identification Of String Difference

Since you appear to have nucleic acid strings, perhaps you could leverage the minimal alphabet used to pre-compute differences. Assuming each position can only be represented by 5 characters (A, C, T, G and -) then taking 4 characters at a time, there can only be 625 possible strings. Comparing two such sub strings means there are only 390,625 possible pairs - surely small enough to fit into memory.

I do realize that you have to keep track of which "quad" you are comparing so that your position comes out right but this is hardly significant. Even comparing 5 characters at a time is less than 10 million parings so depending on your memory constraints you may not be limited to quads. Of course, if your strings can have more than 5 characters at each position you will have to do fewer at a time.

If you need an implementation of this, let me know but I think it should be fairly obvious.

Cheers - L~R

Comment on Re: Fast Identification Of String Difference

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://882670]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2015-11-29 12:13 GMT
Find Nodes?
    Voting Booth?

    What would be the most significant thing to happen if a rope (or wire) tied the Earth and the Moon together?

    Results (750 votes), past polls