Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

Re: Fast Identification Of String Difference

by Limbic~Region (Chancellor)
on Jan 17, 2011 at 14:41 UTC ( #882670=note: print w/replies, xml ) Need Help??

in reply to Fast Identification Of String Difference

Since you appear to have nucleic acid strings, perhaps you could leverage the minimal alphabet used to pre-compute differences. Assuming each position can only be represented by 5 characters (A, C, T, G and -) then taking 4 characters at a time, there can only be 625 possible strings. Comparing two such sub strings means there are only 390,625 possible pairs - surely small enough to fit into memory.

I do realize that you have to keep track of which "quad" you are comparing so that your position comes out right but this is hardly significant. Even comparing 5 characters at a time is less than 10 million parings so depending on your memory constraints you may not be limited to quads. Of course, if your strings can have more than 5 characters at each position you will have to do fewer at a time.

If you need an implementation of this, let me know but I think it should be fairly obvious.

Cheers - L~R

  • Comment on Re: Fast Identification Of String Difference

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://882670]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (3)
As of 2017-05-25 04:32 GMT
Find Nodes?
    Voting Booth?