in reply to list of unique strings, also eliminating matching substrings
I have hundreds of sets of strings, each containing about 100,000 strings, and each string is about 300 characters long.
A few questions:
- Do you want to eliminate the duplicates within each of the files, or across all of the files?
- What (roughly) are the maximum and minimum lengths of the strings?
- Do they consist solely of ACGT, or are there other characters (X, N, etc.)?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.