Re: Find duplicate based on specific fields while allowing 2 mismatch

by hdb (Monsignor)
on Aug 28, 2017 at 07:59 UTC


in reply to Find duplicate based on specific fields while allowing 2 mismatch

Can you please clarify your second requirement? The relation "two mismatches" is not transitive, e.g. there are two mismatches between AAAA and AAGG and two mismatches between AAGG and TTGG but four mismatches between AAAA and TTGG. Would you consider all three to belong to one cluster?
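To make the non-transitivity concrete, here is a minimal sketch that counts mismatches between the three example tags. The helper name `mismatches` is only illustrative and is not taken from the original question; it assumes tags of equal length.

```perl
use strict;
use warnings;

# Count the positions at which two equal-length strings differ
# (Hamming distance). Hypothetical helper, for illustration only.
sub mismatches {
    my ($s, $t) = @_;
    die "tags differ in length\n" unless length($s) == length($t);
    my $xor = $s ^ $t;            # string XOR: "\0" wherever the characters agree
    return $xor =~ tr/\0//c;      # count the bytes that are not "\0"
}

my @tags = qw(AAAA AAGG TTGG);
for my $i (0 .. $#tags - 1) {
    for my $j ($i + 1 .. $#tags) {
        printf "%s vs %s: %d mismatches\n",
            $tags[$i], $tags[$j], mismatches($tags[$i], $tags[$j]);
    }
}
```

This prints 2 for AAAA vs AAGG, 4 for AAAA vs TTGG, and 2 for AAGG vs TTGG, so chaining "within two mismatches" can link tags that are themselves four mismatches apart.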

Re^2: Find duplicate based on specific fields while allowing 2 mismatch
by amitgsir (Novice) on Aug 28, 2017 at 08:21 UTC
    I think I need to take the first entry as the reference and allow up to two mismatches against the first line's UMI tag to form one cluster. The remaining lines at the same start position can then be looped over again in the same way. So, if the first line has AAAA, then AAGG, TTAA, etc. can be merged into a single cluster, but TTGG will form a separate cluster. I have edited the question for the same! Amit
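The reference-based rule described above could be sketched roughly as follows. This is only one reading of the description, not code from the thread: the names `mismatches` and `cluster_by_reference` are hypothetical, and the input is assumed to be the UMI tags of lines that already share a start position.

```perl
use strict;
use warnings;

# Count differing positions between two equal-length tags (illustrative helper).
sub mismatches {
    my ($s, $t) = @_;
    die "tags differ in length\n" unless length($s) == length($t);
    my $xor = $s ^ $t;            # "\0" wherever the characters agree
    return $xor =~ tr/\0//c;      # count the non-"\0" bytes
}

# Greedy clustering: the first remaining tag becomes the reference, and every
# other tag within $max mismatches of that reference joins its cluster.
# The leftover tags are then processed again in the same way.
sub cluster_by_reference {
    my ($max, @tags) = @_;
    my @clusters;
    while (@tags) {
        my $ref = shift @tags;
        my (@members, @rest);
        for my $tag (@tags) {
            if (mismatches($ref, $tag) <= $max) { push @members, $tag }
            else                                { push @rest,    $tag }
        }
        push @clusters, [ $ref, @members ];
        @tags = @rest;
    }
    return @clusters;
}

# Assumed input: UMI tags from lines at one start position, in file order.
my @umis = qw(AAAA AAGG TTAA TTGG);
print join(' ', @$_), "\n" for cluster_by_reference(2, @umis);
```

With the example tags this prints "AAAA AAGG TTAA" and then "TTGG". Because members are compared only with the reference, the result depends on the input order, which sidesteps the transitivity problem rather than solving it.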
      "I have edited the question for the same!"

      Do not just edit your question without showing very clearly what you've changed: it often means that answers to your original post no longer make sense.

      You can do this in a number of ways:

      • For a small change within a sentence: <del>old text</del><ins>new text</ins>.
      • To remove a large amount of text, code, or data: add an Update comment explaining what you're doing, then <strike>... part to remove ...</strike>.
      • To add new content: use an Update comment explaining what you're doing, then add the new content.
      • To change a large amount of text, code, or data: add an Update comment explaining what you're doing, then <strike>...</strike> (as per the earlier point); then add replacement text, code, or data.
      • If you're striking out significant sections of your post, and the result ends up difficult to read, consider putting those parts inside <spoiler>...</spoiler> or <readmore>...</readmore> tags; however, ensure the Update comment is visible.

      Never just delete some, or all, of your post!

      See "How do I change/delete my post?" for details and further discussion.

      — Ken
