Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re: Finding Nearly Identical Sets

by shmem (Chancellor)
on Sep 28, 2016 at 21:21 UTC ( #1172867=note: print w/replies, xml ) Need Help??


in reply to Finding Nearly Identical Sets

First task to do is to prepare yourself a big bowl of lime blossom tea (tilia). Let it brew for 10 minutes, take care that it doesn't become cold - a thermos flask would keep it hot. Add plenty of honey, a pinch of salt, and lemon juice. Prepare your bed with some towels on the bed sheet, have more towels to cover yourself; get into bed, cover yourself with towels then blanket(s), and drink the tea really really hot, just below the temperature which would hurt you. Keep yourself covered snug and warm, close your eyes and sweat the crap out. --

Next, the identifier thing. I have the gut feeling that a solution involving binary trees and partial match could provide a nice solution. Whilst you sweat along, I'm going to evolve that thought as I am typing.

Since you say that you accept Any change to the order (1,2,3,4,5,6,7,8,9 instead of 9,8,7,6,5,4,3,2,1) you may as well store keys with an exact length as a split/sort/concatenate key. Then again, you could also store the reversed sorted key in the B-Tree database.

If you do a lookup of the split/sort/concatenate candidate in the database, you could get a full match - done. If you get a partial match, look up the reversed sorted candidate. The difference between the sum of both partial key lengths and the required key length should be 1, no matter whether insertion, deletion or transformation of 1 digit is the cause of difference. If it is any longer, more than 1 transformation has been done to the expected key.

See DB_File's BTREE section, and maybe IP prefixes: match IP against "best" network.

update

Ahem. The difference between the sum of both partial match keys to the expected length should be 1 for insertion/deletion, and 2 for transformation, I guess...not. - I'm too tired right now to code the algorithm...

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1172867]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (3)
As of 2021-03-01 02:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?