Re: Finding Nearly Identical Sets by LanX (Cardinal)
on Sep 28, 2016 at 19:56 UTC
IIRC these structures are called multisets, since some elements are repeated in one of your examples.
If I understand your requirements correctly, you can use your approach in a pragmatic way, because any "neighboring" multisets must have at least 8 digits in common.
In the end you'll only need 9 hash lookups, one for each way of dropping a single digit from your input, to drastically narrow down the potential candidates.
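To make the lookup idea concrete, here is a minimal sketch, not a tested solution to your actual data. It is written in Python rather than Perl, with a dict of dicts standing in for the HoH. It assumes every multiset is a string of 9 single-character digits; the names canon, sub_keys, build_index and candidates are made up for illustration.

```python
from collections import defaultdict

def canon(digits):
    # Canonical form of a multiset: its digits in sorted order.
    return "".join(sorted(digits))

def sub_keys(digits):
    # All sub-multisets obtained by dropping exactly one digit
    # (9 keys for a 9-digit input; repeated digits give repeated keys).
    s = canon(digits)
    return [s[:i] + s[i + 1:] for i in range(len(s))]

def build_index(known):
    # Dict of dicts: sub-multiset key -> {canonical multiset -> original entry}.
    # Entries that are permutations of each other share one canonical slot.
    index = defaultdict(dict)
    for entry in known:
        for key in sub_keys(entry):
            index[key][canon(entry)] = entry
    return index

def candidates(index, query):
    # One hash lookup per sub-multiset of the query. Every known multiset
    # sharing at least all-but-one digit with the query ends up in the result,
    # including an exact match if there is one.
    found = {}
    for key in sub_keys(query):
        found.update(index.get(key, {}))
    return found

index = build_index(["112345678", "112345679", "987654321"])
print(sorted(candidates(index, "011234567")))
```

Here the query shares 8 digits with both of the first two entries, so both come back as candidates. Deciding what to do with such a tie is left to the caller.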
NB: That's a pragmatic approach, a detailed survey might show more efficient algorithms.
PS: this problem reminds me of the Hamming distance in error-correcting codes, but I doubt you can easily apply it here.
I just realized that you already sketched that approach in Re^2: Finding Nearly Identical Sets. Not sure why you say it's ugly, because a HoH should be quite fast, and you'd need to check anyway whether your input is equidistant to multiple neighbors.