A different approach, which worked better for me, was to make lists of all the substrings of length n in the source string. I called these n-tuples. I compared the percentage overlap between the n-tuple sets for each name in one list to the n-tuples for each word in the other list. The best value for the length n of the tuples was three or four.
, could you give me an example of what you mean by this paragraph? I don't want you to go to the trouble of code examples, I mean an example using text so I can better understand what you mean.