A different approach, which worked better for me, was to list every substring of length n in the source string. I called these n-tuples. For each name in one list, I computed the percentage overlap between its set of n-tuples and the n-tuple set of each name in the other list. The best value for the tuple length n was three or four.
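To make the quoted idea concrete, here is a minimal Python sketch of one way it could work. It is not the original poster's code. The function names `ngrams`, `overlap`, and `best_match` are made up for illustration, lowercasing the input is my own assumption, and since the post does not say how the percentage overlap was calculated, I have assumed shared n-tuples divided by all distinct n-tuples (Jaccard similarity).

```python
def ngrams(text, n=3):
    """Return the set of all length-n substrings ("n-tuples") of text.

    Lowercasing is an assumption; the original post does not mention case.
    """
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def overlap(a, b, n=3):
    """Percentage overlap between the n-tuple sets of a and b.

    Assumption: overlap = shared tuples / all distinct tuples (Jaccard).
    The post does not specify the exact formula.
    """
    ta, tb = ngrams(a, n), ngrams(b, n)
    if not ta or not tb:
        # A string shorter than n has no n-tuples to compare.
        return 0.0
    return 100.0 * len(ta & tb) / len(ta | tb)


def best_match(name, candidates, n=3):
    """Return the candidate whose n-tuple set overlaps most with name."""
    return max(candidates, key=lambda c: overlap(name, c, n))


if __name__ == "__main__":
    print(sorted(ngrams("Acme")))
    print(overlap("Acme Corp", "Acme Corporation"))
    print(best_match("Acme Corp", ["Apex Holdings", "Acme Corporation"]))
```

In plain text: with n = 3, "Acme" breaks into the tuples "acm" and "cme". "Acme Corp" yields 7 tuples and "Acme Corporation" yields 14, with all 7 of the shorter name's tuples shared, so under the assumed formula the overlap is 7 of 14 distinct tuples.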
Toma, could you give me an example of what you mean by this paragraph? I don't want you to go to the trouble of writing code; I mean an example using text, so I can better understand the idea.