PerlMonks  

Re^2: Fuzzy matching of text strings

by buttroast (Scribe)
on Dec 14, 2005 at 19:22 UTC


in reply to Re: Fuzzy matching of text strings
in thread Fuzzy matching of text strings

Soundex is a great tool, but in this case it is not doing anything useful. The first four descriptions in your sample return the same Soundex code because only the "Promess" portion of each record is ever processed.

Basically:

1. Grab the first letter:

String: Promessa H...
Soundex: P

2. Remove all vowels in remaining string:

String: rmssH
Soundex: P

3. Condense duplicate letters:


String: rmsH
Soundex: P

4. Assign 3 digits, working left to right, based on the following key:

1. b,p,f,v
2. c,s,k,g,j,q,x,z
3. d,t
4. l
5. m,n
6. r

String: rmsH
Soundex: P6 (6 is for r)

String: msH
Soundex: P65 (5 is for m)

String: sH
Soundex: P652 (2 is for s)

DONE AT 3 DIGITS!!! GO NO FURTHER.

If there are consecutive characters from the same group, they are coded as a single digit. For example, in the name "Duck", c and k are both in group 2, so the resulting Soundex is D200 (zeros pad the code on the right when we run out of letters to turn into numbers).
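The steps above can be sketched in a few lines. This is a minimal Python illustration, not the code of any particular Soundex module. The function name soundex_sketch is made up for this example, and it assumes that h, w and y are dropped along with the vowels and that a vowel between two letters of the same group keeps them from being condensed.

```python
from itertools import groupby

# Digit groups from the key above.
GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
CODE = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex_sketch(text):
    """Return the first letter plus three digits, zero-padded on the right."""
    # Keep only a-z, so spaces and punctuation are ignored.
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    # Vowels and h, w, y get the placeholder "0" (assumption of this sketch).
    digits = [CODE.get(ch, "0") for ch in letters]
    # Condense each run of identical digits into one digit.
    squeezed = [digit for digit, _ in groupby(digits)]
    # The first letter is kept as a letter, so skip its digit,
    # then drop the placeholders.
    tail = [digit for digit in squeezed[1:] if digit != "0"]
    # Pad with zeros and stop at three digits.
    return (letters[0].upper() + "".join(tail) + "000")[:4]


print(soundex_sketch("Promessa High Spirits"))
print(soundex_sketch("Duck"))
```

With these assumptions the two calls give P652 and D200, matching the walk-through: everything after "Promess" never gets a chance to affect the code.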

In summary, Soundex is not appropriate for comparing longer strings. If you use it, the following would all be grouped as P652:

Promessa National Bank
Promessing Fertilizer Company
Promessa High Spirits
Promessing With Me
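To see the grouping happen, here is a small Python sketch that codes each whole record and buckets the records by code. The helper name whole_string_code is hypothetical, and the translate-then-squeeze approach is just one compact way to write the steps described in this post. It assumes h, w and y are treated like vowels.

```python
import re
from collections import defaultdict

# Letter-to-digit table; "0" marks vowels plus h, w, y (assumption of this sketch).
LETTERS = "aehiouwy" + "bfpv" + "cgjkqsxz" + "dt" + "l" + "mn" + "r"
DIGITS = "0" * 8 + "1" * 4 + "2" * 8 + "3" * 2 + "4" + "5" * 2 + "6"
TABLE = str.maketrans(LETTERS, DIGITS)


def whole_string_code(text):
    """Code the entire record as if it were a single name."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    if not letters:
        return ""
    # Translate to digits, then condense runs of the same digit.
    digits = re.sub(r"(.)\1+", r"\1", letters.translate(TABLE))
    # Keep the first letter, drop placeholders, pad, stop at three digits.
    return (letters[0].upper() + digits[1:].replace("0", "") + "000")[:4]


records = [
    "Promessa National Bank",
    "Promessing Fertilizer Company",
    "Promessa High Spirits",
    "Promessing With Me",
]

groups = defaultdict(list)
for record in records:
    groups[whole_string_code(record)].append(record)

for code, members in groups.items():
    print(code, members)
```

All four records end up in a single P652 bucket, because the code is full after the r, m and s of "Promess".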

Hope this clears up Soundex for everyone.

Thanks buttroast
