PerlMonks
I am working on an app that parses text that may be very poorly formatted. The text is mainly public record data, most of it entered into various systems many years ago by people who never thought about algorithmic parsing. They just typed the text into the field, with all sorts of goofy abbreviations.

My algorithm tries to clean up some of the inconsistencies. For instance, the word "Trust" may be abbreviated as "Trst", "Tst", "Tru", etc. So I'm thinking of using String::Approx to match potentially abbreviated words against a predetermined list. Would this be wise, or does anyone know of a module more specific to my needs?

Thanks

In reply to abbreviation checking by shemp
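To make the idea concrete, here is a minimal sketch of matching a token against a predetermined list by edit distance. It does not call String::Approx; it uses a hand-rolled Levenshtein distance as a stand-in so that it runs with core Perl only. The word list, the threshold of 2 edits, the first-letter filter, and the sub names `edit_distance` and `expand_abbreviation` are all assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical canonical word list; a real one would come from the data.
my @canonical = qw(Trust Estate Company Association);

# Plain Levenshtein edit distance between two strings, ignoring case.
sub edit_distance {
    my ($s, $t) = map { lc $_ } @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            $cur[$j] = min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Return the closest canonical word, or undef when nothing is within
# $max edits or when the two best candidates tie (ambiguous tokens are
# left alone). Assumption: an abbreviation keeps the word's first letter.
sub expand_abbreviation {
    my ($token, $max) = @_;
    $max //= 2;
    my @scored = sort { $a->[0] <=> $b->[0] }
                 map  { [ edit_distance($token, $_), $_ ] }
                 grep { lc(substr($_, 0, 1)) eq lc(substr($token, 0, 1)) }
                 @canonical;
    return undef unless @scored && $scored[0][0] <= $max;
    return undef if @scored > 1 && $scored[1][0] == $scored[0][0];
    return $scored[0][1];
}

for my $token (qw(Trst Tst Tru Xyz)) {
    my $word = expand_abbreviation($token);
    print "$token => ", (defined $word ? $word : '(no match)'), "\n";
}
```

With this list, "Trst", "Tst" and "Tru" all resolve to "Trust", and "Xyz" is left unmatched. A fixed threshold of 2 is loose for short words, so on real data it would probably need to scale with word length or be paired with a review step for borderline matches.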