PerlMonks
Re: Parse mailing addresses with a regex
by BrowserUk (Pope) on Jun 23, 2003 at 14:35 UTC (#268182)
Personally, I think I'd use split with capturing brackets, so that the delimiters are not discarded, to break this into chunks first. Using /([\d-]+)/ as the delimiter breaks the line up between the numbers (the '-' is there to keep the telephone number in one chunk).
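A minimal sketch of that split, for illustration only. The sample record and its field order (id, name, street number, street, city, state, zip, phone) are assumptions, not taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical record; the field order is assumed for illustration.
my $line = '1001 John Smith 123 Main Street Springfield IL 62704 555-123-4567';

# The capturing parentheses make split return the delimiters (the runs of
# digits and '-') as list elements instead of discarding them.
my @chunk = split /([\d-]+)/, $line;

# Show each chunk with its index.
printf "%d: [%s]\n", $_, $chunk[$_] for 0 .. $#chunk;
```

With this record, $chunk[0] is an empty string because the line starts with a number, and $chunk[4] holds ' Main Street Springfield IL '. Note that [\d-]+ also matches a lone '-', so a hyphenated name would be split as well.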
With this split, the only chunk that needs much further processing is the address ($chunk[4]), which only requires the last two words to be broken off to give you city and state. At least as far as your examples go. How you would recognise city names with more than one word (e.g. Salt Lake City) is up to you. Probably the best way would be to grab a dictionary of town/city names from somewhere, put them in a hash, strip the state, and look up the last word, then the last two words, then the last three words, until you get a match. Subdividing the hash by state first would further increase your reliability.
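The dictionary lookup could look something like the sketch below. The %cities table and the split_address helper are hypothetical names, and the tiny city list merely stands in for a real gazetteer.

```perl
use strict;
use warnings;

# Hypothetical lookup table, subdivided by state: state => { city => 1 }.
my %cities = (
    UT => { 'Salt Lake City' => 1, 'Provo' => 1 },
    IL => { 'Springfield'    => 1 },
);

# Split an address chunk into (street, city, state).
# Returns an empty list if the state or city is not in the table.
sub split_address {
    my ($chunk) = @_;
    my @words = split ' ', $chunk;    # ' ' also skips leading whitespace
    my $state = pop @words;           # strip the state first
    return unless defined $state and $cities{$state};

    # Try the last word, then the last two, then the last three.
    for my $n ( 1 .. 3 ) {
        last if $n > @words;
        my $city = join ' ', @words[ -$n .. -1 ];
        if ( $cities{$state}{$city} ) {
            my $street = join ' ', @words[ 0 .. $#words - $n ];
            return ( $street, $city, $state );
        }
    }
    return;
}

my ( $street, $city, $state ) =
    split_address(' South State Street Salt Lake City UT ');
print "$street | $city | $state\n";
```

Trying the shortest candidate first follows the order described above. It can pick the wrong city when a shorter name is also a valid entry for that state, so trying the longest candidate first may be worth considering.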