patrickrock,
I am surprised no one has mentioned
Lingua::EN::AddressParse yet. It will not be a 100% solution. As indicated elsewhere in this thread, commercial products such as
Group 1 are really good at this. Using the module though, you can probably reduce the amount of work that needs to be done by hand to about 10%. You could use something like
Geo::Coder::US to help determine if the address you have is actually valid. A bit more research on
CPAN might turn up even more goodies.
On further review of this module, it appears the Parse::RecDescent grammar for US addresses could use some TLC. The author, Kim Ryan, appears to be from down under and complex US addresses don't seem to get parsed correctly. I bet someone here can improve it though ;-)