PerlMonks  

International Addresses

by krazken (Scribe)
on Mar 06, 2002 at 15:29 UTC

krazken has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a program that standardizes a given address and its components based on the country provided. I figured I would ask here first to see if anyone has already developed something for their home country (anything but the USA). For example, a record comes in and the country is Canada, so I know I have to use special Canadian address logic to decide whether it is a French-based or an English-based address, then try to standardize it from there and validate that the postal code matches the province given. Any ideas on how to even approach this? I have to do this for as many countries as possible, so any help or tips/tricks would be very much appreciated.

Replies are listed 'Best First'.
Re: International Addresses
by shotgunefx (Parson) on Mar 06, 2002 at 18:36 UTC
    I think that's quite an ambitious project.
    Validating can be a huge pain, mainly in acquiring the appropriate data (zip codes, postal codes, etc.) for all places in the world.
    I think you will end up validating it like an email. (It's valid in form but no way to know until you send it!)

    How will you be getting this information? You might be better off coming up with the elements of addresses and working those elements into "definitions". I don't know if address requirements vary between regions within a country, but working backwards like this, you could 1. use the definitions to check that information is valid in form and 2. use them in other programs to acquire the appropriate data.
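A minimal sketch of the "definitions" idea, assuming each country gets one entry describing what its postal code looks like and whether it is required. The `%address_definition` table, its field names and the `valid_in_form` helper are all hypothetical; the US and NZ patterns only check the shape of the code and say nothing about whether the address exists.

```perl
use strict;
use warnings;

# Hypothetical per-country definitions: a pattern for the postal code
# and a flag saying whether the code is mandatory.
my %address_definition = (
    US => { postal_code => qr/\A[0-9]{5}(?:-[0-9]{4})?\z/, postal_required => 1 },
    NZ => { postal_code => qr/\A[0-9]{4}\z/,               postal_required => 0 },
);

# Returns 1 if the postal code is valid in form for the country, else 0.
# Countries without a definition are accepted as free-form (an assumption).
sub valid_in_form {
    my ($country, $postal_code) = @_;
    my $def = $address_definition{$country} or return 1;
    if (!defined $postal_code || $postal_code eq '') {
        return $def->{postal_required} ? 0 : 1;
    }
    return $postal_code =~ $def->{postal_code} ? 1 : 0;
}

print valid_in_form('US', '12345-6789') ? "ok\n" : "bad\n";
print valid_in_form('NZ', '')           ? "ok\n" : "bad\n";
```

Keeping the rules in data rather than code means a new country is one more hash entry, and anything that needs real per-country logic can still get its own sub later.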
    You may or may not find the following helpful:
    Data::Address::Standardize
    Scrape::USPS::ZipLookup
    ISO 3166 Country Codes

    Good luck. I'd be interested in seeing what you come up with.

    -Lee

    "To be civilized is to deny one's nature."
      Well, I am starting with Canada. I am creating lookup structures that hold things like the valid words in a French address, the valid words in an English address, and the valid abbreviations for each street type (e.g. street=st, abbey=abbey, etc.). Once I have all of that, the majority of the rest will come from information compiled off the web, just by searching for postal information on each individual country. Very manual, I know. Oh yeah... to throw another wrench into it, I have to be able to handle multiple character sets as well, so Unicode is a must.
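The lookup structure for street types could be sketched roughly like this. The `%street_type` entries and the `standardize_street_types` name are made up for illustration and are not an official Canada Post list.

```perl
use strict;
use warnings;

# Hypothetical lookup table: lowercase street-type word => standard form.
my %street_type = (
    street => 'ST',
    st     => 'ST',
    avenue => 'AVE',
    ave    => 'AVE',
    abbey  => 'ABBEY',
    rue    => 'RUE',
);

# Replace every word found in the table with its standard form.
# Words that are not in the table are left untouched.
sub standardize_street_types {
    my ($line) = @_;
    my @words = split ' ', $line;
    for my $w (@words) {                 # $w aliases the array element
        (my $key = lc $w) =~ s/\.$//;    # ignore case and a trailing period
        $w = $street_type{$key} if exists $street_type{$key};
    }
    return join ' ', @words;
}

print standardize_street_types('123 Main Street'), "\n";
print standardize_street_types('45 rue Sainte-Catherine'), "\n";
```

Note that this rewrites any matching word, so "St. John Street" would come out as "ST John ST". A real version would need to know which position in the line holds the street type, and that position differs between English and French addresses.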
        Are you trying to parse this out of freeform text, or do you know what data is what?

        As an aside, I posted a module here for looking up ISO Country codes. Don't know if you will find it useful or not. I have one for US states too that I will post later.

        -Lee

        "To be civilized is to deny one's nature."
        I am not sure what you mean by "valid words"? Would this mean that an address must contain one of the special words to be ok, such as "street" or "st", "lane", "road" or something in the address field?

        I'd say that is an impossible task, unless some countries actually have such strict policies for what is a valid address. Speaking for Sweden, for one thing, we have lots of addresses containing "gata", which means "street" for instance, but lots of addresses don't - and some addresses are just the name of a village, or something smaller than that, with or without a number after it. Yet other addresses are something that would translate to "Mailbox XXX", which is not the same thing as a PO box (we have those too), etc... frankly, I can't see any other match to our addresses than /.+/.

        Either I misunderstood what you mean, or I think it will be impossible to create these rules, unless you do as some e-commerce sites do and check addresses against where people live according to central government registers. And that was clearly not your goal... :)


        You have moved into a dark place.
        It is pitch black. You are likely to be eaten by a grue.
Re: International Addresses
by JPaul (Hermit) on Mar 06, 2002 at 18:34 UTC
    Greetings,
    For what it's worth, in New Zealand an address would look something like the following:
    66a Tiotio Road,
    Seatoun,
    Wellington 6003,
    New Zealand

    There are no states, and the postcode (6003) isn't even required. The suburb (Seatoun) is also not required, but it's nearly always put on the address for clarification. The NZ postal system is efficient, and they actually have people with a brain working behind the desk who are able to work out from bad addresses where a letter is supposed to go.
    Just a bit of civic pride.

    JP,
    -- Alexander Widdlemouse undid his bellybutton and his bum dropped off --

Re: International Addresses
by talexb (Chancellor) on Mar 06, 2002 at 18:27 UTC
    I'm not sure what differences are required for English and French addresses unless you're talking about "Street/Rue" titles. For province names you can use the standard North American abbreviations (BC, AB, ON, QC, etc).

    After that the postal code is pretty simple .. each province has one or more letters assigned to it. Ontario has five (I think) .. K, L, M, N and P. Québec has at least H and J. It's probably easier to code that relationship in (shudder) JavaScript. As soon as a province is selected, you can derive the range of available first letters for the postal code from that.

    --t. alex

    "There was supposed to be an earth-shattering kaboom!" --Marvin the Martian

      The only problem with Canadian postal codes and using the first byte to determine the province is that the Northwest Territories and Nunavut both use X as their identifier. I will have to do this on a country-by-country basis so that I can introduce custom code per country. I have a book that contains the address formats of 193 countries, and I am using that as a starting place... I know this is massive, but there is nothing out there that does anything like this, and anyone who does any type of database work with international data will have a need for this.
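A rough sketch of the first-letter check, with the shared X handled by mapping each letter to a list of provinces rather than a single one. Only the Ontario letters, H and J for Québec, and X come from this thread; the rest of the table is filled in from memory and should be checked against Canada Post before anyone relies on it. The names `%provinces_for_letter` and `postal_code_matches_province` are hypothetical.

```perl
use strict;
use warnings;

# First letter of the postal code => province/territory abbreviations.
# Entries not mentioned in the thread are assumptions to be verified.
my %provinces_for_letter = (
    A => ['NL'], B => ['NS'], C => ['PE'], E => ['NB'],
    G => ['QC'], H => ['QC'], J => ['QC'],
    K => ['ON'], L => ['ON'], M => ['ON'], N => ['ON'], P => ['ON'],
    R => ['MB'], S => ['SK'], T => ['AB'], V => ['BC'],
    X => ['NT', 'NU'],   # shared by both territories
    Y => ['YT'],
);

# Returns 1 if the code's first letter is allowed for the province, else 0.
sub postal_code_matches_province {
    my ($postal_code, $province) = @_;
    $postal_code =~ s/^\s+//;
    my $first   = uc substr($postal_code, 0, 1);
    my $allowed = $provinces_for_letter{$first} or return 0;
    return (grep { $_ eq uc $province } @$allowed) ? 1 : 0;
}

print postal_code_matches_province('X0A 0H0', 'NU') ? "match\n" : "mismatch\n";
```

Going from letter to provinces (instead of province to letters) means a record with an X code passes for either territory, and telling the two apart would need more of the code than the first letter.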
        I have had a need for something like what you are doing. I gave up and just used 4 lines of free format text, except in the case of US addresses, for which I had formatting rules. What book has the address formats of 193 countries? I might buy that one.


        The more I learn, the less I think I know.

Re: International Addresses
by theguvnor (Chaplain) on Mar 07, 2002 at 01:24 UTC
    It appears that you are aware of this, but I thought I'd just point out (for other monks' potential interest) that ALL Canadian postal codes follow the format XnX nXn, where X is an alpha and n is a numeric character. And you are correct that the first character in the code indicates the province (though whether that will truly indicate whether you want to use English or French is unclear, as other parts of the country besides Quebec speak French).
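The XnX nXn format translates into a short pattern. This is only a shape check under the assumption that the space is optional and case does not matter on input; as far as I know Canada Post also leaves some letters unused, so a stricter character class is possible once that list is confirmed. The sub name is made up.

```perl
use strict;
use warnings;

# True if the string has the letter-digit-letter, optional single space,
# digit-letter-digit shape. It does not check that the code exists.
sub looks_like_canadian_postal_code {
    my ($code) = @_;
    return $code =~ /\A[A-Za-z][0-9][A-Za-z] ?[0-9][A-Za-z][0-9]\z/ ? 1 : 0;
}

for my $candidate ('K1A 0B1', 'k1a0b1', '12345') {
    printf "%-8s %s\n", $candidate,
        looks_like_canadian_postal_code($candidate) ? 'looks ok' : 'wrong shape';
}
```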

    To get the whole scoop on Canadian postal code system, you can get an entire PDF on the subject from Canada Post here. I hope this helps!

    ..Guv

    PS update: wmono wants me to clarify that while the majority of Quebecers speak French, there is a sizeable English-speaking minority also.

Re: International Addresses
by Stegalex (Chaplain) on Mar 06, 2002 at 19:25 UTC
    Look at Locale::Country and Locale::Subcountry on the CPAN. These modules encapsulate the ISO-3166 standard codes for countries and states, etc.

    I like chicken.
      Those seem pretty good to use for lookup table type stuff on a country by country basis, but do you know of anything that does actual address parsing for a particular country? Like that big nasty regex for a valid email address, only for mailing addresses instead.
        Sorry, I don't know of such a module, and may I say that if you are looking for something that will always correctly parse all addresses for a particular country, you are probably relieving yourself while facing into the breeze. That being said, I think that what you are doing is not very different from a practice called "address correction" in which the postal service corrects the wording of your address to conform to country standards. In the U.S., the software leader in this space is Group One. Also, the postal service should have well publicized regs for this. I wish you luck as this sounds like a thankless task!

        I like chicken.
Re: International Addresses
by gav^ (Curate) on Mar 07, 2002 at 16:08 UTC
    I think it is pretty much impossible. I remember having quite a few problems years back when I lived in the UK and a lot of American websites assumed that all 'states' were 2 characters. I found it quite frustrating and ended up putting SU (for Surrey). UK addresses are pretty complex: in theory you could write "2 GU185RS" on an envelope and it would get to my old address, "2 Lovells Close, Lightwater, Surrey, GU18 5RS", in which the county is optional. I guess you'd have a lot more luck with American/Canadian addresses as they are more 'standard'.
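Getting from "GU185RS" to "GU18 5RS" is one of the easier bits, because the second half of a UK postcode is always a digit followed by two letters. A loose sketch, assuming that is all you want to check; `format_uk_postcode` is a made-up name and the pattern accepts plenty of strings that are not real postcodes.

```perl
use strict;
use warnings;

# Uppercase the input, drop all whitespace, then put a single space
# before the final digit-letter-letter group. Returns undef if the
# string does not have that rough shape.
sub format_uk_postcode {
    my ($raw) = @_;
    (my $code = uc $raw) =~ s/\s+//g;
    return unless $code =~ /\A([A-Z0-9]{2,4})([0-9][A-Z]{2})\z/;
    return "$1 $2";
}

print format_uk_postcode('GU185RS'), "\n";
```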

    The main thing is to stop user error, and something like a page/form that said 'is this your address?' with it all broken down helps.

    One of our clients had a problem where people kept entering the city twice, once in the city field and once in the postal/zip field. We found out that in certain versions of Netscape the table wrapped, causing 'Zip:' to be on the end of one line and the textbox at the start of the next, underneath the city box. We've got no idea why people decided this meant they should enter their city twice though...

    gav^
