
Re: Suggestions requested: module to standardize postal address components?

by Corion (Pope)
on Jun 30, 2010 at 07:55 UTC

in reply to Suggestions requested: module to standardize postal address components?

I think some form of canonicalization would be useful. But the meat of canonicalization is the data: the replacements to make and the list of exceptions to them. I'm not aware of any existing set of rules, US-centric or otherwise, nor of any (database) schema for managing addresses.

Looking at FOAF might provide such a schema. You could also structure your canonicalization rules generically, as (key, replacement) pairs, and have a generic driver that looks at each key and applies the replacement:

sub canonicalize {
    my ($rules, $element) = @_;
    for my $rule (@$rules) {
        my ($key, $action) = @$rule;
        if (exists $element->{ $key }) {
            if (ref $action eq 'CODE') {
                $action->( $element->{ $key } );
            } else {
                warn "Unknown rule type '$action' for element '$key'";
            }
        }
    }
}

my $en_us = [
    [ 'address' => sub { $_[0] =~ s/\bAvenue\b/Ave/ } ],
    [ 'address' => sub { $_[0] =~ s/\bNorthwest$/NW/ } ],
    ...
];

canonicalize($en_us, \%address);
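As a rough illustration of how such a driver might be exercised, here is a self-contained sketch. It assumes one extension that is not in the post above: besides code refs, a rule's action may be an array ref holding a pattern and a replacement string. The `city` rule and the sample address are likewise made up for the example.

```perl
use strict;
use warnings;

# Hypothetical variant of canonicalize(): besides code refs, an action may be
# an array ref [ $pattern, $replacement ]. This second rule type is an
# assumption for illustration, not part of the original post.
sub canonicalize {
    my ($rules, $element) = @_;
    for my $rule (@$rules) {
        my ($key, $action) = @$rule;
        next unless exists $element->{$key};
        if (ref $action eq 'CODE') {
            # The hash value is passed by alias, so the rule edits it via $_[0].
            $action->( $element->{$key} );
        }
        elsif (ref $action eq 'ARRAY') {
            my ($pattern, $replacement) = @$action;
            $element->{$key} =~ s/$pattern/$replacement/;
        }
        else {
            warn "Unknown rule type '$action' for element '$key'";
        }
    }
    return $element;
}

# Rules are applied in order; each one names the field it operates on.
my $en_us = [
    [ address => [ qr/\bAvenue\b/,   'Ave' ] ],
    [ address => [ qr/\bNorthwest$/, 'NW'  ] ],
    [ city    => sub { $_[0] =~ s/^\s+|\s+$//g } ],   # trim whitespace
];

# Sample record, invented for the example.
my %address = (
    address => '1600 Pennsylvania Avenue Northwest',
    city    => ' Washington ',
);

canonicalize($en_us, \%address);
print "$address{address}, $address{city}\n";
```

Keeping the simple substitutions as plain data leaves code refs for the exceptions that need real logic, which is where most of the maintenance effort is likely to go.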

Re^2: Suggestions requested: module to standardize postal address components?
by atcroft (Abbot) on Jun 30, 2010 at 08:28 UTC

    I really appreciate the feedback, Corion. Thank you.

    Actually, the US Postal Service has a list of standard abbreviations for use with postal addressing, at least in the US. What I did was create a set of regexes for those; as they stand now, they consist of just the regexes and the common abbreviations they refer to, in a form I could generate the tests from. I haven't put them into a more usable form yet, due in part to a lack of round tuits.

    I'll take a look at the FOAF project link you indicated, to see if there seems to be anything there that might be of use, as well as look over your recommendations when I have neurons firing a little more in tune.
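The "regexes plus abbreviations, in a form tests can be generated from" idea in the reply above might look something like the following sketch. The table layout, the `abbreviate` helper, and the sample inputs are all assumptions made for illustration; the three rows are only a tiny subset of the street suffix abbreviations, not atcroft's actual data.

```perl
use strict;
use warnings;
use Test::More;

# Illustrative subset only. Each row is a hypothetical layout:
# [ regex, abbreviation, sample input, expected output ].
my @suffix_rules = (
    [ qr/\bAVENUE\b/i,    'AVE',  '12 Oak Avenue',      '12 Oak AVE'    ],
    [ qr/\bBOULEVARD\b/i, 'BLVD', '7 Sunset Boulevard', '7 Sunset BLVD' ],
    [ qr/\bSTREET\b/i,    'ST',   '221 Baker Street',   '221 Baker ST'  ],
);

# Apply every rule in the table to a copy of the input string.
sub abbreviate {
    my ($text) = @_;
    for my $rule (@suffix_rules) {
        my ($re, $abbr) = @$rule;
        $text =~ s/$re/$abbr/g;
    }
    return $text;
}

# The same table drives the tests: one generated test per row.
for my $rule (@suffix_rules) {
    my (undef, $abbr, $input, $expected) = @$rule;
    is( abbreviate($input), $expected, "'$input' is abbreviated with $abbr" );
}

done_testing();
```

Because the rules and their test cases live in one table, adding an abbreviation adds its test at the same time.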
