http://www.perlmonks.org?node_id=60745
Description: I came up with a couple of "interesting" ways to get the U.S. state names into an array or hash.
my %States= do{ my(%p,@p); @p{qw(New West North South Rhode)}=(1)x5;
    map { @p ? pop(@p)." $_" : $p{$_} ? ()x(push @p,$_) : $_ } qw{
        AK Alaska           LA Louisiana        OH Ohio
        AL Alabama          MA Massachusetts    OK Oklahoma
        AR Arkansas         MD Maryland         OR Oregon
        AZ Arizona          ME Maine            PA Pennsylvania
        CA California       MI Michigan         RI Rhode Island
        CO Colorado         MN Minnesota        SC South Carolina
        CT Connecticut      MO Missouri         SD South Dakota
        DE Delaware         MS Mississippi      TN Tennessee
        FL Florida          MT Montana          TX Texas
        GA Georgia          NC North Carolina   UT Utah
        HI Hawaii           ND North Dakota     VA Virginia
        IA Iowa             NE Nebraska         VT Vermont
        ID Idaho            NH New Hampshire    WA Washington
        IL Illinois         NJ New Jersey       WI Wisconsin
        IN Indiana          NM New Mexico       WV West Virginia
        KS Kansas           NV Nevada           WY Wyoming
        KY Kentucky         NY New York
    }
};

# Or, with a single split:
( undef, %States )= split /\s+(?=\w\w\s)|(?<=\s\w\w)\s+/, q{
        AK Alaska           LA Louisiana        OH Ohio
        AL Alabama          MA Massachusetts    OK Oklahoma
        AR Arkansas         MD Maryland         OR Oregon
        AZ Arizona          ME Maine            PA Pennsylvania
        CA California       MI Michigan         RI Rhode Island
        CO Colorado         MN Minnesota        SC South Carolina
        CT Connecticut      MO Missouri         SD South Dakota
        DE Delaware         MS Mississippi      TN Tennessee
        FL Florida          MT Montana          TX Texas
        GA Georgia          NC North Carolina   UT Utah
        HI Hawaii           ND North Dakota     VA Virginia
        IA Iowa             NE Nebraska         VT Vermont
        ID Idaho            NH New Hampshire    WA Washington
        IL Illinois         NJ New Jersey       WI Wisconsin
        IN Indiana          NM New Mexico       WV West Virginia
        KS Kansas           NV Nevada           WY Wyoming
        KY Kentucky         NY New York};
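
For readers who find the map in the first snippet dense, here is a rough unrolled sketch of the same idea: hold back a prefix word such as "New" and glue it onto the next word. The names `%is_prefix`, `@pairs` and `$pending` are made up for illustration, and the word list is only a short sample of the full table.

```perl
use strict;
use warnings;

# Words that begin a two-word state name (same list as the snippet above).
my %is_prefix = map { $_ => 1 } qw(New West North South Rhode);

# A short sample of the table; the full list works the same way.
my @words = qw(AK Alaska NC North Carolina NY New York RI Rhode Island);

my ( @pairs, $pending );
for my $word (@words) {
    if ( defined $pending ) {
        # The previous word was a prefix: glue it to this one.
        push @pairs, "$pending $word";
        undef $pending;
    }
    elsif ( $is_prefix{$word} ) {
        # Hold the prefix back and emit nothing yet.
        $pending = $word;
    }
    else {
        # A state code or a one-word name: pass it through.
        push @pairs, $word;
    }
}

my %States = @pairs;
print "$_ => $States{$_}\n" for sort keys %States;
```

The map version does the same bookkeeping with `@p` as a one-element stack and `()x(push @p,$_)` as a way to return an empty list while pushing.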
Re: U.S. State Names
by merlyn (Sage) on Feb 25, 2001 at 20:09 UTC
    This might be a bit more maintainable:
    my %States = q{
        AK Alaska           LA Louisiana        OH Ohio
        AL Alabama          MA Massachusetts    OK Oklahoma
        AR Arkansas         MD Maryland         OR Oregon
        AZ Arizona          ME Maine            PA Pennsylvania
        CA California       MI Michigan         RI Rhode Island
        CO Colorado         MN Minnesota        SC South Carolina
        CT Connecticut      MO Missouri         SD South Dakota
        DE Delaware         MS Mississippi      TN Tennessee
        FL Florida          MT Montana          TX Texas
        GA Georgia          NC North Carolina   UT Utah
        HI Hawaii           ND North Dakota     VA Virginia
        IA Iowa             NE Nebraska         VT Vermont
        ID Idaho            NH New Hampshire    WA Washington
        IL Illinois         NJ New Jersey       WI Wisconsin
        IN Indiana          NM New Mexico       WV West Virginia
        KS Kansas           NV Nevada           WY Wyoming
        KY Kentucky         NY New York
    } =~ /\G\s*(\w\w)\s+(\w+(?:\s\w+)?)/g;
    Ever since I discovered \G, "capturing splits" have never really done much for me, because the logic is usually so reversed.

    -- Randal L. Schwartz, Perl hacker
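
A small sketch of what \G buys you in a list-context //g match, using a shortened, made-up table. With \G each match must start exactly where the previous one ended, so unexpected junk stops the extraction instead of being silently skipped. The variable names here are illustrative only.

```perl
use strict;
use warnings;

# Two or more spaces separate the entries; a single space joins name words.
my $table = "AK Alaska    NC North Carolina    NY New York";

# In list context, //g returns every captured ($1, $2) pair in order.
my %States = $table =~ /\G\s*(\w\w)\s+(\w+(?:\s\w+)?)/g;
print "$_ => $States{$_}\n" for sort keys %States;

# The same two patterns applied to a table with junk in the middle.
my $dirty   = "AK Alaska    ??    NY New York";
my @with_g  = $dirty =~ /\G\s*(\w\w)\s+(\w+(?:\s\w+)?)/g;   # stops at the junk
my @without = $dirty =~ /(\w\w)\s+(\w+(?:\s\w+)?)/g;        # skips over the junk

print scalar(@with_g), " vs ", scalar(@without), " elements\n";
```

Whether stopping or skipping is the better behaviour depends on how much you trust the input.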

      As it happens neither the \G nor the \s* are required in this case.

      my %States = q{
          AK Alaska           LA Louisiana        OH Ohio
          AL Alabama          MA Massachusetts    OK Oklahoma
          AR Arkansas         MD Maryland         OR Oregon
          AZ Arizona          ME Maine            PA Pennsylvania
          CA California       MI Michigan         RI Rhode Island
          CO Colorado         MN Minnesota        SC South Carolina
          CT Connecticut      MO Missouri         SD South Dakota
          DE Delaware         MS Mississippi      TN Tennessee
          FL Florida          MT Montana          TX Texas
          GA Georgia          NC North Carolina   UT Utah
          HI Hawaii           ND North Dakota     VA Virginia
          IA Iowa             NE Nebraska         VT Vermont
          ID Idaho            NH New Hampshire    WA Washington
          IL Illinois         NJ New Jersey       WI Wisconsin
          IN Indiana          NM New Mexico       WV West Virginia
          KS Kansas           NV Nevada           WY Wyoming
          KY Kentucky         NY New York         US United States Of America
      } =~ /(\w\w)\s+(\w+(?:\s\w+)*)/g;
      print "$_ => $States{$_}\n" for keys %States;

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: U.S. State Names
by unixwzrd (Beadle) on Feb 25, 2001 at 22:21 UTC
    Very nice, but you left out the fine folks in DC... Though they break your regex :)

    For completeness, I offer these, which are the "Official" FIPS State codes. I hate to admit it, but I'm way too familiar with the government's GIS TIGER/Line database.

    This is also why I'm way too familiar with having to deal with fixed length records and mainframe data extracts (see Re: Converting fixed record length files to pipe delimited). I still have nightmares about it.

    It does contain a lot of interesting data (and when I say a lot, I mean it's huge), like lats and longs for just about *everything* anyone could think to map, down to street intersections and street addresses.

    District of Columbia:
        DC District of Columbia
    Puerto Rico and Outlying Areas:
        AS American Samoa
        GU Guam
        MP Northern Mariana Islands
        PR Puerto Rico
        UM U.S. Minor Outlying Islands
        VI Virgin Islands of the U.S.
    Freely Associated States:
        FM Federated States of Micronesia
        MH Marshall Islands
        PW Palau

    Mike - mps@discomsys.com

    "The two most common elements in the universe are hydrogen... and stupidity."
    Harlan Ellison

(dkubb) Re: (2) U.S. State Names
by dkubb (Deacon) on Feb 26, 2001 at 08:49 UTC

    An alternate way to get the United States' state names is by using the CPAN module Locale::SubCountry:

    #!/usr/bin/perl -w
    use strict;
    use Locale::SubCountry;
    use Data::Dumper qw(Dumper);

    # Create an object representing a certain country
    my $subcountry = Locale::SubCountry->new('US');

    # Query to find this country's subcountries.
    my %states = $subcountry->code_full_name_hash;

    # Prints the name/code pairs for every US state
    print Dumper(\%states);
    __END__

    Besides the obvious benefit of keeping these details in a central location, this module will tell you the state/province/county/etc of all the countries in the ISO 3166-2 sub country code list.

    Updated: I changed the code example to use the method code_full_name_hash which returns a hash where the state code is the key and the value is the state's full name. This is so my example returns the same as tye's first example.

    The method names are self-describing; if you'd like the reverse mapping (like I had before), use the full_name_code_hash method.
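
If all you have is a code-to-name hash (from any of the snippets in this thread), the reverse mapping can also be had in plain Perl, without the module. A minimal sketch, assuming the names are unique; `%name_for` and `%code_for` are made-up variable names and the data is a three-entry sample.

```perl
use strict;
use warnings;

# Sample code => name data; any of the %States hashes above would do.
my %name_for = ( AK => 'Alaska', NC => 'North Carolina', NY => 'New York' );

# reverse flattens the hash to a list and swaps the key/value order,
# so this only works cleanly when every value is unique.
my %code_for = reverse %name_for;

print "$code_for{'New York'}\n";    # prints NY
```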

      Awesome point dkubb! I would never have thought of using that module to do that same thing tye was doing earlier on his own. This saves us a ton of time! Thanks for bringing this module up and letting us know about it! :-)

      bladx ~ ¡mucho veces yo tengo preguntas!
Re: U.S. State Names
by aardvark (Pilgrim) on Feb 26, 2001 at 06:31 UTC
    Thanks for a great snippet tye!!
    I must admit that it took me a while to get my brain around merlyn's regex, but after breaking it down it made perfect sense. unixwzrd pointed out that the code wouldn't work on the 'other' US states, but I think that can be fixed by changing the (?:\s\w+)? bit to (?:\s\w+)*
    Here is merlyn's regex broken down, with comments:
    \G          # when progressively matching a string
                # with the 'g' flag
                # you can use the \G anchor to 'hold' the position
                # just after the previous match
                # helps regex remember where it left off
                # allows you to go through a list efficiently
                # without using split or looping
                # Mastering Regular Expressions (p 236 - 240)
    \s*         # matches zero or more spaces that
                # may come before a name-value pair
    (\w\w)      # match two word characters (alphanumeric plus '_')
                # parentheses assign matched letters to $1
                # this is the state abbreviation
    \s+         # match one or more spaces between name-value pair
    (\w+        # match one or more word characters
      (?:       # ?: allows for cluster-only parentheses,
                # no capturing and doesn't assign to $3
        \s\w+   # match one space then one or many word characters
      )?        # match zero or one of these clusters
                # allows match of state names with multiple words
                # ie New York, West Virginia
                # does not match States with three words,
                # like 'Northern Mariana Islands'
                # change trailing ? to * to match those '(?:\s\w+)*'
    )           # assigns state name to $2
    /gx;        # end of regex
                # g flag for global search
                # x flag to allow whitespace in regex
                # might also want to use c flag
                # c flag causes the match position to be retained
                # following an unsuccessful match
                # see: Effective Perl Programming (p.63)
                #
                # the complete regex looks like this:
                #
                # /\G\s*(\w\w)\s+(\w+(?:\s\w+)?)/g;
                #
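
A quick sketch of the ? to * change applied to a few of the FIPS entries. The `*` form does pick up names of three or more words, but \w does not match a period, so a name such as "Virgin Islands of the U.S." gets cut short. The second pattern, which also accepts '.' inside a word, is my own assumption and not something proposed in the thread. Both rely on the entries being separated by more than one whitespace character.

```perl
use strict;
use warnings;

# A made-up sample laid out like the tables above (columns split by 2+ spaces).
my $table = q{
    DC District of Columbia        MP Northern Mariana Islands
    PR Puerto Rico                 VI Virgin Islands of the U.S.
};

# The suggested change: allow any number of extra words.
my %star = $table =~ /\G\s*(\w\w)\s+(\w+(?:\s\w+)*)/g;

# Assumption (not from the thread): also accept '.' inside a word.
my %dots = $table =~ /\G\s*(\w\w)\s+([\w.]+(?:\s[\w.]+)*)/g;

print "star: $_ => $star{$_}\n" for sort keys %star;
print "dots: $_ => $dots{$_}\n" for sort keys %dots;
```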
    Here is the Benchmark of the three routines
    Benchmark: timing 1000 iterations of merlyn, tye_1, tye_2...
        merlyn: 0 secs ( 0.54 usr + 0.00 sys = 0.54 CPU) @ 1851.85/s (n=1000)
         tye_1: 1 secs ( 0.65 usr + 0.00 sys = 0.65 CPU) @ 1538.46/s (n=1000)
         tye_2: 2 secs ( 1.58 usr + 0.00 sys = 1.58 CPU) @ 632.91/s (n=1000)
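
The benchmark code itself is not shown in the thread. As a rough sketch, a comparison like this could be set up with the core Benchmark module along the following lines. The sub names `by_match` and `by_split` are hypothetical, the table is a six-entry sample, and the timings will not match the figures above.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Shortened sample table. Note there is no trailing newline before the
# closing brace, otherwise the split version would keep it in the last name.
my $table = q{
    AK Alaska           NC North Carolina   NY New York
    RI Rhode Island     WV West Virginia    WY Wyoming};

# Regex approach: collect (code, name) pairs with a list-context //g match.
sub by_match {
    my %h = $table =~ /\G\s*(\w\w)\s+(\w+(?:\s\w+)?)/g;
    return \%h;
}

# Split approach: split around the two-letter codes; the leading
# empty field produced by the initial whitespace goes to undef.
sub by_split {
    my ( undef, %h ) = split /\s+(?=\w\w\s)|(?<=\s\w\w)\s+/, $table;
    return \%h;
}

# Run each sub 1000 times and print the timing report.
timethese( 1000, { by_match => \&by_match, by_split => \&by_split } );
```

With a table this small Benchmark may warn that there are too few iterations for a reliable count; raising the count gives steadier numbers.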
    Also here is a link to FIPS and ISO 3166 country codes in case anybody wants to apply this snippet to countries.
    http://www.cia.gov/cia/publications/factbook/docs/app-f.html

    Get Strong Together!!
Re (tilly) 1: U.S. State Names
by tilly (Archbishop) on Feb 26, 2001 at 08:28 UTC
    You can always choose a slightly different delimiter...
    %state_name = split /\s\s+/, q(AK  Alaska           LA  Louisiana        OH  Ohio
        AL  Alabama          MA  Massachusetts    OK  Oklahoma
        AR  Arkansas         MD  Maryland         OR  Oregon
        AZ  Arizona          ME  Maine            PA  Pennsylvania
        CA  California       MI  Michigan         RI  Rhode Island
        CO  Colorado         MN  Minnesota        SC  South Carolina
        CT  Connecticut      MO  Missouri         SD  South Dakota
        DE  Delaware         MS  Mississippi      TN  Tennessee
        FL  Florida          MT  Montana          TX  Texas
        GA  Georgia          NC  North Carolina   UT  Utah
        HI  Hawaii           ND  North Dakota     VA  Virginia
        IA  Iowa             NE  Nebraska         VT  Vermont
        ID  Idaho            NH  New Hampshire    WA  Washington
        IL  Illinois         NJ  New Jersey       WI  Wisconsin
        IN  Indiana          NM  New Mexico       WV  West Virginia
        KS  Kansas           NV  Nevada           WY  Wyoming
        KY  Kentucky         NY  New York);
        Right down to the number of characters.

        I find \s\s+ slightly more obvious than \s{2,} but that is entirely a matter of taste.

        *shrug*

        (Something tells me that there is a remote chance that Ilya treats the {2,} construct as more complex, falls into a general case, and might not always optimize it as well. If you ask him, please tell me the answer.)
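
For what it's worth, the two spellings of the delimiter can at least be checked for identical results on a small, made-up sample. This says nothing about how the regex engine optimizes either form; it only confirms that they split the same way.

```perl
use strict;
use warnings;

# Two or more spaces between every field; one space inside a name.
my $table = "AK  Alaska    NC  North Carolina    NY  New York";

my %plus  = split /\s\s+/,  $table;
my %brace = split /\s{2,}/, $table;

# Compare the two hashes by serializing them in sorted key order.
my $same = join( '|', map { "$_=$plus{$_}" } sort keys %plus )
        eq join( '|', map { "$_=$brace{$_}" } sort keys %brace );

print $same ? "identical\n" : "different\n";
```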