Beefy Boxes and Bandwidth Generously Provided by pair Networks Frank
more useful options
 
PerlMonks  

Golf challange: match U.S. State names

by John M. Dlugosz (Monsignor)
on Jun 06, 2001 at 02:27 UTC ( #86047=perlmeditation: print w/ replies, xml ) Need Help??

Consider an expression that evaluates as true if and only if $_ contains a 2-character U.S. state abbreviation. One way would be to enumerate them: /^(AK|AL|AR ... )$/. A shorter way might be to fold out the first letter. But, what about cool tricks? Perhaps there is a truely short awsome and cool solution. So I toss it into the ring for the perlwits here.

—John

Comment on Golf challange: match U.S. State names
Download Code
Re: Golf challange: match U.S. State names
by Brovnik (Hermit) on Jun 06, 2001 at 02:48 UTC
    How many states are you including ?

    E.g. Are you going to count dependent territories as per the USPS codes, or just 52 states.
    --
    Brovnik

      or just 52 states

      uh.. fifty-howmany? last i heard, the U.S. only had 50 states and one district.

      update: re: tilly's list:
      uh... 'KT'? since when is Kentucky Tavern whiskey its own state?

      .
      I think it is the principle of the thing that counts. How about:
      AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KT KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY
      And the obvious answer is, of course,
      sub state { grep$_[0]eq$_,'AKALARAZCACOCTDCDEFLGAHIIAIDILINKSKTKYLAMAMDMEMIMNMOMSM +TNCNDNENHNJNMNVNYOHOKORPARISCSDTNTXUTVAVTWAWIWVWY'=~/../g }
      at 127 characters. I don't see an obviously better approach...
        124 chars, using that set :)
        sub state { pop=~/^(A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[STY]|LA|M[ADEINOST]| +N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])$/ }
           MeowChow                                   
                       s aamecha.s a..a\u$&owag.print
        But the match isn't just on the names, but also on the abutment of names. I think you will get some false positives Eg. 'ZC' and 'OC' etc...

        Have fun,
        Carl Forde

        Hmm, that's indeed 52 names. 50 states plus DC plus what?
      For the canonical list, I'd suggest looking at what the USPS has to say on the issue of State Abberivation Codes.

      The lists being used don't include any of our possestions or the overseas military 'states'.

      --SparkeyG

Re: Golf challange: match U.S. State names
by runrig (Abbot) on Jun 06, 2001 at 03:21 UTC
    Not just a golf solution, its a golf solution generator:

    Update: 'Borrowed' tilly's list of states. (my result is a 113 character string - though I too am now wondering where 'KT' is ... I'm told its a brand of whiskey, so I'll keep it in - besides, it is now the 'official' list for this challenge :-)

    Another update: Added dws's fix.

    #!/usr/local/bin/perl -l -w use strict; my @states=qw( AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KT KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY); my %states; @states{@states} = (); my @results; my $is_done; while (%states) { my ($letter, $is_first); ($letter, $is_first, $is_done) = get_next_letter(\%states); push(@results, keys %states), last if $is_done; my $pos = $is_first ? 0 : 1; my @states = grep { substr($_,$pos,1) eq $letter } keys %states; my $chr_class = '[' . join('', map { substr($_, 1-$pos, 1) } @states) . ']'; push @results, $is_first ? $letter.$chr_class : $chr_class.$letter; delete @states{@states}; } print '/^(',join("|", @results),')$/'; sub get_next_letter { my $states = shift; my (%first, %second, $is_done); for (keys %$states) { my ($first, $second) = split ''; $first{$first}++; $second{$second}++; } my ($first) = sort { $first{$b} <=> $first{$a} } keys %first; my ($second) = sort { $second{$b} <=> $second{$a} } keys %second; $is_done = 1 if $first{$first} == 1 and $second{$second} == 1; ($first{$first} > $second{$second}) ? ($first, 1, $is_done) : ($second, 0, $is_done); } # 113 characters! # Add 5 characters ('pop=~' to the beginning) if you want # it to work on @_ instead of $_ /^([WLPCVGIM]A|N[CDEHJMVY]|M[DEINOST]|[VKUC]T|A[ZKLR]|[WHR]I|O[HKR]|I[ +LND]|[SD]C|[KW]Y|T[XN]|KS|SD|DE|FL|WV|CO)$/
Re: Golf challange: match U.S. State names
by merlyn (Sage) on Jun 06, 2001 at 06:57 UTC
    118 chars:
    sub state { 1&index" AKALCAARAZCOCTDCDEFLGAHIIAIDILINKSKTKYLAMAMDMEMIMNNCMOMSMTOHN +DNENHNJNMPANVNYOKORRISCSDTNTXUTVAVTWAWIWVWY",pop }
    And it's 117 chars if you replace pop with $_, as the original rules ask.

    The character string was constructed to ensure that a valid state would never match a non-valid boundary. It was easier than I thought:

    #!/usr/bin/perl use strict; $|++; my @states = qw( AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KT KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY ); my %states = map { $_ => 1 } @states; try("", \@states); sub try { print "try: $_[0] @{$_[1]}\n"; my $prefix = shift; my @later = @{+shift}; unless (@later) { print "WINNER: $prefix\n"; exit; } for my $i (0..$#later) { my $prefix2 = "$prefix$later[$i]"; if (length $prefix2 < 3 or not $states{substr($prefix2, -3, 2)}) { ## worth recursing try($prefix2, [@later[0..$i-1, $i+1..$#later]]); } } }

    -- Randal L. Schwartz, Perl hacker


      Reason: (MeowChow) replicant nodes must die!

      For more information on this node visit: this

      This has two problems: on a non-match, index will return -1, which, ANDed with 1, returns 1, so virtually any string will match. This sub will also match strings like "ALC" or just "A".
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
Re: Golf challange: match U.S. State names
by dws (Chancellor) on Jun 06, 2001 at 09:39 UTC
    A hybrid approach, for 110 109 (including the honorary state KT).
    print /^\w\w\z/&("MNCAKSCOHINMOKTNVALARIDCTX SDE GAZ ORIL"=~/$_/| /(FL|IA|KY|M[ADEIST]|N[DEHNJY]|PA|[UV]T|W[AIVY])/) for qw(AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KT KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY)
    Getting here involved finding the longest unbroken string that could hold overlapping state names, adding a few more with breaks (spaces), then handling the strays with a regexp. A close inspection will show that Rhode Island is handled twice. That actually saved strokes.

    I started down the path of looking for the shortest string that contains all of the state names (with breaks). That script is still running... So far, the solution above is shorter.

      Or 105:
      print /^..\z/&("MNCAKSCOHINMOKTNVALARIDCTX SDE GAZ ORIL"=~/$_/| /FL|IA|KY|M[ADEIST]|N[DEHJY]|PA|[UV]T|W[AIVY]/) for qw(AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KT KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY MY MNC .* ), " SD", "AZ\n"
      (Dropping the inner parens around the or-chain and disregarding the state of NN which is not being honored today.)

      OOPSDATE: Yep, matches if $_='.*' (see runriq immediately below and slightly to the right).  It needs /\Q$_/, as so often happens, which defeats the character savings.  I was looking at the \w as avoiding matching the spaces (which it also does).    (Oh, and thanks runriq and I'm sorry too!)
        p

        Sorry, but your expression makes '.*' an honorary state :-)

        That's why dws uses '\w\w' instead of '..' in his answer.

      Now 108
      /^\w\w\z/&("MNCAKSCOHINMOKTNVALARIDCTX SDE GAZ ORIL IAKY"=~/$_/| /(FL|M[ADEIST]|N[DEHNJY]|PA|[UV]T|W[AIVY])/)
      Another stoke off by adding a redundant check for Alaska, which allowed me to move Iowa and Kentucky out of the regexp and into the (more compact) string.

      This suggests that there's a lesson in here about how a bit of redundancy in data can lead to more compact representations. I think we often hear problems like "match every state and nothing else" as "match every state exactly once, and nothing else."

Re: Golf challange: match U.S. State names
by no_slogan (Chaplain) on Jun 06, 2001 at 21:53 UTC
    106 charcters, making use of the fact that no abbreviation starts with x, y, or z. gack.
    /^[A-W]\w\z/&&"OHILAKTXWVTNVARINYKSDE MIAZWIDCOKYFL MOR MNCAL WYMSCTNM +TNE MD UT ME"=~/$_/|/[WGPM]A|N[DHJ]/
      How about 105
      /^[A-W]\w\z/&&"OHILAKTXWVTNVARINYKSDE MIAZWIDCOKYFL MOR MNCAL WYMSCTNM +TNJ NH UT"=~/$_/|/[WGPM]A|[MN][DE]/

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://86047]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (12)
As of 2014-04-18 22:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (472 votes), past polls