This is part 1 of 2 (maybe 3) of a series, but each piece builds upon the last so the next one won't be posted for some time.

Given: a word, and an 'alphabet' string (and to be exact about this latter part, each character in the word and the alphabet can be represented in 7 bits, eg the printable ASCII set).

Find the perl golf solution that returns the base pattern for the word, that is, the leftmost character in the word is represented by the first character in the alphabet at all positions; the next leftmost distinct character by the second character from the alphabet, and so forth. For example, given "google", and the alphabet a-z, the subroutine should give back "abbacd".

While the characters in the word may not be in the alphabet, the number of distinct characters in the word will always be less than or equal to the size of the alphabet. (for example, I could use the alphabet "123456789" for encoding words with no more than 9 distinct characters).

An example of a non-golfed solution follows:

#!/usr/bin/perl -w my @strings = qw( google mississippi that people ); my $alphabet = join "", ( 'a'..'z' ); foreach ( @strings ) { print c( $_, $alphabet ), "\n"; } sub c { my ( $string, $alpha ) = @_; my @letters = split //, $string; my @alphabet = split //, $alpha; my %hash; my @result; my $i = 0; foreach ( @letters ) { if (!defined( $hash{ $_ } )) { $hash{ $_ } = $alphabet[ $i++ ]; } push @result, $hash{ $_ }; } return join "", @result; } # Prints out: #abbacd #abccbccbddb #abca #abcadb

Dr. Michael K. Neylon - || "You've left the lens cap of your mind on again, Pinky" - The Brain

Replies are listed 'Best First'.
Re: (Golf) Cryptographer's Tool #1
by btrott (Parson) on Jun 20, 2001 at 00:50 UTC
    Here's mine. Not particularly interesting, really, and probably missing out on a few standard golf tricks.
    sub c {my(@a,%h,$i)=split//,pop;my$s=pop;$s=~s!(.)!$h{$1}||=$a[$i++]!g +e;$s}
    67 chars.

    Update: got rid of one character by using substr:

    sub c {my($a,%h,$i)=pop;my$s=pop;$s=~s!(.)!$h{$1}||=substr$a,$i++,1!ge +;$s}
    66 chars.

    Update 2: if you don't need strict:

    sub c {($s,$a)=@_;my(%h,$i);$s=~s!(.)!$h{$1}||=substr$a,$i++,1!ge;$s}

    Update 3: Oh, and the reason I'm still initializing %h and $i is so that the sub will work on repeated invocations. If we don't *have* to do that, this will do:

    sub c {($s,$a)=@_;$s=~s!(.)!$h{$1}||=substr$a,$i++,1!ge;$s}

    Update 4: Use $_:

    sub c {($_,$a)=@_;s!(.)!$h{$1}||=substr$a,$i++,1!ge;$_}

    Update 5: Okay, so none of these actually *work* per the spec. Thanks, tilly. :)


      All 4 solutions fail if the alphabet includes a 0. (That is zero.)

        I move that it be resolved that for the purposes of this golf the alphabet may not include a zero.

        Is there a second?


Re: (Golf) Cryptographer's Tool #1
by chipmunk (Parson) on Jun 20, 2001 at 01:20 UTC
    My original best was 63 characters:
    sub canonical { my(@a,$i,%c,$c)=pop=~/./g;$c.=$c{$_}||=$a[$i++]for pop=~/./g;$c }
    (which is strict compliant, but only because at that point it wasn't any shorter being non-strict compliant. :)

    However, here's a 46 character solution building off of btrott's solutions:

    sub canonical { ($_,$a,%h)=@_;s!.!$h{$&}||=($a=~/./g,$&)!ge;$_ }
    Can be called multiple times, of course, as in the example.


    Update: Well, tilly's been mumbling something about following the spec :) , so here's a solution that accepts any ASCII character (including zero and linefeed) in the word or the alphabet:
    sub canonical { ($_,$a,%h)=@_;s!.!{$h{$&}=~s+^\z+$a=~/./gs,$&+ge}$h{$&}!gse;$_ }
    At 62, it actually beats my original solution by 1 character!
      Following the spec at 61:
      sub c { ($_,$a,%h)=@_;join'',map{substr$a,($h{$_}||=keys%h)-1,1}/./gs }
      ... and also in spiritus strictus at 62:
      sub c { my%h;join'',map{substr$_[1],($h{$_}||=keys%h)-1,1}$_[0]=~/./gs }
                     s aamecha.s a..a\u$&owag.print
        Oo, nice! And with inspiration once again from btrott's substitution approach:
        sub canonical { ($_,$a,%h)=@_;s/./substr"a$a",$h{$&}||=keys%h,1/gse;$_ }
        Back down to 54!
      That doesn't work when 0 is in the 'alphabet.' Here's a modified solution that handles 0 properly, which is only 8 more char than yours: It adds the null character to each string, which shouldn't print on most machines (it does on my dos, not on my unix).
      sub c { ($_,$a)=@_;s!.!$h{$&}||=($a=~/./g,"$&\0")!ge;y/\0//;$_ }

      Update:Fixed code as per chipmunk's comment. It's 54 chars now.

      The 15 year old, freshman programmer,
      Stephen Rawls

        Adding extra characters to the result is not a valid solution. They may be invisible when you print them out, but they're still there.
        nice, srawls... here it is down to 51:
        sub c{ ($_,$a)=@_;s!.!chr($h{$&}||=($a=~/./g,ord$&))!ge;$_ #23456789_123456789_123456789_123456789_123456789_1 }

        It doesn't work with null either, but as i read it, thats fine.

        update: oops, here's one that resets %h and is reusable, as per the given example, at 54:
        sub c{ ($_,$a,%h)=@_;s!.!chr($h{$&}||=($a=~/./g,ord$&))!ge;$_ #23456789_123456789_123456789_123456789_123456789_1234 }
        The fix fails if a null character is present in the alphabet.
                       s aamecha.s a..a\u$&owag.print
Re: (Golf) Cryptographer's Tool #1
by srawls (Friar) on Jun 20, 2001 at 00:53 UTC
    Here's my shot at it; 68 chars:
    sub c { ($_,$a)=@_; map{$h{$_}?$h{$_}:do{$h{$_}=(split//,$a)[$i++]."\0"}}/./g }


    Here's another at 66. This one takes a completely different approach--it removes all duplicates of letters(the long part), then does a simple translation on them (the easy part). Here it is:

    sub c { ($_,$a)=@_;$b=$_; 1 while$b=~s/(.)(.*?)\1/$1$2/g; eval"y/$b/$a/"; $_ }

    Update2:Good catch tilly. This makes it too long to contend, but here's the revised version:

    sub c { ($_,$a)=@_;$b=$_; 1 while$b=~s/(.)(.*?)\1/$1$2/g; eval"y/\Q$b\E/\Q$a\E/"; $_ }

    The 15 year old, freshman programmer,
    Stephen Rawls

      What if the searchlist contains /? Or a -?

      you can, of course, take out the \E's.

      still not a particular contender, though.

Re: (Golf) Cryptographer's Tool #1
by tadman (Prior) on Jun 23, 2001 at 10:03 UTC
    I was pretty sure I was getting somewhere with this approach, but I ended up at 54 characters, just like chipmunk.
    sub crypto{ ($_,$a,%h,$h)=@_;s/./chr($h{$&}||=vec$a,$h++,8)/gse;$_ }