http://www.perlmonks.org?node_id=79137


in reply to Golf: Arbitrary Alphabetical Sorting

Well, here's a cut at it, even though it's not golfed yet:
sub o { my ($a, $w) = @_; my %m; @m{@$a} = (a..z); my $k = join "", keys %m; my $v = join "", values %m; map { eval "tr/$v/$k/"; $_ } sort map { eval "tr/$k/$v/"; $_ } @$w; }

-- Randal L. Schwartz, Perl hacker


update: and of course, it's doggy slow. That eval needs to be saved, like in a coderef or something.
update2: Bleh. RIght. 'a'..'z' was being presumptive that "letter" meant "letter" like I knew it. Here's the patch to make it work for larger sets, for reasonable values of work:
sub o { my ($a, $w) = @_; my %m; @m{@$a} = 1..@$a; my $k = join "", map {sprintf "\\%03o", ord $_} keys %m; my $v = join "", map {sprintf "\\%03o", $_ } values %m; map { eval "tr/$v/$k/"; $_ } sort map { eval "tr/$k/$v/"; $_ } @$w; }

Replies are listed 'Best First'.
Re: Re: Golf: Arbitrary Alphabetical Sorting
by Masem (Monsignor) on May 09, 2001 at 20:05 UTC
    While your $v/$k order is right, the one gotcha that I put in there was that the alphabet size was arbitrary, and not necessarily 26 characters.. it could be as many as 100, 1000, or more (Well, anything in non-Unicode above 255 makes no sense). So this trick doesn't work here.
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
      Well, if it's a single "character", tr likes it. What's the problem? About the only thing that's messy is the use of a slash and dash, and I could probably hork that somehow by always mapping to an octal-ish escape.

      I've reread the question three times now, and I don't see how you are defining "character" in any way other than something that tr can wrangle. If so, what's the structure of a "word" then? It's no longer a string, which would be a sequence of "characters" that tr can handle!

      -- Randal L. Schwartz, Perl hacker

        But what if the alphabet has 100 characters? Then, %m will have a through z, undef, undef, undef... as values, unless I'm missing something, which I'm pretty sure I'm not. That will lead to incorrect handling of the characters "above" 26.
        In your code, you prep your translation hash %m with keys from the input alphabet, and values of a through z. What if the input alphabet was [a..z1], or [a..z0-9]? These are more than 26 characters and thus, the 27th and higher characters will have null values for their keys, and will mess up your tr-and-back transformation.
        Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain