This is a rather small but useful module I hacked together a little while ago. It creates a transformation and sorting routine for any alphabet you give it. If you've got an alphabet in which vowels are sorted before consonants, you can use this module to create a sorting function that takes that into account.
You have to deal with lowercase and uppercase yourself, since (as in Klingon) they needn't sort to the same position. Alphabets are limited to 256 characters, because each character is mapped to a single character value from \000 to \377.
I just updated it, changing the function syntax a bit, and adding a feature.
package Language::MySort;
require Exporter;
@ISA = qw( Exporter );
@EXPORT = qw( lang_sort );
# per-alphabet lookup from sort key back to the original word
%words = ();
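sub lang_sort {
    my ($ignore, $same, $chars, $tr, $sorter) = ("", "");
    # an optional trailing hash reference carries the options
    if (ref $_[-1]) {
        my $opt = pop;
        $ignore = $opt->{ignore} || "";
        # the examples use "identical"; "translate" still works as an alias
        $same = $opt->{identical} || $opt->{translate} || "";
        # code that deletes the ignored characters from the sort key
        $ignore = "\$s =~ tr/\Q$ignore\E//d;";
        if ($same) {
            # work on a copy so the caller's list is left untouched
            $same = [ @$same ];
            # split each "AXYZ" entry into its target A and its aliases XYZ
            my @f = map substr($_, 0, 1, ""), @$same;
            # code that translates every alias to its target
            $same =
                " =~ tr/" .
                quotemeta(join "", @$same) .
                "/" .
                quotemeta(join "", map $f[$_] x length($same->[$_]), 0 .. $#$same) .
                "/";
        }
    }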
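    # the alphabet may be given as one string or as a list of characters
    $chars = @_ == 1 ? shift : join "", @_;
    # build the key-making routine: the Nth character of the alphabet
    # becomes the character with ordinal N, so a plain sort orders the keys
    $tr = eval qq{
        sub {
            (my \$s = shift) $same;
            $ignore
            \$s =~ tr/\Q$chars\E/\000-\377/;
            \$s;
        }
    } or die $@;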
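    $sorter = sub {
        # transform every word into its sort key
        my @used = map $tr->($_), @_;
        # remember which word each key came from (words with identical
        # keys all come back as the last such word)
        @{ $words{$chars} }{ @used } = @_;
        # sort the keys and return the original words in that order
        @{ $words{$chars} }{ sort @used };
    };
    # in list context, also hand back the key-making routine
    return wantarray() ? ($sorter, $tr) : $sorter;
}
1;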
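The core trick is easier to see without the code generation. What follows is only a minimal sketch, not the module itself: it hard-codes a made-up 10-letter alphabet (vowels first), and it swaps the module's hash lookup for a decorate-sort-undecorate pass. The names sort_key, @words and @sorted are hypothetical and exist just for this illustration.

```perl
use strict;
use warnings;

# Turn a word into its sort key: the Nth letter of the alphabet
# "aeibcdfghj" becomes the character with ordinal N (0 through 9).
sub sort_key {
    my ($word) = @_;
    (my $key = $word) =~ tr/aeibcdfghj/\000-\011/;
    return $key;
}

my @words = qw( bad ace dig ice hag );

# Pair each word with its key, sort on the keys, then drop the keys.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ sort_key($_), $_ ] } @words;

print "@sorted\n";    # prints: ace ice bad dig hag
```

Words starting with a vowel come first because "a", "e" and "i" map to the lowest ordinals.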
Here's a sample run to create a sorter for (lowercase) French text (I don't think I left out any accented characters, but I could be wrong).
use Language::MySort;
*french_sort = lang_sort(
    # *the character list*
    # only includes the characters remaining after
    # the identical-character map has been applied
    'a' .. 'z',
    {
        # *the identical-character map*
        # maps characters to the character
        # they should sort identically as
        # "AXYZ" means that X, Y, and Z are translated as A
        identical =>
            ["a\340", "c\347", "e\350\351\352\353", "o\364"],
    }
);
{
    local $, = " ";
    print french_sort(
        "\351tude",
        "\352tre",
        "tr\350s",
        "entrer",
        "\351t\351",
    );
}
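For what it's worth, the identical-character map in that call boils down to a single tr///. Here is a hand-written sketch of the equivalent translation; the variable $s is only for illustration, since the module generates this code for you.

```perl
use strict;
use warnings;

# ["a\340", "c\347", "e\350\351\352\353", "o\364"] pairs every accented
# character with the plain letter that leads its entry:
#   search list:       \340 \347 \350 \351 \352 \353 \364
#   replacement list:  a    c    e    e    e    e    o
(my $s = "\351t\351") =~ tr/\340\347\350\351\352\353\364/aceeeeo/;

# $s is now "ete", so the word sorts as if it had no accents
print "$s\n";
```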
And here's a sample run for a small language of 10 characters in which the vowels "a", "e", and "i" sort before every other letter. The sorter also ignores the language's mid-word punctuation, "-" and ".":
use Language::MySort;
*weird_sort = lang_sort(
    # place vowels ahead of consonants
    qw( a e i b c d f g h j ),
    {
        # map uppercase characters to lowercase
        identical => [qw( aA bB cC dD eE fF gG hH iI jJ )],
        # ignore - and .
        ignore => "-.",
    }
);
Because of the way the generator function works (using the tr/// operator), you can also write the above function call as:
use Language::MySort;
*weird_sort = lang_sort(
    # place vowels ahead of consonants
    qw( a e i ), 'a' .. 'j',
    {
        # map uppercase characters to lowercase
        identical => [qw( aA bB cC dD eE fF gG hH iI jJ )],
        # ignore - and .
        ignore => "-.",
    }
);
Even though the vowels are duplicated in the character list, the transliteration operator only honors the first occurrence of each, so the later duplicates are harmless. It's a bit of Perl magic that the module takes advantage of to make your life a bit easier.
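A quick way to convince yourself of the first-occurrence rule is to look at the ordinals tr/// hands out. This is a standalone sketch, not part of the module; $key and @ordinals are made-up names, and the replacement range is trimmed to \000-\014 so that it matches the 13-character search list exactly.

```perl
use strict;
use warnings;

# 'a', 'e' and 'i' appear twice in the search list; tr/// uses only the
# first occurrence of each, so they keep the ordinals 0, 1 and 2.
(my $key = "abeij") =~ tr/aeiabcdefghij/\000-\014/;

# show the ordinal each letter was mapped to
my @ordinals = unpack "C*", $key;
print "@ordinals\n";    # prints: 0 4 1 2 12
```

The consonants end up with gaps in their numbering ("b" is 4 rather than 3), but only the relative order matters for sorting, and that is the same as with the explicit list.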
Finally, here's a simpler sorter for English alphabetical order that puts capital letters before their lowercase counterparts, but intersperses uppercase and lowercase words (so you get Axxx axxx Bxxx bxxx, not Axxx Bxxx axxx bxxx).
use Language::MySort;
*sorter = lang_sort(
    # nifty way to make (A, a, B, b, C, c, ... Z, z)
    (map +($_, lc), 'A' .. 'Z'),
    # ignore hyphens when comparing words
    { ignore => q{-} }
);