in reply to Sorting Vietnamese text
Note: take what I say here with a grain of salt since I know no Vietnamese.
Here's the Vietnamese alphabet sort order. And here's how to read that chart:
- First column (darkest colour) has the letter in question
- The other columns have the glyphs that sort under that letter
- For example, ấ, Ầ and ậ all sort under â (they will be found in the dictionary under the heading 'â')
- When two words are identical except for their diacritics, sort them in the left-to-right order given in the chart.
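The rules above could be sketched in Perl roughly as follows. This is untested against real dictionary data, and several things are my assumptions rather than something read off the chart: the letter order is the standard Vietnamese alphabet, the tone order is unmarked, grave, hook above, tilde, acute, dot below, and the names sort_keys and vi_sort are made up for illustration. Each word gets a primary key built from its letters (tone marks removed) and a secondary key built from its tone marks, which only matters when the primary keys tie.

```perl
use strict;
use warnings;
use utf8;                                   # the source contains literal Vietnamese
use Unicode::Normalize qw(NFD NFC);

# ASSUMPTION: standard Vietnamese alphabet order, not copied from the chart.
my @alphabet = qw(a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y);
my %letter_rank;
@letter_rank{ map { NFC($_) } @alphabet } = (1 .. @alphabet);

# ASSUMPTION: tone order is none, grave, hook above, tilde, acute, dot below.
my %tone_rank = (
    "\x{0300}" => 1,    # grave
    "\x{0309}" => 2,    # hook above
    "\x{0303}" => 3,    # tilde
    "\x{0301}" => 4,    # acute
    "\x{0323}" => 5,    # dot below
);

# Return (primary key, secondary key) for one word.
sub sort_keys {
    my ($word) = @_;
    my (@primary, @secondary);
    # After NFD, \X matches a base letter together with its combining marks.
    for my $cluster (NFD(lc $word) =~ /\X/g) {
        my $tone = 0;
        $tone = $tone_rank{$1}
            if $cluster =~ s/([\x{0300}\x{0301}\x{0303}\x{0309}\x{0323}])//;
        # What is left is the letter itself, e.g. "â" for "ấ", "Ầ" and "ậ".
        push @primary,   $letter_rank{ NFC($cluster) } // 0;
        push @secondary, $tone;
    }
    return (pack('C*', @primary), pack('C*', @secondary));
}

# Sort by letters first; fall back to tone marks only on a tie.
sub vi_sort {
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
           map  { [ $_, sort_keys($_) ] } @_;
}

binmode(STDOUT, ':encoding(UTF-8)');
print join(' ', vi_sort(qw(mạ má ma mã mà mả))), "\n";
```

Characters that are not in the alphabet table (spaces, punctuation) all get rank 0 here, so a real implementation would need to decide how to treat them.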
Here's how I handled Japanese sorting (hiragana only) based on a similar chart for Japanese:
    use strict;
    use warnings;
    use utf8;                               # the source contains literal kana
    binmode(STDOUT, ':encoding(UTF-8)');

    # Map voiced (dakuten), semi-voiced (handakuten) and small kana
    # to the plain kana they sort under.
    sub transliterate {
        my $str = shift;
        $str =~ tr/がぎぐげござじずぜぞだぢづでどばびぶべぼぱぴぷぺぽっゃゅょ/かきくけこさしすせそたちつてとはひふへほはひふへほつやゆよ/;
        return $str;
    }

    # Compare on the stripped key first, then on the original reading.
    sub gozyuuon {
        $a->{'sort'} cmp $b->{'sort'}
            || $a->{'reading'} cmp $b->{'reading'};
    }

    my @rows = (
        { word => '同時', reading => 'どうじ' },
        { word => '当日', reading => 'とうじつ' },
        { word => '同士', reading => 'どうし' },
        { word => '投資', reading => 'とうし' },
        { word => '当時', reading => 'とうじ' },
        { word => '同室', reading => 'どうしつ' },
    );

    # create a sort key with the dakuten (") and similar marks stripped
    for (@rows) {
        $_->{'sort'} = transliterate($_->{reading});
    }

    for my $row (sort gozyuuon @rows) {
        printf "%s・%s\n", $row->{reading}, $row->{word};
    }
Japanese is a bit easier since the Unicode codepoints for hiragana are already in the correct order; I only needed to handle the characters that share a sort position.
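To make that concrete, here is a small sketch (mine, not part of the code above) that prints a few hiragana codepoints and shows what a plain codepoint sort does with three of the readings. Each voiced kana sits directly after its unvoiced counterpart, so a plain sort keeps every と-word ahead of every ど-word, which is not dictionary order.

```perl
use strict;
use warnings;
use utf8;                                   # the source contains literal kana

binmode(STDOUT, ':encoding(UTF-8)');

# Voiced and semi-voiced kana directly follow the plain kana.
for my $kana (qw(か が き ぎ は ば ぱ ひ)) {
    printf "%s U+%04X\n", $kana, ord $kana;
}

# Plain codepoint sort: all と-words come before all ど-words.
my @plain = sort qw(どうし とうじつ とうし);
print "@plain\n";    # とうし とうじつ どうし
```

A dictionary would instead put どうし right after とうし, since both reduce to the same key once the dakuten is stripped.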