http://www.perlmonks.org?node_id=1068139


in reply to Sorting Vietnamese text

Note: take what I say here with a grain of salt since I know no Vietnamese.

Here's the Vietnamese alphabet sort order. And here's how to read that chart:


Here's how I handled Japanese sorting (hiragana only) based on a similar chart for Japanese:

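use strict;
use warnings;
use utf8;                            # the source contains literal kana
binmode STDOUT, ':encoding(UTF-8)';  # print wide characters cleanly

# Map voiced (dakuten), semi-voiced (handakuten) and small kana
# onto the plain kana they sort alongside.
sub transliterate {
	my $str = shift;
	$str =~
		tr(がぎぐげござじずぜぞだぢづでどばびぶべぼぱぴぷぺぽっゃゅょ)
		  (かきくけこさしすせそたちつてとはひふへほはひふへほつやゆよ);
	return $str;
}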

# Compare on the stripped key first; break ties on the original reading.
sub gozyuuon {
	$a->{'sort'} cmp $b->{'sort'} ||
	$a->{'reading'} cmp $b->{'reading'};
}

my @rows = (
	{ word => '同時', reading => 'どうじ' },
	{ word => '当日', reading => 'とうじつ' },
	{ word => '同士', reading => 'どうし' },
	{ word => '投資', reading => 'とうし' },
	{ word => '当時', reading => 'とうじ' },
	{ word => '同室', reading => 'どうしつ' },
);

# build a sort key with dakuten/handakuten stripped and small kana enlarged
for (@rows) {
	$_->{'sort'} = transliterate($_->{reading});
}
for my $row (sort gozyuuon @rows) {
	printf "%s・%s\n", $row->{reading}, $row->{word};
}

Japanese is a bit easier because the hiragana Unicode codepoints are already in the correct order; I only needed to handle the characters that share a sort position.
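
For Vietnamese the codepoints are not in alphabet order, so the trick above does not carry over directly. This is a different technique from my hand-rolled tr/// approach: a minimal sketch that leans on the core module Unicode::Collate::Locale and its 'vi' tailoring instead of a chart. The word list is made up for illustration, and I have not checked the result against a Vietnamese dictionary, so treat it as a starting point rather than a verified answer.

```perl
use strict;
use warnings;
use utf8;                            # the source contains literal Vietnamese letters
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Collator tailored for Vietnamese; letters such as ă, â and đ are
# treated as separate letters rather than accented variants.
my $collator = Unicode::Collate::Locale->new(locale => 'vi');

# Hypothetical sample data, chosen only to exercise those letters.
my @words  = ('đi', 'do', 'ăn', 'an', 'ba', 'âm');
my @sorted = $collator->sort(@words);

print "$_\n" for @sorted;
```

If you need to sort records rather than bare strings, $collator->cmp($x, $y) can stand in for cmp inside a sort block, much like gozyuuon above.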