in reply to Re: Sorting Vietnamese text
in thread Sorting Vietnamese text
And here's what I came up with:
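use strict;
use warnings;
use utf8;                              # the source contains literal Vietnamese characters
binmode STDOUT, ':encoding(UTF-8)';

# Map each letter to a key character that sorts asciibetically in
# Vietnamese alphabet order; tone marks are folded into the base letter.
sub make_sort_order {
    my $str = shift;
    $str =~ tr(aáàảãạăắằẳẵặâấầẩẫậbcdđeéèẻẽẹêếềểễệfghiíìỉĩịjklmnoóòỏõọôốồổỗộơớờởỡợpqrstuúùủũụưứừửữựvwxyýỳỷỹỵz)
              (0000001111112222223456777777888888abcddddddefghijjjjjjkkkkkkllllllmnopqrrrrrrsssssstuvwwwwwwx)d;
    return $str;
}

my @words = ('ầm', 'ãm', 'ấm chè', 'ám số');

# Schwartzian transform: sort on the key, fall back to the raw string.
print $_->[1], "\n"
    for sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] }
        map  { [ make_sort_order($_), $_ ] } @words;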
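It's still missing a correct 'secondary sort' (for the edge case when two words are identical once the tone marks are stripped); the raw-string comparison used as a tie-breaker only makes the order deterministic, it does not follow the Vietnamese tone order. It should not be difficult to add once someone figures out a suitable transliteration of the tones that sorts asciibetically.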
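For what it's worth, here is a rough sketch of what such a secondary key might look like. It is not tested against a real dictionary, and it rests on two assumptions of mine: that the conventional tone order is unmarked, huyền, hỏi, ngã, sắc, nặng, and that decomposing with NFD from the core Unicode::Normalize module is acceptable here. The names make_tone_key and %tone_rank are made up for illustration.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Assumed tone order: unmarked (0), huyền, hỏi, ngã, sắc, nặng.
my %tone_rank = (
    "\x{0300}" => 1,    # combining grave accent (huyền)
    "\x{0309}" => 2,    # combining hook above   (hỏi)
    "\x{0303}" => 3,    # combining tilde        (ngã)
    "\x{0301}" => 4,    # combining acute accent (sắc)
    "\x{0323}" => 5,    # combining dot below    (nặng)
);

# Hypothetical helper: emit one digit per base character giving the rank
# of the tone mark attached to it (0 when there is none).
sub make_tone_key {
    my $str = NFD(shift);
    my $key = '';
    # Each match is one base character followed by its combining marks.
    while ($str =~ /(\PM)(\pM*)/g) {
        my $marks = $2;
        # Breve, circumflex and horn are not tones, so they are skipped.
        my ($tone) = grep { exists $tone_rank{$_} } split //, $marks;
        $key .= defined $tone ? $tone_rank{$tone} : 0;
    }
    return $key;
}

# All six words share the same primary key, so only the tone key matters.
my @words  = ('má', 'ma', 'mạ', 'mà', 'mã', 'mả');
my @sorted = sort { make_tone_key($a) cmp make_tone_key($b) } @words;

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @sorted;
```

Under those assumptions this prints ma, mà, mả, mã, má, mạ. To plug it into the sort above, the tone key could be stored as an extra element in each array ref and compared in place of the raw-string tie-breaker; since it is only consulted when the primary keys are equal, both tone keys always have the same length.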