http://www.perlmonks.org?node_id=885332


in reply to Re^8: How to reverse a (Unicode) string
in thread How to reverse a (Unicode) string

The problem is the definition of character in the context of Unicode text.

No, I fully agree with you on the definition of character in the context of Unicode text.

At issue is that reverse cannot recognise the presence of Unicode text. How do you think reverse can tell the difference between chr(113).chr(101).chr(769) and "qe\N{COMBINING ACUTE ACCENT}"?
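A minimal sketch of that point, assuming the three numbers are readings stored one per character (the variable names are only illustrative): both expressions build the same three code points, so nothing in the string tells reverse which interpretation was intended.

```perl
use strict;
use warnings;
use charnames qw( :full );

# Three numeric readings packed into a string, one per character
# (hypothetical data, for illustration only).
my $samples = join '', map chr, 113, 101, 769;

# Text: "q", "e", then a combining acute accent (U+0301, decimal 769).
my $text = "qe\N{COMBINING ACUTE ACCENT}";

# Same length and same code points: the two strings are indistinguishable.
printf "equal:  %s\n", $samples eq $text ? 'yes' : 'no';
printf "length: %d and %d\n", length($samples), length($text);
```

This prints "equal:  yes" and "length: 3 and 3".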

It can either always treat the string as Unicode text, or never. Currently, it never does. To change that is backwards incompatible, so you'd have to demonstrate a bug in order to change that behaviour.

use strict;
use warnings;
use charnames qw( :full );

# Built-in behaviour: in scalar context, reverses code point by code point.
sub current_reverse {
    return reverse(@_);
}

# Explicit code-point reversal (/./s matches exactly one code point).
sub string_reverse {
    return reverse(@_) if wantarray;
    my @chars = join('', @_) =~ /./sg;
    return join '', @chars[ reverse 0..$#chars ];
}

# Grapheme-cluster reversal (\X matches one extended grapheme cluster).
sub unicode_reverse {
    return reverse(@_) if wantarray;
    my @chars = join('', @_) =~ /\X/g;
    return join '', @chars[ reverse 0..$#chars ];
}

printf("%-7s %-7s %-7s\n", "", "samples", "text");
printf("%-7s %-7s %-7s\n", "", "-------", "-------");

for (qw( current string unicode )) {
    my $reverser = do { no strict 'refs'; \&{ $_."_reverse" } };

    # Non-text data: the last sample (769) must end up first.
    my $water_samples = join '', map chr, 113, 101, 769;
    $water_samples = $reverser->($water_samples);
    my $last_sample = substr($water_samples, 0, 1);

    # Text: the last grapheme (e + accent) must end up first, intact.
    my $text = "Cafe\N{COMBINING ACUTE ACCENT}";
    $text = $reverser->($text);
    my ($last_char) = $text =~ /^(\X)/;

    printf("%-7s %-7s %-7s\n",
        $_,
        ord($last_sample) == 769 ? 'ok' : 'not ok',
        $last_char eq "e\N{COMBINING ACUTE ACCENT}" ? 'ok' : 'not ok',
    );
}
        samples text
        ------- -------
current ok      not ok
string  ok      not ok
unicode not ok  ok
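The two columns disagree because the reversers split the string into different units. A small sketch of just that difference, using the same test string as above (the array names are only illustrative):

```perl
use strict;
use warnings;
use charnames qw( :full );

my $text = "Cafe\N{COMBINING ACUTE ACCENT}";

# /./sg yields one element per code point.
my @code_points = $text =~ /./sg;

# /\X/g yields one element per extended grapheme cluster,
# so "e" and its combining accent stay together.
my @clusters = $text =~ /\X/g;

printf "code points: %d\n", scalar @code_points;
printf "clusters:    %d\n", scalar @clusters;
```

This reports 5 code points and 4 clusters. Reversing the first list separates the accent from its "e"; reversing the second keeps them together, but would also keep 101 and 769 together if the string held samples rather than text.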

Your whole argument for the presence of a bug is that the word "character" in reverse's documentation could be confused with Unicode's definition of the word.

One or the other is wrong: the behavior of the reverse function or the reverse function's documentation.

Those are the only two options if and only if reverse's documentation uses the same definition of "character" as the Unicode standard.

Update: Added code.