Is there a different function than length() that would return the number of output characters (i.e. 1 in the case of "\x{0075}\x{0308}")?
No, there isn't a built-in function. You must roll your own.
So far, the only solution I came up with is to convert each string into Unicode normalization form C before calculating its output length, but that seems more complicated than I feel this should be.
Normalizing to NFC doesn't help in the general case. It doesn't guarantee that every character measures one code point in length, because many sequences of a base character plus combining marks have no precomposed form, so it can't be used to measure length in grapheme clusters. Consider, for example, a lowercase M with both an umlaut and a cedilla…
#!perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std );
use charnames qw( :full );
use Unicode::Normalize;
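# Count extended grapheme clusters by matching \X repeatedly.
sub length_in_grapheme_clusters {
    my $length = 0;    # start at 0 so an empty string yields 0, not undef
    $length++ while $_[0] =~ m/\X/g;
    return $length;
}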
my $invented_character
    = "\N{LATIN SMALL LETTER M}"
    . "\N{COMBINING DIAERESIS}"
    . "\N{COMBINING CEDILLA}";
my $invented_character_NFC
    = NFC($invented_character);
my $length_of_invented_character_in_code_points
    = length $invented_character;
my $length_of_invented_character_NFC_in_code_points
    = length $invented_character_NFC;
my $length_of_invented_character_in_grapheme_clusters
    = length_in_grapheme_clusters($invented_character);
my $length_of_invented_character_NFC_in_grapheme_clusters
    = length_in_grapheme_clusters($invented_character_NFC);
print "$invented_character\n";
print "$length_of_invented_character_in_code_points\n";
print "$length_of_invented_character_NFC_in_code_points\n";
print "$length_of_invented_character_in_grapheme_clusters\n";
print "$length_of_invented_character_NFC_in_grapheme_clusters\n";
exit 0;
This prints…
m̧̈
3
3
1
1
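For comparison, here is a minimal sketch that puts the string from the question ("\x{0075}\x{0308}") next to the M example. NFC happens to shorten the first to one code point because a precomposed ü exists, but it leaves the second at three, while counting \X matches gives 1 for both. The helper name grapheme_count is hypothetical, and the list-assignment counting idiom is just one way to write it, not the only one.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC );

# Hypothetical helper (not a built-in): count extended grapheme clusters
# by collecting every \X match in list context and counting the matches.
sub grapheme_count {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

my $u_umlaut = "\x{0075}\x{0308}";            # u + COMBINING DIAERESIS
my $m_marks  = "\x{006D}\x{0308}\x{0327}";    # m + DIAERESIS + CEDILLA

for my $string ( $u_umlaut, $m_marks, q{} ) {
    printf "code points: %d, NFC code points: %d, grapheme clusters: %d\n",
        length($string), length( NFC($string) ), grapheme_count($string);
}
# code points: 2, NFC code points: 1, grapheme clusters: 1
# code points: 3, NFC code points: 3, grapheme clusters: 1
# code points: 0, NFC code points: 0, grapheme clusters: 0
```

The empty string is included to show that this version returns 0 rather than undef when nothing matches.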