in reply to The “real length" of UTF8 strings
length returns the number of characters. To get the length in bytes, you have to convert the string into a given encoding:
use Encode;
my $s = "\x{5fcd}\x{65e0}\x{53ef}\x{5fcd}";
print length(encode("utf8", $s)), "\n"; # 12
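To make the "given encoding" part concrete, here is a minimal sketch showing that the byte count depends on which encoding you pick. The helper name byte_length is made up for illustration; the only library call is encode from Encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of Encode): byte length of $str
# once it has been encoded as $encoding.
sub byte_length {
    my ($str, $encoding) = @_;
    return length(encode($encoding, $str));
}

my $s = "\x{5fcd}\x{65e0}\x{53ef}\x{5fcd}";

my $chars       = length($s);                  # characters
my $utf8_bytes  = byte_length($s, 'UTF-8');    # 3 bytes per character here
my $utf16_bytes = byte_length($s, 'UTF-16LE'); # 2 bytes per character, no BOM

print "characters: $chars\n";       # 4
print "UTF-8:      $utf8_bytes\n";  # 12
print "UTF-16LE:   $utf16_bytes\n"; # 8
```

So "length in bytes" only has an answer once you say which encoding the bytes are in.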
Since Perl stores character strings like this one as UTF-8 internally, you can use a number of hacks to avoid the explicit re-encoding, though they all depend on that internal representation:
print do { use bytes; length($s) }, "\n"; # 12 (see perldoc -f length)
# or
utf8::encode($s); # converts $s to bytes in place and resets the utf8 flag
print length($s), "\n"; # 12
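One thing to watch with the utf8::encode variant: it rewrites the string you pass it, so afterwards that variable no longer holds characters. If you still need the original, a possible workaround (just a sketch, with $copy as an illustrative name) is to encode a copy instead:

```perl
use strict;
use warnings;

my $s = "\x{5fcd}\x{65e0}\x{53ef}\x{5fcd}";

# Work on a copy so that $s keeps its character semantics.
my $copy = $s;
utf8::encode($copy);             # $copy now holds UTF-8 encoded bytes

my $byte_len = length($copy);    # counts bytes
my $char_len = length($s);       # still counts characters

# The utf8 flag stays set on the original and is cleared on the copy.
my $orig_flag = utf8::is_utf8($s)    ? 1 : 0;
my $copy_flag = utf8::is_utf8($copy) ? 1 : 0;

print "bytes: $byte_len, characters: $char_len\n"; # bytes: 12, characters: 4
print "flags: original=$orig_flag copy=$copy_flag\n"; # flags: original=1 copy=0
```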
Replies are listed 'Best First'.
Re^2: The “real length" of UTF8 strings
by moritz (Cardinal) on Sep 23, 2008 at 20:35 UTC
by betterworld (Curate) on Sep 23, 2008 at 21:54 UTC
by Anonymous Monk on Sep 24, 2008 at 04:21 UTC
by moritz (Cardinal) on Sep 24, 2008 at 07:59 UTC
by Anonymous Monk on Sep 24, 2008 at 04:18 UTC