in reply to Re: Best Way to Get Length of UTF-8 String in Bytes?
in thread Best Way to Get Length of UTF-8 String in Bytes?
Thank you, ikegami.
Here's what I had tried before posting my inquiry:
#!perl
use strict;
use warnings;
use open qw( :utf8 :std );
use utf8;
# 'China' in Simplified Chinese
#           中         国
# Unicode   U+4E2D     U+56FD
# UTF-8     E4 B8 AD   E5 9B BD
my $text = '中国';
my $length_in_characters = length $text;
print "Length of text '$text' in characters is $length_in_characters\n";
{
    use bytes;
    my $length_in_bytes = length $text;
    print "Length of text '$text' in bytes is $length_in_bytes\n";
}
{
    require Encode;
    my $bytes = Encode::encode_utf8($text);
    my $length_in_bytes = length $bytes;
    print "Length of text '$bytes' in bytes is $length_in_bytes\n";
}
And here's its output:
Length of text '中国' in characters is 2
Length of text 'ä¸å½' in bytes is 6
Length of text 'ä¸å½' in bytes is 6
(I couldn't use <code> tags here due to the Chinese characters in both the script and its output.)
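The last two lines of output look garbled most likely because the string is being treated as raw bytes, and STDOUT's :utf8 layer then encodes each byte again as if it were a Latin-1 character. A minimal sketch of one way to check the byte count without printing those bytes as text is to dump them in hex. The variable name $hex and the hex formatting are illustrative choices, not part of the script above:

```perl
#!perl
use strict;
use warnings;
use utf8;          # the source file itself is UTF-8
use Encode ();

my $text  = '中国';

# Encode the character string into its UTF-8 byte sequence
my $bytes = Encode::encode_utf8($text);

# Render each byte as two uppercase hex digits so the output is plain ASCII
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

print "Length in characters: ", length($text),  "\n";
print "Length in bytes:      ", length($bytes), "\n";
print "UTF-8 bytes:          $hex\n";
```

Under these assumptions the hex dump should match the UTF-8 row in the comment table at the top of the script (E4 B8 AD E5 9B BD), and nothing non-ASCII is sent to STDOUT, so no output layer is needed.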
Jim