Re^3: length() miscounting UTF8 characters?

by farang (Chaplain)
on Apr 28, 2014 at 20:51 UTC

in reply to Re^2: length() miscounting UTF8 characters?
in thread length() miscounting UTF8 characters?

Well, all perl builtins work at the codepoint level, including length. Depending on your definition of "character", that might or might not be what the OP wants.
Sure, I'm just saying that bugs or unexpected results can occur if care is not taken. As amon pointed out, the same visual representation of a character with a diacritic can consist of either one or two codepoints.
#!/usr/bin/env perl
use v5.14;
use warnings;
use utf8;
binmode STDOUT, 'utf8';

my $o_umlaut1 = "\x{F6}";          # precomposed o-umlaut: one codepoint
my $o_umlaut2 = "\x{6F}\x{308}";   # "o" + combining diaeresis: two codepoints

my $string1 = "" . $o_umlaut1;
my $string2 = "" . $o_umlaut2;

say "length of $string1 is ", length($string1);
say "length of $string2 is ", length($string2);
length of  is 3
length of ö is 4
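
If you want the two forms to count the same, one option is to normalize first or to count extended grapheme clusters instead of codepoints. The following is only a sketch: the helper names nfc_length and grapheme_count are made up for illustration, and it assumes the core Unicode::Normalize module plus the regex \X.

```perl
#!/usr/bin/env perl
use v5.14;
use warnings;
use Unicode::Normalize qw(NFC);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: codepoint count after composing to NFC.
sub nfc_length {
    my ($s) = @_;
    return length(NFC($s));
}

# Hypothetical helper: number of extended grapheme clusters (\X matches).
sub grapheme_count {
    my ($s) = @_;
    my $n = () = $s =~ /\X/g;
    return $n;
}

my $precomposed = "\x{F6}";          # one codepoint
my $decomposed  = "\x{6F}\x{308}";   # "o" + combining diaeresis

for my $s ($precomposed, $decomposed) {
    say join ' ',
        'codepoints:',     length($s),
        'NFC codepoints:', nfc_length($s),
        'graphemes:',      grapheme_count($s);
}
```

Both strings should report one NFC codepoint and one grapheme, while the raw codepoint counts stay at 1 and 2. NFC only helps when a precomposed form exists, so \X is probably the more general of the two.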

I'll play around with your module. Thai is somewhat unique in that the first combining character may be another alphabetic character, so counting extended graphemes does not necessarily give the correct count of alphabetic characters.
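
A rough illustration of what I mean, under my own assumptions: the word chosen, the helper names cluster_count and letter_count, and the use of \p{L} as a stand-in for "alphabetic character" are all just for the sketch. SARA AM (U+0E33) is a letter by general category, yet it attaches to the preceding consonant inside one extended grapheme cluster.

```perl
#!/usr/bin/env perl
use v5.14;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: number of extended grapheme clusters.
sub cluster_count {
    my ($s) = @_;
    my $n = () = $s =~ /\X/g;
    return $n;
}

# Hypothetical helper: number of codepoints with general category Letter.
sub letter_count {
    my ($s) = @_;
    my $n = () = $s =~ /\p{L}/g;
    return $n;
}

# NO NU (U+0E19), tone mark MAI THO (U+0E49), SARA AM (U+0E33)
my $word = "\x{E19}\x{E49}\x{E33}";

say "codepoints: ", length($word);
say "graphemes:  ", cluster_count($word);
say "letters:    ", letter_count($word);
```

On the Perl versions I would expect people to be running, this gives three codepoints, a single grapheme cluster, and two letters. The cluster boundaries for Thai have changed between Unicode versions, so treat the grapheme figure as dependent on your Perl.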
