PerlMonks  

Re^4: Best Way to Get Length of UTF-8 String in Bytes?

by ikegami (Patriarch)
on Apr 24, 2011 at 05:43 UTC


in reply to Re^3: Best Way to Get Length of UTF-8 String in Bytes?
in thread Best Way to Get Length of UTF-8 String in Bytes?

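I see use bytes; without any utf8::upgrade or utf8::downgrade, and that usually indicates code that suffers from "The Unicode Bug": the result depends on how Perl happens to store the string internally rather than on the string's value.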

sub bytelen(_) { require bytes; return bytes::length($_[0]); }

should be

sub utf8len(_) { utf8::upgrade($_[0]); require bytes; return bytes::length($_[0]); }
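
To illustrate the difference, here is a minimal sketch that runs both versions on the same one-character string, once stored downgraded and once stored upgraded. The variable names are made up for the demonstration, the `_` prototype is dropped for brevity, and utf8len works on a copy so the caller's string is left untouched (a choice made for this sketch, not something the code above does).

```perl
use strict;
use warnings;
require bytes;

# Version without the upgrade: reports the size of the internal buffer,
# whatever its current storage format happens to be.
sub bytelen { return bytes::length($_[0]); }

# Version with the upgrade: forces UTF-8 storage first (on a copy, by assumption).
sub utf8len { my $s = $_[0]; utf8::upgrade($s); return bytes::length($s); }

my $down = "\xE9";       # U+00E9, stored as a single byte internally
my $up   = "\xE9";
utf8::upgrade($up);      # same string value, now stored as UTF-8 internally

print $down eq $up ? "equal\n" : "different\n";                 # equal
printf "bytelen: %d vs %d\n", bytelen($down), bytelen($up);     # bytelen: 1 vs 2
printf "utf8len: %d vs %d\n", utf8len($down), utf8len($up);     # utf8len: 2 vs 2
```

The two strings compare equal, yet bytelen gives different answers for them; utf8len gives 2 for both.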

Or the same without bytes:

sub utf8len(_) { require Encode; utf8::upgrade($_[0]); Encode::_utf8_off($_[0]); my $utf8len = length($_[0]); Encode::_utf8_on($_[0]); return $utf8len; }

Update: Added non-bytes alternative.
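
As a sanity check on the non-bytes variant, the following sketch compares it against the length of the explicitly encoded string, length(Encode::encode_utf8($str)). The helper name utf8len_noflag is hypothetical, and it operates on a copy (an assumption of this sketch), which is why it does not need to turn the flag back on.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: same idea as the non-bytes utf8len, but on a copy.
sub utf8len_noflag {
    my $s = $_[0];
    utf8::upgrade($s);         # make the internal buffer UTF-8
    Encode::_utf8_off($s);     # treat that buffer as a string of bytes
    return length($s);         # $s is discarded, so the flag is not restored
}

for my $str ("abc", "\xE9", "\x{263A}") {
    my $expected = length(Encode::encode_utf8($str));
    printf "%d %d\n", utf8len_noflag($str), $expected;
}
# prints:
# 3 3
# 2 2
# 3 3
```

Both columns should agree for any string of characters, since each counts the bytes of the string's UTF-8 encoding.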

Re^5: Best Way to Get Length of UTF-8 String in Bytes?
by tchrist (Pilgrim) on Apr 24, 2011 at 06:01 UTC
    And just which part of
    "That assumes that the strings are Unicode strings with their UTF‑8 flags on."
    didn’t you understand?
      FWIW, if it is easy to check, code might as well check instead of merely assuming :)
      That doesn't affect anything I said.
