http://www.perlmonks.org?node_id=812699


in reply to limiting length of utf8 string in bytes

On a utf8 string, chop appears to do 'the right thing', i.e. remove one trailing utf8 character, regardless of how many bytes it is.

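I guess you could keep chop-ping your string while its byte length exceeds $threshold (plain length counts characters on such a string, so you'd have to measure the encoded bytes), but that's O(excess characters) iterations, which might get painful.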
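
For illustration, a rough sketch of that chop loop. The name chop_to_bytes is made up for this example, and using Encode's encode_utf8 to measure the byte length is my assumption about how you'd count bytes, not something from the original question.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: drop trailing characters until the UTF-8
# encoding of $str fits in $max_bytes bytes.
sub chop_to_bytes {
    my ($str, $max_bytes) = @_;
    # chop removes one whole character per pass; the byte length is
    # re-measured on the encoded copy each time round.
    chop $str while length($str) && length(encode_utf8($str)) > $max_bytes;
    return $str;
}
```

Re-encoding on every pass makes this even slower than the iteration count suggests, so it only makes sense when the excess is small.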

The other alternative is to proceed by inspection: under 'use bytes', examine the byte at offset $threshold (the first byte that would be cut off) and, working your way backwards, "substr" just before the first byte that is a valid utf8 start byte, i.e. anything that isn't a continuation byte of the form 10xxxxxx.

That would require at most 4 loop iterations for valid utf8, I think, since a character is never longer than 4 bytes.
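
A minimal sketch of the inspection approach, untested against real-world data. Note that it swaps 'use bytes' for an explicit encode_utf8/decode_utf8 round trip from Encode, which is my substitution; truncate_utf8_bytes is a made-up name.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helper: truncate $str so that its UTF-8 encoding is at
# most $max_bytes bytes, without splitting a character.
sub truncate_utf8_bytes {
    my ($str, $max_bytes) = @_;
    my $bytes = encode_utf8($str);
    return $str if length($bytes) <= $max_bytes;
    return ''   if $max_bytes <= 0;

    my $cut = $max_bytes;   # 0-based offset of the first byte to drop
    # Continuation bytes look like 10xxxxxx. Step back until the byte
    # at $cut starts a character, so the cut lands on a boundary.
    $cut-- while $cut > 0 && (ord(substr($bytes, $cut, 1)) & 0xC0) == 0x80;

    return decode_utf8(substr($bytes, 0, $cut));
}
```

The loop only ever looks at the few bytes around the cut point, so the cost of finding the boundary doesn't depend on how much excess there is.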


Mike