Re: truncate string to byte count

I wrote this before more experienced monks called it an unworthy XY problem (maybe a buffer of some sort with a length limit in bytes?), but let it stay, FWIW :)

The straightforward approach would be:

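use strict;
use warnings;
use feature 'say';

use utf8;    # this source file contains UTF-8 literals
use Encode qw/ encode decode _utf8_off /;

my $input = 'Test Ршзефф 号召力打了';
my $byte_limit = 25;

# Encode to bytes, cut at the byte limit, then decode back.
# FB_QUIET makes decode stop quietly at the first incomplete sequence.
my $limited = decode(
    'UTF-8',
    substr( encode( 'UTF-8', $input ), 0, $byte_limit ),
    Encode::FB_QUIET | Encode::LEAVE_SRC
);

binmode STDOUT, ':encoding(UTF-8)';
say $limited;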

The 25-byte cut falls inside the third Chinese character (only its first byte makes it in), so that character is discarded. Obvious complications: what if some characters aren't allowed at the end of a line (or word), what if diacritics (i.e. inseparable parts of graphemes) get cut off, or invisible things such as joiners are left dangling, etc. The third (unused) import, _utf8_off, can be used to modify the input in place instead of calling "encode", e.g. for performance. LEAVE_SRC is likewise there for the phantom of performance and isn't necessary. FB_QUIET makes decode return the part that decoded successfully.
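To illustrate the diacritics complication, here is a minimal sketch (not part of the code above) that counts bytes per grapheme cluster, using the regex escape \X, so that a base letter is never separated from its combining mark. The helper name truncate_graphemes is made up for this example, and it does nothing about line-breaking rules or dangling joiners.

```perl
use strict;
use warnings;
use Encode qw/ encode /;

# Hypothetical helper: append whole grapheme clusters (\X) for as long
# as their combined UTF-8 size stays within $byte_limit.
sub truncate_graphemes {
    my ( $string, $byte_limit ) = @_;
    my ( $result, $used ) = ( '', 0 );
    while ( $string =~ /(\X)/g ) {
        my $size = length encode( 'UTF-8', $1 );    # bytes in this cluster
        last if $used + $size > $byte_limit;
        $result .= $1;
        $used   += $size;
    }
    return $result;
}

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one grapheme, 3 bytes
my $input = "cafe\x{301} au lait";

print truncate_graphemes( $input, 5 ), "\n";              # caf
print length( truncate_graphemes( $input, 6 ) ), "\n";    # 5
```

With a limit of 5 bytes the plain substr/decode approach would keep "cafe" and silently drop the accent; the sketch drops the whole "e plus accent" cluster instead. It is slower, since it encodes every cluster separately.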


Re^2: truncate string to byte count
by stevieb (Canon) on Feb 27, 2019 at 23:59 UTC

Hey, wait! Nobody said it was an "unworthy" post, it was mentioned that it looks like an X/Y problem, which it kind of does.

That's why requests for details were thrown out there.

You're able to answer any way you want. You do not need to preface your answer that way... the more experienced Monks love answers that appear to go around the 'norm' :D

(Hell, I will even answer a homework question periodically when I'm bored/angry/frustrated whatever just to get my mind off of things, and sometimes more experienced Monks wouldn't do that even. Each to their own!)
