Re^3: What does utf8::upgrade actually do.

I would very strongly suggest that the user should expect Rmpz_import() to process the series of base-256 "digits" obtained by (map ord($_), split //, $str), regardless of the internal encoding of the string. So (a) is the correct result. (b) is just horrible, and is repeating the broken Unicode model that appeared in perl 5.6 and was (mostly) fixed by perl 5.8.
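To make that concrete, here is a minimal sketch (my own illustration, not anything taken from Rmpz_import() itself) showing that the digit list comes out the same whether or not the string has been upgraded. The variable names and the sample string are just assumptions for the demo.

```perl
use strict;
use warnings;

# Sample string containing one character in the 0x80-0xff range.
my $str      = "\x40\xe9\x60";
my $upgraded = $str;
utf8::upgrade($upgraded);    # changes the internal storage only

# The base-256 "digits" as seen at the Perl level.
my @digits_native   = map { ord } split //, $str;
my @digits_upgraded = map { ord } split //, $upgraded;

print "native:   @digits_native\n";
print "upgraded: @digits_upgraded\n";
print "strings compare equal: ", ($str eq $upgraded ? "yes" : "no"), "\n";
```

Both lists hold the same three values, which is why a function that looks at the raw internal bytes instead would be giving result (b).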

Your only real decision needs to be what to do for a codepoint > 0xff. Three obvious choices are: croak; treat each codepoint modulo 256; or carry the overflow into the next digit. With the carry option, the string "\x40\x{150}\x60" would yield the integer value 0x615040. (I haven't looked at what endianness the function works to, but that should give you the general idea of what I mean.)
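A rough pure-Perl sketch of the three policies, in case it helps. Note that digits_to_int() and its policy names are hypothetical helpers invented for this illustration, it assumes least-significant digit first (which may not match what Rmpz_import() actually does), and it uses Math::BigInt only so that long strings don't overflow a native integer.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Math::BigInt;

# Hypothetical helper: interpret a string as base-256 digits,
# least-significant digit first. $policy is 'croak', 'modulo' or 'carry'.
sub digits_to_int {
    my ($str, $policy) = @_;
    my $total = Math::BigInt->new(0);
    my $place = Math::BigInt->new(1);
    for my $cp (map { ord } split //, $str) {
        if ($cp > 0xff) {
            croak sprintf("codepoint 0x%x exceeds 0xff", $cp)
                if $policy eq 'croak';
            $cp %= 256 if $policy eq 'modulo';
            # 'carry': keep the full value so the excess spills
            # into the next digit when multiplied by the place value.
        }
        $total += $place * $cp;
        $place *= 256;
    }
    return $total;
}

my $s = "\x40\x{150}\x60";
print "carry:  ", digits_to_int($s, 'carry')->as_hex,  "\n";
print "modulo: ", digits_to_int($s, 'modulo')->as_hex, "\n";
print "croak:  ", (eval { digits_to_int($s, 'croak'); 1 } ? "ok" : "died"), "\n";
```

With the carry policy this gives 0x615040 for the example string; with modulo the 0x150 is reduced to 0x50 and the result is 0x605040.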

Dave.

Re^4: What does utf8::upgrade actually do.
by syphilis (Archbishop) on Feb 18, 2021 at 14:25 UTC

