Re^6: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)




by moritz (Cardinal)

on Apr 19, 2012 at 19:35 UTC ([id://966036])


in reply to Re^5: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
in thread Windows-1252 characters from \x{0080} thru \x{009f}

You are right, I hadn't considered how indexing works in a buffer that contains multi-byte characters. There is an ugly solution for that: a new type of scalar that stores two numbers, one for the byte index and one for the codepoint index. But let's not go there.
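To make the indexing problem concrete, here is a minimal sketch, written in Python rather than Perl purely for illustration. It assumes the buffer is UTF-8 encoded. The helper `byte_offset` is a hypothetical name, not an existing API. The same character sits at different positions depending on whether you count bytes or codepoints, and converting between the two means scanning the string.

```python
# "a€b": the euro sign is one codepoint but three bytes in UTF-8.
text = "a\u20acb"
buf = text.encode("utf-8")

cp_index = text.index("b")     # position counted in codepoints
byte_index = buf.index(b"b")   # position counted in bytes


def byte_offset(s, cp_idx):
    """Hypothetical helper: map a codepoint index to a UTF-8 byte offset.

    This walks the prefix, so it costs O(n). That cost is why caching both
    numbers in one scalar looks tempting.
    """
    return len(s[:cp_idx].encode("utf-8"))


print(cp_index, byte_index, byte_offset(text, cp_index))
```

The two indices only agree while everything before the position is ASCII.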

Now I'm even more at a loss as to how to make p5's Unicode handling more robust. Maybe a three-way flag (byte/codepoint/unknown) could be introduced. Operations on incompatible types could then at least warn (probably with a warning that is not enabled by default), but not coerce. And the flag would provide at least some measure of introspection.
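A rough sketch of what such a flag might look like, again in Python for illustration only. Everything here (`Kind`, `Tagged`, `concat`, `MixedKindWarning`) is hypothetical and not a description of any actual Perl internals. One further assumption: mixing a flagged value with an "unknown" one stays silent and yields "unknown".

```python
import warnings
from enum import Enum


class Kind(Enum):
    BYTES = "bytes"
    CODEPOINTS = "codepoints"
    UNKNOWN = "unknown"


class MixedKindWarning(UserWarning):
    """Hypothetical warning category for byte/codepoint mixing."""


# Mirror "not enabled by default": callers have to opt in to see it.
warnings.simplefilter("ignore", MixedKindWarning)


class Tagged:
    """Hypothetical string wrapper carrying the three-way flag."""

    def __init__(self, data, kind=Kind.UNKNOWN):
        self.data = data
        self.kind = kind  # introspection: callers can inspect this


def concat(a, b):
    if a.kind is b.kind:
        kind = a.kind
    else:
        kind = Kind.UNKNOWN
        # Only a bytes/codepoints mix counts as incompatible here.
        if Kind.UNKNOWN not in (a.kind, b.kind):
            warnings.warn(
                f"joining {a.kind.value} with {b.kind.value}",
                MixedKindWarning,
                stacklevel=2,
            )
    # No coercion: the data is joined exactly as stored.
    return Tagged(a.data + b.data, kind)
```

With the warning switched on, joining a `Kind.BYTES` value with a `Kind.CODEPOINTS` value warns and returns a value flagged `Kind.UNKNOWN`. Joining two values of the same kind keeps that kind and stays quiet.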

Perl 6 - the future is here, just unevenly distributed