You are right, I didn't consider how indexing works in a buffer that contains multi-byte characters. There is an ugly solution for that: a new type of scalar that stores two numbers, one for the byte index and one for the codepoint index. But let's not go there.
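To make the indexing mismatch concrete, here is a minimal sketch in Python rather than Perl. The sample string and the variable names are my own, chosen only for illustration. It shows that the same character sits at a different offset depending on whether you count codepoints or UTF-8 bytes.

```python
# Illustrative only: the same logical position has two different indices.
text = "na\u00efve caf\u00e9"   # 10 codepoints, two of them non-ASCII
data = text.encode("utf-8")     # 12 bytes, each non-ASCII char takes 2 bytes

cp_index = text.index("c")      # offset counted in codepoints
byte_index = data.index(b"c")   # offset counted in bytes

# The two offsets diverge after the first multi-byte character.
print(cp_index, byte_index)
```

A scalar carrying both numbers would have to keep the pair in sync on every modification, which is a large part of why that route is unattractive.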
Now I'm even more at a loss as to how to make p5's Unicode handling more robust. Maybe a three-way flag (byte/codepoint/unknown) could be introduced; operations on incompatible types could then at least warn (probably with a warning that is not enabled by default) but not coerce. That would also provide at least some measure of introspection.
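As a rough model of the flag idea, here is a sketch in Python, not a proposal for the actual Perl internals. Everything in it (`Kind`, `Tagged`, `MixedKindWarning`, `concat`) is a hypothetical name invented for this example. I also assume that mixing a known kind with `UNKNOWN` silently yields `UNKNOWN`, which is only one possible policy.

```python
import warnings
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    BYTES = "byte"
    CODEPOINTS = "codepoint"
    UNKNOWN = "unknown"


class MixedKindWarning(UserWarning):
    """Hypothetical warning category for byte/codepoint mixing."""


# Off by default, mirroring the opt-in warning suggested above.
warnings.simplefilter("ignore", MixedKindWarning)


@dataclass(frozen=True)
class Tagged:
    units: tuple                 # raw integers: byte values or codepoints
    kind: Kind = Kind.UNKNOWN    # the three-way flag, readable by callers


def concat(a: Tagged, b: Tagged) -> Tagged:
    """Join two values without ever converting their contents."""
    if a.kind is b.kind:
        kind = a.kind
    elif Kind.UNKNOWN in (a.kind, b.kind):
        kind = Kind.UNKNOWN      # assumed policy: no warning for this case
    else:
        # Bytes mixed with codepoints: warn, but do not coerce.
        warnings.warn("mixing byte and codepoint data", MixedKindWarning)
        kind = Kind.UNKNOWN
    return Tagged(a.units + b.units, kind)
```

Callers who want the diagnostic could switch it on with `warnings.simplefilter("always", MixedKindWarning)`, and the `kind` attribute gives the introspection mentioned above.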