You seem to be confusing a "1-to-1" mapping with the "identity function". While the identity function is a trivial "1-to-1" mapping, it's not true that every "1-to-1" mapping is the identity function.
However, even side-stepping that, Juerd doesn't mean that byte values map 1-to-1. The mapping applies after decoding. For instance, the UTF-8 byte sequence 0xC3 0x82 decodes to C2, which does indeed map to the Unicode code point U+00C2.
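To make the "mapping is after decoding" point concrete, here is a small sketch in Python (used here only as a stand-in; the variable names are made up for illustration). It decodes a two-byte UTF-8 sequence and a one-byte iso-8859-1 sequence and checks that both land on the same code point.

```python
# The UTF-8 lead byte 0xC3 comes first, followed by the continuation byte 0x82.
utf8_bytes = bytes([0xC3, 0x82])
# In iso-8859-1 the same character is the single byte 0xC2.
latin1_bytes = bytes([0xC2])

from_utf8 = utf8_bytes.decode("utf-8")
from_latin1 = latin1_bytes.decode("iso-8859-1")

# Different byte sequences, same decoded code point (U+00C2).
print(hex(ord(from_utf8)))
print(from_utf8 == from_latin1)
```

The raw bytes differ between the two encodings; only the decoded result is being compared.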
In that case, we're back to the original question. Are there any encodings that aren't "Unicode encodings"?
(Strictly speaking, the mapping isn't 1-to-1. U+2660 can't be encoded in iso-8859-1. You could also say that both U+00E9 and U+0065 U+0301 encode to E9 in iso-8859-1, although Encode's encode doesn't handle that.)
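Both observations can be checked with a short sketch. It uses Python's built-in codecs rather than Perl's Encode, so it only illustrates the encoding itself, and the flag names are hypothetical. The NFC normalization step is an extra I added to show how the decomposed form could be made encodable; it is not something the encoder does on its own.

```python
import unicodedata

# U+2660 (black spade suit) has no iso-8859-1 representation.
try:
    "\u2660".encode("iso-8859-1")
    spade_encodable = True
except UnicodeEncodeError:
    spade_encodable = False

precomposed = "\u00e9"   # e with acute, one code point
decomposed = "e\u0301"   # e followed by combining acute accent

precomposed_bytes = precomposed.encode("iso-8859-1")

# The encoder does not compose the pair for us, so this fails as well.
try:
    decomposed.encode("iso-8859-1")
    decomposed_encodable = True
except UnicodeEncodeError:
    decomposed_encodable = False

# Normalizing to NFC first turns the pair into U+00E9, which does encode.
normalized_bytes = unicodedata.normalize("NFC", decomposed).encode("iso-8859-1")
```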
Strictly speaking, the mapping isn't 1-to-1. U+2660 can't be encoded in iso-8859-1
The claim is that iso-8859-1 maps 1-to-1 to Unicode, not that Unicode maps 1-to-1 to iso-8859-1. A 1-to-1 mapping is also known as an injection. The claim wasn't that it's a bijection (aka 1-to-1 correspondence).
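The injection claim is easy to test exhaustively, since iso-8859-1 has only 256 byte values. The following is a rough sketch in Python (names are illustrative): it decodes every byte, checks that no two bytes share a code point, and checks that the result encodes back to the original bytes.

```python
# Every possible single byte, 0x00 through 0xFF.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
code_points = [ord(ch) for ch in decoded]

# Injective: 256 distinct bytes give 256 distinct code points.
is_injective = len(set(code_points)) == 256

# Each decoded character encodes back to the byte it came from.
round_trips = decoded.encode("iso-8859-1") == all_bytes

# Not a bijection onto Unicode: nothing above this code point is reachable.
max_code_point = max(code_points)
```

Since `max_code_point` stays at U+00FF, every code point above that has no iso-8859-1 preimage, which is why the mapping is an injection and not a bijection.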
No, actually, I'm not confused. When the term was introduced, it was given as the reason iso-8859-1 works without being decoded, so he indeed meant an identity mapping.
You always have to decode. Note that Unicode is a list of integers with a meaning. iso-8859-1 is an encoding (of a subset of Unicode). UTF-8 is also an encoding. UTF-16 is another. It just happens that for the first 128 code points, the encodings in iso-8859-1 and UTF-8 are identical. But that wasn't part of Juerd's claim.
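The overlap between the two encodings can be verified directly. This is a minimal sketch in Python (the list name `same` is made up): it encodes each of the first 256 code points both ways and collects the ones whose bytes agree.

```python
# Code points whose iso-8859-1 and UTF-8 encodings are byte-for-byte equal.
same = [
    cp for cp in range(256)
    if chr(cp).encode("iso-8859-1") == chr(cp).encode("utf-8")
]

# Above U+007F the encodings diverge: one byte versus two.
e_acute_latin1 = chr(0xE9).encode("iso-8859-1")
e_acute_utf8 = chr(0xE9).encode("utf-8")

print(len(same), e_acute_latin1, e_acute_utf8)
```

Only U+0000 through U+007F end up in `same`, so bytes above 0x7F have to be decoded with the right encoding to get the intended code points.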