PerlMonks  

Re^5: How to reverse a (Unicode) string

by JavaFan (Canon)
on Jan 10, 2011 at 09:17 UTC (#881423=note)


in reply to Re^4: How to reverse a (Unicode) string
in thread How to reverse a (Unicode) string

You seem to be confusing a "1-to-1" mapping with the "identity function". The identity function is a trivial "1-to-1" mapping, but it's not true that every "1-to-1" mapping is the identity function.
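To make the distinction concrete, here is a small sketch in Python rather than Perl. The helper names are hypothetical, and cp1252 is my own contrasting example (it isn't mentioned in the thread). The sketch decodes every single byte and checks two properties: injectivity (distinct bytes give distinct code points) and identity (each byte value equals the code point it decodes to). iso-8859-1 has both. cp1252 is injective over the bytes it defines but is not the identity, since byte 0x80 decodes to U+20AC.

```python
# Sketch: a 1-to-1 (injective) byte-to-code-point mapping need not be the identity.
# cp1252 is used only as an illustrative contrast to iso-8859-1.

def decode_table(encoding):
    """Map each byte value the codec can decode to the code point it yields."""
    table = {}
    for b in range(256):
        try:
            table[b] = ord(bytes([b]).decode(encoding))
        except UnicodeDecodeError:
            pass  # byte is undefined in this encoding (e.g. 0x81 in cp1252)
    return table

def is_injective(table):
    # Distinct input bytes must yield distinct code points.
    return len(set(table.values())) == len(table)

def is_identity(table):
    # Every byte value must equal the code point it decodes to.
    return all(b == cp for b, cp in table.items())

for enc in ("iso-8859-1", "cp1252"):
    t = decode_table(enc)
    print(enc, "injective:", is_injective(t), "identity:", is_identity(t))
```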

However, even side-stepping that, Juerd doesn't mean byte values map 1-to-1. The mapping is after decoding. For instance, the UTF-8 byte sequence 0xC3 0x82 decodes to 0xC2, which indeed maps to the Unicode code point U+00C2.
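A quick check of that example, sketched in Python (Perl's Encode isn't used here): the two UTF-8 bytes 0xC3 0x82 decode to the single code point U+00C2, the same code point the single iso-8859-1 byte 0xC2 decodes to. The reversed order is not valid UTF-8, because 0x82 is a continuation byte and cannot start a sequence.

```python
# UTF-8: two bytes decode to one code point, so bytes and code points differ.
utf8_bytes = bytes([0xC3, 0x82])
char = utf8_bytes.decode("utf-8")
print(f"U+{ord(char):04X}")  # U+00C2

# Reversed order: 0x82 is a continuation byte, so decoding fails.
try:
    bytes([0x82, 0xC3]).decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)

# iso-8859-1: the single byte 0xC2 decodes to the same code point.
latin1_char = bytes([0xC2]).decode("iso-8859-1")
print(latin1_char == char)  # True
```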


Re^6: How to reverse a (Unicode) string
by ikegami (Pope) on Jan 10, 2011 at 15:50 UTC

    In that case, we're back to the original question. Are there any encodings that aren't "Unicode encodings"?

    (Strictly speaking, the mapping isn't 1-to-1. U+2660 can't be encoded in iso-8859-1. You could also say that both U+00E9 and U+0065 U+0301 encode to E9 in iso-8859-1, although Encode's encode doesn't handle that.)
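Both points can be reproduced with Python's codecs standing in for Encode (an assumption; Perl's behaviour isn't tested here). U+2660 has no iso-8859-1 byte at all. U+00E9 encodes to 0xE9, while the decomposed U+0065 U+0301 fails as-is because U+0301 isn't in iso-8859-1. The NFC normalization step is my addition, shown as one way to get the combining sequence down to 0xE9.

```python
import unicodedata

# U+2660 (BLACK SPADE SUIT) has no iso-8859-1 encoding.
try:
    "\u2660".encode("iso-8859-1")
except UnicodeEncodeError:
    print("U+2660: not encodable")

# Precomposed U+00E9 encodes to the single byte 0xE9.
print("\u00e9".encode("iso-8859-1"))  # b'\xe9'

# Decomposed U+0065 U+0301 fails as-is: U+0301 is not in iso-8859-1.
decomposed = "e\u0301"
try:
    decomposed.encode("iso-8859-1")
except UnicodeEncodeError:
    print("U+0065 U+0301: not encodable as-is")

# Composing to NFC first turns it into U+00E9, which does encode.
print(unicodedata.normalize("NFC", decomposed).encode("iso-8859-1"))  # b'\xe9'
```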

      Strictly speaking, the mapping isn't 1-to-1. U+2660 can't be encoded in iso-8859-1
      The claim is that iso-8859-1 maps 1-to-1 to Unicode, not that Unicode maps 1-to-1 to iso-8859-1. A 1-to-1 mapping is also known as an injection. The claim wasn't that it's a bijection (aka 1-to-1 correspondence).
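The injection-versus-bijection point, sketched in Python (variable names are illustrative): decoding all 256 possible bytes as iso-8859-1 yields 256 distinct code points, so the mapping is injective. It is not onto Unicode, since nothing above U+00FF is ever produced, U+2660 included.

```python
# Decode every possible byte as iso-8859-1.
images = [bytes([b]).decode("iso-8859-1") for b in range(256)]

# Injective: 256 distinct bytes give 256 distinct code points.
injective = len(set(images)) == 256

# Not surjective onto Unicode: the highest code point reached is U+00FF.
max_cp = max(ord(c) for c in images)
reaches_spade = "\u2660" in images

print("injective:", injective)             # True
print("highest code point:", hex(max_cp))  # 0xff
print("reaches U+2660:", reaches_spade)    # False
```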
Re^6: How to reverse a (Unicode) string
by ikegami (Pope) on Jan 10, 2011 at 16:15 UTC
    No, actually, I'm not confused. When the term was introduced, it was given as the reason iso-8859-1 works without being decoded, so he indeed meant an identity mapping.
      You have to always decode. Note that Unicode is a list of integers with a meaning. iso-8859-1 is an encoding (of a subset of Unicode). UTF-8 is also an encoding. UTF-16 is another. It just happens that for the first 128 code points, the encodings in iso-8859-1 and UTF-8 are identical. But that wasn't part of Juerd's claim.
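A small Python sketch of the encoding point: one code point produces different byte sequences under iso-8859-1, UTF-8 and UTF-16, and the iso-8859-1 and UTF-8 bytes coincide only for code points 0 to 127. Little-endian UTF-16 ("utf-16-le") is my choice so that no byte-order mark is prepended.

```python
# One code point, three different byte sequences.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
for enc in ("iso-8859-1", "utf-8", "utf-16-le"):
    print(enc, ch.encode(enc).hex())

# For code points 0..127, iso-8859-1 and UTF-8 produce the same single byte.
same_low = all(
    chr(cp).encode("iso-8859-1") == chr(cp).encode("utf-8") for cp in range(128)
)
# For 128..255 they never match (one byte vs. two bytes).
same_high = any(
    chr(cp).encode("iso-8859-1") == chr(cp).encode("utf-8") for cp in range(128, 256)
)
print(same_low, same_high)
```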

        You have to always decode.

        No, you don't have to with US-ASCII and iso-8859-1.

        But that wasn't part of Juerds claim.

        I agree. He didn't mention any relation between the first 128 characters of iso-8859-1 and UTF-8. No idea why you bring this up.

        iso-8859-1 is an encoding (of a subset of Unicode)

        Unicode is a character set, not an encoding, so that sentence is broken.

        iso-8859-1 is both a character set and an encoding. The iso-8859-1 character set is a subset of the Unicode character set, but this property does NOT explain why iso-8859-1 works without being decoded.
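Why iso-8859-1 (and US-ASCII) "works without being decoded" comes down to the identity mapping: each byte value already is the code point. The sketch below is in Python, and `bytes_as_code_points` is a hypothetical helper that uses no decoding table. It matches a real decode for every iso-8859-1 byte and every US-ASCII byte, but not for UTF-8.

```python
def bytes_as_code_points(data: bytes) -> str:
    """Hypothetical helper: treat each byte value directly as a code point."""
    return "".join(chr(b) for b in data)

# Same result as a real iso-8859-1 decode, for every possible byte.
all_bytes = bytes(range(256))
print(bytes_as_code_points(all_bytes) == all_bytes.decode("iso-8859-1"))  # True

# Same result as a US-ASCII decode for bytes 0..127.
ascii_bytes = bytes(range(128))
print(bytes_as_code_points(ascii_bytes) == ascii_bytes.decode("us-ascii"))  # True

# Not the same for UTF-8: b'\xc3\xa9' must be decoded to get U+00E9.
utf8_bytes = "\u00e9".encode("utf-8")
print(bytes_as_code_points(utf8_bytes) == utf8_bytes.decode("utf-8"))  # False
```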
