Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re: Why does Encode::Repair only correctly fix one of these two tandem characters?

by ikegami (Patriarch)
on Aug 09, 2014 at 05:32 UTC ( [id://1096827]=note: print w/replies, xml ) Need Help??


in reply to Why does Encode::Repair only correctly fix one of these two tandem characters?

$ldqm = encode 'UTF-8', decode 'Windows-1252', encode 'UTF-8', $ldqm; $ldqm => 201C encode 'UTF-8' => E2 80 9C decode 'Windows-1252' => 00E2 20AC 0153 encode 'UTF-8' => C3 A2 E2 82 AC C5 93
$rdqm = encode 'UTF-8', decode 'Windows-1252', encode 'UTF-8', $rdqm; $rdqm => 201D encode 'UTF-8' => E2 80 9D decode 'Windows-1252' => 00E2 20AC ???? [error handling] => 00E2 20AC FFFD encode 'UTF-8' => C3 A2 E2 82 AC EF BF BD

Windows-1252 doesn't have a character defined for 9D, so when you decode('Windows-1252', "\x9D"), you do something irreversible. The following all result in C3 A2 E2 82 AC EF BF BD.

  • U+2001 EM QUAD
  • U+200D ZERO WIDTH JOINER
  • U+200F RIGHT-TO-LEFT MARK
  • U+2010 HYPHEN
  • U+201D RIGHT DOUBLE QUOTATION MARK

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1096827]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (2)
As of 2025-12-09 02:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What's your view on AI coding assistants?





    Results (88 votes). Check out past polls.

    Notices?
    hippoepoptai's answer Re: how do I set a cookie and redirect was blessed by hippo!
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.