Re: Why does Encode::Repair only correctly fix one of these two tandem characters?

by ikegami (Pope)
on Aug 09, 2014 at 05:32 UTC (#1096827)


in reply to Why does Encode::Repair only correctly fix one of these two tandem characters?

$ldqm = encode 'UTF-8', decode 'Windows-1252', encode 'UTF-8', $ldqm;

$ldqm                 => 201C
encode 'UTF-8'        => E2 80 9C
decode 'Windows-1252' => 00E2 20AC 0153
encode 'UTF-8'        => C3 A2 E2 82 AC C5 93

$rdqm = encode 'UTF-8', decode 'Windows-1252', encode 'UTF-8', $rdqm;

$rdqm                 => 201D
encode 'UTF-8'        => E2 80 9D
decode 'Windows-1252' => 00E2 20AC ????
[error handling]      => 00E2 20AC FFFD
encode 'UTF-8'        => C3 A2 E2 82 AC EF BF BD
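
The two traces above can be reproduced with a short script. This is only a sketch: hex_dump and mojibake_stages are hypothetical helper names introduced here, and it assumes Encode's Windows-1252 table leaves byte 9D undefined and that decode is called with its default error handling, as described in this post.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render each character (or byte) of $str as
# zero-padded uppercase hex, $width digits per element.
sub hex_dump {
    my ($str, $width) = @_;
    return join ' ', map { sprintf '%0*X', $width, ord($_) } split //, $str;
}

# Hypothetical helper: return the three stages of the double-encoding
# round trip for a single character.
sub mojibake_stages {
    my ($char) = @_;
    my $utf8    = encode('UTF-8', $char);          # correct UTF-8 bytes
    my $misread = decode('Windows-1252', $utf8);   # bytes misread as Windows-1252
    my $double  = encode('UTF-8', $misread);       # double-encoded bytes
    return ($utf8, $misread, $double);
}

for my $char ("\x{201C}", "\x{201D}") {
    my ($utf8, $misread, $double) = mojibake_stages($char);
    printf "U+%04X\n", ord($char);
    printf "  encode 'UTF-8'        => %s\n", hex_dump($utf8,    2);
    printf "  decode 'Windows-1252' => %s\n", hex_dump($misread, 4);
    printf "  encode 'UTF-8'        => %s\n", hex_dump($double,  2);
}
```

The decode line for U+201D already shows the substituted character, since the error handling happens inside decode.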

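Windows-1252 doesn't have a character defined for 9D, so when you decode('Windows-1252', "\x9D"), the error handling substitutes U+FFFD REPLACEMENT CHARACTER. That is irreversible: nothing in the result records which byte was there originally. The same goes for the other bytes Windows-1252 leaves undefined (81, 8D, 8F and 90), so the following all result in C3 A2 E2 82 AC EF BF BD.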

  • U+2001 EM QUAD
  • U+200D ZERO WIDTH JOINER
  • U+200F RIGHT-TO-LEFT MARK
  • U+2010 HYPHEN
  • U+201D RIGHT DOUBLE QUOTATION MARK
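
To check the collision, one could group the five characters listed above by their double-encoded bytes. Again a sketch only: mangle and group_by_mangled are hypothetical names, and the result depends on Encode's Windows-1252 table and default error handling behaving as described in this post.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: double-encode one code point and return the
# resulting bytes as space-separated hex.
sub mangle {
    my ($cp) = @_;
    my $bytes = encode('UTF-8', decode('Windows-1252', encode('UTF-8', chr($cp))));
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# Hypothetical helper: map each distinct mangled byte string to the
# list of code points that produced it.
sub group_by_mangled {
    my @cps = @_;
    my %groups;
    push @{ $groups{ mangle($_) } }, sprintf('U+%04X', $_) for @cps;
    return \%groups;
}

my $groups = group_by_mangled(0x2001, 0x200D, 0x200F, 0x2010, 0x201D);
for my $hex (sort keys %$groups) {
    print "$hex <= @{ $groups->{$hex} }\n";
}
```

If all five code points land under a single key, no repair step can tell them apart from the damaged bytes alone.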

