PerlMonks  

Re: Why does Encode::Repair only correctly fix one of these two tandem characters?

by ikegami (Pope)
on Aug 09, 2014 at 05:32 UTC (#1096827)


in reply to Why does Encode::Repair only correctly fix one of these two tandem characters?

$ldqm = encode 'UTF-8', decode 'Windows-1252', encode 'UTF-8', $ldqm;

    $ldqm                  => 201C
    encode 'UTF-8'         => E2 80 9C
    decode 'Windows-1252'  => 00E2 20AC 0153
    encode 'UTF-8'         => C3 A2 E2 82 AC C5 93

$rdqm = encode 'UTF-8', decode 'Windows-1252', encode 'UTF-8', $rdqm;

    $rdqm                  => 201D
    encode 'UTF-8'         => E2 80 9D
    decode 'Windows-1252'  => 00E2 20AC ????
    [error handling]       => 00E2 20AC FFFD
    encode 'UTF-8'         => C3 A2 E2 82 AC EF BF BD

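Windows-1252 doesn't have a character defined for 9D, so when you decode('Windows-1252', "\x9D"), the decoder substitutes U+FFFD REPLACEMENT CHARACTER and the original byte is lost. That step is irreversible. Each of the following characters has a UTF-8 encoding of the form E2 80 xx where xx is a byte Windows-1252 leaves undefined (81, 8D, 8F, 90, 9D), so they all result in C3 A2 E2 82 AC EF BF BD.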

  • U+2001 EM QUAD
  • U+200D ZERO WIDTH JOINER
  • U+200F RIGHT-TO-LEFT MARK
  • U+2010 HYPHEN
  • U+201D RIGHT DOUBLE QUOTATION MARK
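
To check the collision yourself, here is a minimal sketch that pushes each of those characters, plus U+201C for comparison, through the same encode/decode/encode round trip. The helper names mangle and hex_bytes are made up for this illustration and are not part of Encode or Encode::Repair; the sketch assumes decode is called with its default CHECK behaviour, which substitutes U+FFFD for bytes it cannot map.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: apply the double-encoding round trip from the post
# to a single character and return the resulting bytes.
sub mangle {
    my ($char) = @_;
    return encode('UTF-8', decode('Windows-1252', encode('UTF-8', $char)));
}

# Hypothetical helper: format a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# U+201C survives (byte 9C is defined in Windows-1252); the rest do not.
for my $cp (0x201C, 0x2001, 0x200D, 0x200F, 0x2010, 0x201D) {
    printf "U+%04X => %s\n", $cp, hex_bytes(mangle(chr $cp));
}
```

If Encode behaves as described above, the U+201C line shows C3 A2 E2 82 AC C5 93 and the other five lines are identical to one another, which is why no repair tool can tell them apart afterwards.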

