PerlMonks  

Re^5: Why does Encode::Repair only correctly fix one of these two tandem characters?

by ikegami (Pope)
on Aug 11, 2014 at 01:49 UTC


in reply to Re^4: Why does Encode::Repair only correctly fix one of these two tandem characters?
in thread Why does Encode::Repair only correctly fix one of these two tandem characters?

The most common garbage from Perl code is mixed UTF-8 and latin-1. It happens when you forget to specify the output encoding.

print "\N{LATIN CAPITAL LETTER E WITH ACUTE}";   # U+00C9, fits in one byte
print "\N{BLACK SPADE SUIT}";                    # U+2660, does not fit in one byte

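Every character of the first string fits in a single byte, so Perl prints it as the byte E9 and has no way of knowing you did something wrong. The second string contains a character that can't be stored in a byte, so Perl warns "Wide character in print" (if warnings are enabled) and guesses you meant to encode it using UTF-8. You end up with a mix of raw code points (effectively latin-1) and UTF-8 in the same output.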
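To make the effect visible, here is a minimal sketch that captures what the two print statements actually write, once without an output encoding and once with one. The helper name bytes_written and the use of an in-memory filehandle are my own choices for illustration, and it assumes no default layers are in effect (no -C switch, no PERL_UNICODE, no "use open").

```perl
use strict;
use warnings;
use charnames ':full';   # only needed on perls older than 5.16

# Print both characters through the given layer into an in-memory
# file and return the bytes that were written, as a hex dump.
# (bytes_written is a hypothetical helper, not part of any module.)
sub bytes_written {
    my ($layer) = @_;
    my $buf = '';
    open(my $fh, ">$layer", \$buf) or die "Can't open in-memory file: $!";
    {
        no warnings 'utf8';   # silence "Wide character in print"
        print {$fh} "\N{LATIN CAPITAL LETTER E WITH ACUTE}";
        print {$fh} "\N{BLACK SPADE SUIT}";
    }
    close($fh);
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buf;
}

my $mixed = bytes_written('');                   # no output encoding
my $clean = bytes_written(':encoding(UTF-8)');   # output encoding declared

print "no layer:   $mixed\n";   # E9 E2 99 A0    (latin-1 byte, then UTF-8)
print "with layer: $clean\n";   # C3 89 E2 99 A0 (UTF-8 throughout)
```

For STDOUT, the equivalent of the second call would be binmode(STDOUT, ':encoding(UTF-8)') or "use open ':std', ':encoding(UTF-8)'" before printing.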

This kind of mixed output can be repaired using Encoding::FixLatin.
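A rough sketch of what that repair might look like, assuming Encoding::FixLatin is installed from CPAN (it is not a core module) and that the damaged data is the four bytes produced by the two print statements earlier in this post. The variable names are mine.

```perl
use strict;
use warnings;
use Encoding::FixLatin qw(fix_latin);   # CPAN module, not in core

# The damaged bytes: a lone latin-1 E9 followed by the UTF-8
# encoding of U+2660 (E2 99 A0).
my $mixed = "\xE9\xE2\x99\xA0";

# fix_latin keeps valid UTF-8 sequences as they are and treats any
# other high byte as latin-1. By default it returns a character
# string rather than encoded bytes.
my $fixed = fix_latin($mixed);

# Declare the output encoding this time, so the result is written
# as UTF-8 throughout.
binmode(STDOUT, ':encoding(UTF-8)');
print "$fixed\n";
```

Keep in mind this is a heuristic: a latin-1 byte that happens to be followed by bytes that look like UTF-8 continuation bytes can be misread as the start of a multi-byte sequence.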

