Re: Encoding issue

by w-ber (Hermit)
on Mar 09, 2007 at 18:47 UTC (#604029)


in reply to Encoding issue

Guessing a character encoding is pretty much impossible to do accurately. It's easy to tell any of the /UTF-\d\d?/ encodings apart from ASCII, and it's still pretty easy to tell ASCII from an 8-bit character set, or an 8-bit character set from /UTF-\d\d?/. The problems begin when you have 8-bit encoded data and try to guess which of the dozens of 8-bit encodings it is.
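To illustrate the "easy" part, here is a minimal sketch using the core Encode module. The helper name classify_bytes is made up for this example, and it only covers ASCII versus UTF-8 versus "some other 8-bit data"; it makes no attempt at UTF-16 or UTF-32.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: coarse classification of a byte string.
sub classify_bytes {
    my ($bytes) = @_;

    # No byte above 0x7F: plain ASCII (which is also valid UTF-8).
    return 'ascii' unless $bytes =~ /[\x80-\xFF]/;

    # Strict decoding dies on malformed input; LEAVE_SRC keeps $bytes intact.
    my $ok = eval {
        my $text = decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 'utf-8' : 'unknown-8bit';
}

print classify_bytes("hello"), "\n";         # ascii
print classify_bytes("caf\xC3\xA9"), "\n";   # utf-8
print classify_bytes("caf\xE9"), "\n";       # unknown-8bit
```

Note that the last answer is only "not UTF-8"; the sketch says nothing about which 8-bit encoding it actually is.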

For example, the ISO-8859 series alone contains no fewer than fifteen different encodings. Add to that all the old IBM/Microsoft code pages (CP850, CP437; remember DOS?), the encodings used by Windows (such as CP1252), and so on, not to mention the Asian and East European encodings! In a word: good luck.
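A small sketch of why this is hard: the same single byte decodes without error under each of several 8-bit encodings, just to a different character each time. The helper name code_point_in and the choice of byte and encodings are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# One byte, several 8-bit interpretations. Each decode succeeds,
# so validity alone cannot tell us which encoding was intended.
my $byte = "\xE4";

# Hypothetical helper: decode $bytes as $encoding and report the
# code point of the first resulting character.
sub code_point_in {
    my ($encoding, $bytes) = @_;
    my $char = decode($encoding, $bytes);
    return sprintf 'U+%04X', ord $char;
}

for my $encoding (qw(iso-8859-1 iso-8859-5 cp437 cp850 cp1252)) {
    printf "%-10s %s\n", $encoding, code_point_in($encoding, $byte);
}
```

Any guesser has to fall back on statistics or knowledge of the expected language, because nothing in the bytes themselves rules out the other candidates.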

(I think Shift JIS is also easy to differentiate from ASCII and UTF, but I have no experience with it. It's 8-bit, but uses two bytes for most non-ASCII characters.)
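As a rough check of that hunch, the sketch below takes the two bytes 0x82 0xA0, which encode a single hiragana character in Shift JIS, and tries them under strict UTF-8 and under Shift JIS. The helper name decodes_as is made up for this example, and one sample proves nothing about arbitrary data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if $bytes decode cleanly as $encoding.
sub decodes_as {
    my ($encoding, $bytes) = @_;
    return eval {
        my $text = decode($encoding, $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    } ? 1 : 0;
}

my $sjis = "\x82\xA0";    # two bytes, one Shift JIS character

print "valid UTF-8:     ", decodes_as('UTF-8',    $sjis), "\n";
print "valid Shift JIS: ", decodes_as('shiftjis', $sjis), "\n";
print "characters:      ", length(decode('shiftjis', $sjis)), "\n";
```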

--
print "Just Another Perl Adept\n";

