Re^3: UTF8 Validity

by graff (Chancellor)
on Feb 22, 2008 at 02:18 UTC

in reply to Re^2: UTF8 Validity
in thread UTF8 Validity

Encode::Guess is likely to be helpful for figuring out the source encodings for many of the Asian (multi-byte-char) strings, though it might not help much for distinguishing among single-byte encodings. Worth a try.
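As a rough sketch of what that might look like (not taken from the thread): the helper name `decode_with_guess`, the sample string, and the suspects list below are all made up for illustration. `guess_encoding` returns an encoding object when exactly one suspect fits, and a plain error string otherwise, so the caller has to check `ref` on the result.

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

# Hypothetical helper: try to decode $octets using Encode::Guess.
# Returns (decoded text, encoding name) on success,
# or (undef, error message) when the guess fails or is ambiguous.
sub decode_with_guess {
    my ($octets, @suspects) = @_;

    # ascii, utf8 and BOM-marked UTF-16/32 are always checked;
    # @suspects adds further candidates for this call.
    my $enc = guess_encoding($octets, @suspects);

    return (undef, $enc) unless ref $enc;    # error string, not an object
    return ($enc->decode($octets), $enc->name);
}

# Sample input, assumed for illustration: three kanji as EUC-JP octets.
my $octets = encode('euc-jp', "\x{65E5}\x{672C}\x{8A9E}");

my ($text, $how) = decode_with_guess($octets, qw/euc-jp shiftjis 7bit-jis/);
if (defined $text) {
    printf "decoded %d characters as %s\n", length $text, $how;
}
else {
    print "no single match: $how\n";
}
```

When more than one suspect decodes the bytes cleanly, the error string names the candidates that matched, which is about all you get for telling single-byte encodings apart.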

Re^4: UTF8 Validity
by Anonymous Monk on Feb 22, 2008 at 11:07 UTC

    Encode::Guess is lame because the caller has to tell it up front which encodings the data might be in.

    Use Encode::Detect instead. This is the same detector used in Mozilla browsers.

      I've been using Encode::Guess, but have had trouble building a suspects list for some data. However, Firefox hasn't been able to appropriately handle the problem data, either, so if Encode::Detect is the same method, I doubt it would've done any better on this data.