PerlMonks  

Re: Removing Unsafe Characters

by StommePoes (Scribe)
on Apr 29, 2009 at 13:57 UTC (#760884)


in reply to Removing Unsafe Characters

If this content is all really old and comes from everywhere, you likely have more to worry about than just UTF-8 vs. Latin-1. I run into Windows-1252 text sometimes, and the problem there, as I understand it, is that while it shares most of its characters with ISO-8859-1, it also assigns printable characters such as curly quotes, dashes and the euro sign to the bytes 0x80-0x9F, where ISO-8859-1 has no printable characters. How many people typed something in Word and then sent it as an email or pasted it into a web site?
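To make the cp1252 vs. ISO-8859-1 difference concrete, here is a minimal Python sketch (Python is my choice for illustration, not something from the thread). The sample bytes are invented: they are what curly quotes around "hi" would look like in Windows-1252.

```python
# Hypothetical sample: curly quotes around "hi", encoded as Windows-1252.
raw = b"\x93hi\x94"

# Decoded as cp1252, bytes 0x93 and 0x94 become U+201C and U+201D.
as_cp1252 = raw.decode("cp1252")

# Decoded as Latin-1, the same bytes become the invisible
# control characters U+0093 and U+0094.
as_latin1 = raw.decode("latin-1")

print([hex(ord(c)) for c in as_cp1252])  # ['0x201c', '0x68', '0x69', '0x201d']
print([hex(ord(c)) for c in as_latin1])  # ['0x93', '0x68', '0x69', '0x94']
```

Both decodes succeed without error, which is why this kind of mislabelled text tends to go unnoticed until someone sees missing or odd punctuation.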


Re^2: Removing Unsafe Characters
by ikegami (Pope) on Apr 29, 2009 at 14:17 UTC

    Whether the input was UTF-8, iso-latin-1 or cp1252, once you decode it you end up with Unicode characters, so the source encoding doesn't change the problem:

    • Determining which Unicode characters can be represented by most browser/computer setups, and
    • determining what to do with those that can't.

    Yes, you might get undecodable text if you receive something that's in the wrong encoding. And yes, a different character than the intended one might be displayed. But that's an entirely different problem than the one the OP asked about.
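The two steps in the reply above (decode first, then deal with characters the target can't show) might be sketched in Python as follows. Everything specific here is an assumption for illustration and not from the thread: the function names `decode_bytes` and `escape_unsafe`, the order of trying UTF-8 before cp1252, treating ASCII as the "safe" set, and using HTML numeric character references for everything else.

```python
def decode_bytes(raw: bytes) -> str:
    """Decode to Unicode: strict UTF-8 first, then cp1252 (assumed order)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few bytes undefined;
        # replace those with U+FFFD rather than raising.
        return raw.decode("cp1252", errors="replace")


def escape_unsafe(text: str, safe_encoding: str = "ascii") -> str:
    """Keep characters representable in safe_encoding (an assumed 'safe'
    set) and turn the rest into HTML numeric character references."""
    encoded = text.encode(safe_encoding, errors="xmlcharrefreplace")
    return encoded.decode(safe_encoding)


# Hypothetical input: "café" in curly quotes, as cp1252 bytes.
sample = b"\x93caf\xe9\x94"
print(escape_unsafe(decode_bytes(sample)))  # &#8220;caf&#233;&#8221;
```

The UTF-8-then-cp1252 fallback is only a heuristic: it does not detect text that was decoded successfully but with the wrong encoding, which is the separate problem the reply distinguishes from the OP's question.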
