PerlMonks  

Re: Removing Unsafe Characters

by StommePoes (Scribe)
on Apr 29, 2009 at 13:57 UTC


in reply to Removing Unsafe Characters

If this content is all really old and comes from everywhere, you likely have more to worry about than just UTF-8 vs Latin-1. I run into Windows-1252 text sometimes, and the problem there, as I understand it, is that while it shares most of its characters with ISO-8859-1, it also assigns Microsoft-specific printable characters (curly quotes, dashes and the like) to the 0x80-0x9F range, where ISO-8859-1 has control characters. How many people have typed something in Word and then sent it as an email or pasted it into a web site?
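To make the difference concrete, here is a minimal sketch (in Python rather than Perl, purely for illustration; Perl's Encode module provides equivalent decoding). The sample bytes are an assumption about what Word-style curly quotes look like on the wire, not data from the original question.

```python
# Hypothetical sample: the word "hi" wrapped in Word-style curly quotes,
# stored as Windows-1252 bytes (0x93 and 0x94).
raw = b"\x93hi\x94"

# Decoded with the right encoding, the bytes become real quotation marks.
as_cp1252 = raw.decode("cp1252")

# Decoded as ISO-8859-1, the same bytes become invisible C1 control characters.
as_latin1 = raw.decode("latin-1")

print([hex(ord(c)) for c in as_cp1252])
print([hex(ord(c)) for c in as_latin1])
```

Both decodes succeed without an error, which is why this kind of mislabelled text tends to go unnoticed until it is displayed.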


Re^2: Removing Unsafe Characters
by ikegami (Pope) on Apr 29, 2009 at 14:17 UTC

    Once you decode UTF-8, iso-latin-1 or cp1252, you end up with Unicode characters in every case, so the source encoding doesn't change the problem:

    • Determining which Unicode characters can be represented by most browser/computer setups, and
    • determining what to do with those that can't.

    Yes, you might get undecodable text if you receive something that's in the wrong encoding. And yes, a different character from the intended one might be displayed. But that's an entirely different problem from the one the OP asked about.
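The two steps in the list above (decode to Unicode, then decide what to do with characters the target can't represent) might be sketched as follows. This is only one possible policy, written in Python for illustration: the function name `to_safe_html`, the default source encoding, and the choice of plain ASCII as the "safe" set are all assumptions, not something the thread specifies.

```python
def to_safe_html(raw: bytes, source_encoding: str = "cp1252") -> str:
    """Hypothetical helper: decode bytes, then escape anything outside ASCII."""
    # Step 1: decode the bytes into Unicode characters.
    text = raw.decode(source_encoding)
    # Step 2: treat ASCII as the representable set (an assumption) and turn
    # every other character into an HTML numeric character reference.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Curly quotes and an accented letter, as Windows-1252 bytes.
print(to_safe_html(b"\x93caf\xe9\x94"))  # prints &#8220;caf&#233;&#8221;
```

Numeric character references leave it to the browser to render the character, so this sidesteps rather than answers the question of which characters a given setup can display.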
