PerlMonks  

Re^4: How to deal with malformed utf8 from XML parsing

by ribasushi (Monk)
on Jan 09, 2008 at 23:31 UTC


in reply to Re^3: How to deal with malformed utf8 from XML parsing
in thread How to deal with malformed utf8 from XML parsing

Well, today was definitely a fruitful day. I learned about the C0/C1 control codes, which were the reason Google complained. I also realized where this stuff actually comes from: someone pasting a mis-encoded chunk of text into a browser window. Finally, I know not to use is_utf8 anymore :)
Thank you for your comments.

P.S. How I ended up fixing this:

$_ =~ s/[\x{80}-\x{9F}]/\x{FFFD}/g;
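
For anyone wanting to try that substitution in isolation, here is a minimal sketch. It assumes the input arrives as UTF-8 bytes and is decoded to characters first; the helper name scrub_c1 and the sample string are made up for illustration and are not from my actual code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace every C1 control character
# (U+0080..U+009F) in a decoded character string with U+FFFD,
# the Unicode replacement character.
sub scrub_c1 {
    my ($text) = @_;
    (my $clean = $text) =~ s/[\x{80}-\x{9F}]/\x{FFFD}/g;
    return $clean;
}

# Made-up sample: the UTF-8 bytes for "caf", U+0092 (a C1 control), "s".
my $bytes = "caf\xC2\x92s";
my $chars = decode('UTF-8', $bytes);   # bytes -> characters
my $fixed = scrub_c1($chars);

binmode STDOUT, ':encoding(UTF-8)';
print $fixed, "\n";
```

Note that the character class only covers the C1 range; C0 codes (below U+0020) are left untouched by this substitution.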

