PerlMonks  

Re^2: Filtering out bad UTF8 chars

by FreakyGreenLeaky (Sexton)
on Oct 13, 2011 at 09:55 UTC (#931184)


in reply to Re: Filtering out bad UTF8 chars
in thread Filtering out bad UTF8 chars

Thanks for the reply, ikegami. I then get a "Cannot decode string with wide characters at /usr/lib64/perl5/Encode.pm line 174." error, presumably because the text is already decoded and I'm double-decoding (if I understand correctly).

My problem is I have input from wildly varying sources (websites) with correspondingly wildly varying encodings...

I think that, until I can find a way to handle these scenarios without crashing the backend, I'm not going to try to extract what can be extracted; I'll simply skip these damn files.

Luckily they're in the extreme minority and, as much as it irks me to do this, I'm flagging this #TODO for now.
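
For what it's worth, here is a rough sketch of how the "skip without crashing" part might look. It assumes the input is either raw bytes that are supposed to be UTF-8 or text that has already been decoded; safe_decode_utf8 is a made-up helper name, not something from Encode, and the wide-character test is only a guard against the double-decode error above, not real encoding detection.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Encode): returns the decoded text,
# or undef when the input is not valid UTF-8 so the caller can skip it.
sub safe_decode_utf8 {
    my ($input) = @_;
    return undef unless defined $input;

    # A string holding any character above 0xFF cannot be raw bytes.
    # Decoding it again is what raises
    # "Cannot decode string with wide characters", so pass it through.
    return $input if $input =~ /[^\x00-\xFF]/;

    # Strict decode: FB_CROAK dies on malformed bytes and eval traps
    # that, so a bad file yields undef instead of killing the process.
    # LEAVE_SRC keeps decode() from modifying $input.
    my $text = eval { decode('UTF-8', $input, FB_CROAK | LEAVE_SRC) };
    return $text;
}

# Example: one valid UTF-8 byte string, one Latin-1 byte string.
for my $raw ("caf\xC3\xA9", "caf\xE9") {
    my $text = safe_decode_utf8($raw);
    print defined $text ? "decoded\n" : "skipped (not valid UTF-8)\n";
}
```

This only answers "is it valid UTF-8 or not"; input in other encodings would still need its charset taken from the HTTP headers or meta tags before decoding.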


Re^3: Filtering out bad UTF8 chars
by ikegami (Pope) on Oct 13, 2011 at 14:49 UTC

    My problem is I have input from wildly varying sources (websites) with correspondingly wildly varying encodings...

    But you asked about bad UTF-8?! Sorry, I don't understand your question at all.
