Re: Invalid UTF8 data: namespace suggestion needed or wheel reinvented?

by mirod (Canon)
on Mar 02, 2004 at 11:21 UTC (#333219)


in reply to Invalid UTF8 data: namespace suggestion needed or wheel reinvented?

This seems very specific. If the data is invalid utf8 then the script has to guess what the problem is, which is highly dependent on the process that created the data in the first place.

Is the data in a different encoding? Does the script use Encode::Guess? What does it do exactly?
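For reference, a minimal sketch of what asking Encode::Guess looks like. The helper name `guess_name` and the sample byte strings are made up for illustration; only `guess_encoding` comes from the module. On failure `guess_encoding` returns an error string rather than an encoding object, so the result has to be checked with `ref`.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding

# Hypothetical helper: returns the name of the guessed encoding,
# or undef when Encode::Guess cannot settle on exactly one candidate.
sub guess_name {
    my ($octets, @suspects) = @_;
    my $enc = guess_encoding($octets, @suspects);
    return ref($enc) ? $enc->name : undef;    # a plain string means failure
}

# Illustrative samples: valid UTF-8 bytes, then a lone Latin-1 byte.
for my $sample ("caf\xC3\xA9", "caf\xE9 au lait") {
    my $name = guess_name($sample);
    print defined $name ? "guessed: $name\n" : "could not guess\n";
}
```

Note that the module's own documentation warns that single-byte encodings such as the ISO-8859 family do not guess well, since almost any byte string is valid in them. So even with the module, the result is a hint and not a diagnosis.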

My first reaction would be that such a script could be quite dangerous, giving users the impression that they can just run it and have all their problems disappear, when I believe that cleaning up encodings is actually a task that is very difficult to automate in general. At the very least it should come with a huge warning and a long definition of the exact scope of the cleanup it performs.


Re: Re: Invalid UTF8 data: namespace suggestion needed or wheel reinvented?
by liz (Monsignor) on Mar 02, 2004 at 11:33 UTC
    Does the script use Encode::Guess?

    Good point. I had forgotten about that module. Will look into that.

    What does it do exactly?

    With Encode: $string = decode("utf8", $string, FB_DEFAULT);
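A small sketch of what that call does, assuming a made-up input string and a hypothetical helper `is_valid_utf8`. With `FB_DEFAULT`, `decode` replaces each malformed sequence with U+FFFD (REPLACEMENT CHARACTER); with `FB_CROAK` it dies instead, which can serve as a validity check. Note that `FB_DEFAULT` and `FB_CROAK` are not exported by default and have to be imported explicitly.

```perl
use strict;
use warnings;
use Encode qw(decode FB_DEFAULT FB_CROAK);

# Illustrative input: a lone Latin-1 e-acute byte (0xE9) followed by a space,
# which is not a valid UTF-8 sequence.
my $bytes = "caf\xE9 au lait";

# FB_DEFAULT: malformed data is replaced with U+FFFD, nothing dies.
my $cleaned = decode("utf8", $bytes, FB_DEFAULT);

# Hypothetical helper: true if the octets decode without errors.
# It works on a copy, because decode may modify its source when CHECK is set.
sub is_valid_utf8 {
    my ($octets) = @_;
    return eval { decode("utf8", $octets, FB_CROAK); 1 } ? 1 : 0;
}

# Print code points instead of the string itself to avoid wide-character warnings.
print join(" ", map { sprintf "U+%04X", ord($_) } split //, $cleaned), "\n";
print "input was ", (is_valid_utf8($bytes) ? "valid" : "invalid"), " UTF-8\n";
```

The replacement is lossy: the original byte is gone after the cleanup, so the feed keeps flowing but the character cannot be recovered later. The name "utf8" is Perl's lax variant; "UTF-8" is the strict one and may be the better choice when the output has to satisfy an XML parser.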

    ... cleaning up encodings is actually a task that is very difficult to automate in general.

    I agree. But since XML is very picky about encoding errors, and the XML feed must continue in some way, this is (for now) the solution.

    It's a problem that many people (will be|are) facing when migrating legacy systems to Unicode/XML aware systems, which is why I think it warrants something on CPAN.

    Liz
