PerlMonks  

Re: XML::Parser Encoding (UTF-8 -> ISO-8859-1)

by jkahn (Friar)
on Sep 12, 2002 at 00:49 UTC (#197126)


in reply to XML::Parser Encoding (UTF-8 -> ISO-8859-1)

I'm not an expert on Text::Iconv (I prefer Unicode::String; it seems to be more portable), but I know a little about Unicode encodings.

A couple of notes, some of which may be relevant. Forgive me if you know all this; I figured it might be useful to somebody who's looking for this kind of information, even if it doesn't necessarily help Emanuel:

  1. ISO-8859-1 is a single-byte encoding, so it can represent only 256 characters (U+0000 .. U+00FF), whereas UTF-8 can encode every Unicode character. Many Eastern European characters (not to mention South and East Asian characters) cannot be encoded in ISO-8859-1; there just aren't enough bits. Are there characters in your data outside the range U+0000 .. U+00FF?
  2. Perl can store strings internally as UTF-8, so the fact that print outputs correct-looking data may mean that Text::Iconv is not doing its job correctly (or that you're not using it the way it's intended). In other words, if your data is *still* in UTF-8, it will probably print correctly.
  3. If you're using *nix or Cygwin, you probably have the od (octal dump) tool available, which I find indispensable for diagnosing codeset issues. Editors like vi and emacs tend to operate at too high a level: they try to interpret the encoding for you, so things "look fine" even when they're in the wrong encoding. I use od -a all the time to figure out whether I've used encoding tools correctly.
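The three points above can be tried out with Encode, the encoding module bundled with Perl since 5.8.0. Note that this swaps in Encode for Text::Iconv and Unicode::String; it is not what either poster used. This is a minimal sketch assuming the input is a raw UTF-8 byte string (read without any encoding layer), and the subroutine names utf8_to_latin1 and hex_bytes are hypothetical. It converts to ISO-8859-1, refuses characters outside U+0000 .. U+00FF, and prints the bytes in hex in the spirit of od or hexdump:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert a UTF-8 byte string to an ISO-8859-1 byte string.
# Dies on malformed UTF-8, or on any character outside U+0000 .. U+00FF,
# which ISO-8859-1 has no byte for.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;    # private copy: decode() with a CHECK flag may modify it
    my $chars = decode('UTF-8', $utf8_bytes, Encode::FB_CROAK());
    if ($chars =~ /([^\x{00}-\x{FF}])/) {
        die sprintf("U+%04X cannot be encoded in ISO-8859-1\n", ord $1);
    }
    # Every character is now <= U+00FF, so each one maps to exactly one byte.
    return encode('iso-8859-1', $chars);
}

# Hypothetical helper: render bytes as space-separated hex pairs,
# a rough stand-in for od/hexdump.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

my $utf8   = "Fu\xc3\x9fball";    # "Fussball" with sharp s, as UTF-8 bytes
my $latin1 = utf8_to_latin1($utf8);
print hex_bytes($utf8), "\n";      # 46 75 c3 9f 62 61 6c 6c
print hex_bytes($latin1), "\n";    # 46 75 df 62 61 6c 6c
```

While the data is still UTF-8, ß shows up as the two bytes c3 9f; once it really is ISO-8859-1, it is the single byte df. That difference is what a byte-level dump should reveal, however the text looks on screen.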

HTH, jkahn


Re: Re: XML::Parser Encoding (UTF-8 -> ISO-8859-1)
by Emanuel (Pilgrim) on Sep 12, 2002 at 01:28 UTC
    Hello, thanks for the reply

    It has to be ISO-8859-1, or Latin-1. I don't have any Asian characters in it.

    The second point you mention could be true, I guess. I think I'll have to dig into that some more. I hope I'll find something this way; although I thought I was doing it the right way, I might easily be wrong.

    I've been using hexdump and converting the values to octal; I didn't know about od. You never stop learning.

    Thanks for your reply; I'll get working on Text::Iconv.

    Emanuel
    Edit:
    Here's some sample output from a quick hack:

    Before Conversion: Live Fußball: Bundesliga, 3. Spieltag
    After Conversion: Live Fußball: Bundesliga, 3. Spieltag

    I dumped it to a file and checked it with od and hexdump, and it looks correct, but in the DB it still appears as it does in "Before Conversion".
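Since the terminal output looks the same before and after conversion, a check on the raw bytes that actually reach (or come back from) the database can be more telling than eyeballing it. Below is a minimal heuristic sketch, again using the core Encode module. looks_like_utf8 is a hypothetical name, and how the bytes are fetched from the database is left out as an assumption:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical heuristic: does a raw byte string look like UTF-8 rather than ISO-8859-1?
# Returns 1 only if it contains at least one non-ASCII byte AND the whole string
# decodes as strict UTF-8; returns 0 otherwise (pure ASCII is identical in both
# encodings, so it tells us nothing).
sub looks_like_utf8 {
    my ($bytes) = @_;    # private copy: decode() with a CHECK flag may modify it
    return 0 unless $bytes =~ /[\x80-\xFF]/;
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

print looks_like_utf8("Fu\xc3\x9fball"), "\n";    # 1: sharp s still encoded as UTF-8
print looks_like_utf8("Fu\xdfball"), "\n";        # 0: sharp s encoded as ISO-8859-1
```

ISO-8859-1 text containing accented letters rarely happens to form valid UTF-8, so a result of 1 strongly suggests the data is still UTF-8, as suspected in point 2 of the reply. It is a guess, not a proof.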
