PerlMonks  

Re: XML Simple Charset Q?

by pg (Canon)
on Nov 25, 2002 at 18:17 UTC ( [id://215687]=note )


in reply to XML Simple Charset Q?

Yes, you can use umlauts in your XML, and XML::Parser is okay with them. Just do two things:
  1. When you create your XML::Parser object, specify ProtocolEncoding => "Latin-1"
  2. If you don't have a file called Latin-1.enc under your XML/Parser/Encodings directory, get it from somewhere or make one for yourself. If you already have it, you are ready to go now.

Replies are listed 'Best First'.
Re: Re: XML Simple Charset Q?
by mirod (Canon) on Nov 25, 2002 at 18:32 UTC
    If you don't have a file called Latin-1.enc under your XML/Parser/Encodings directory, get it from somewhere or make one for yourself. If you already have it, you are ready to go now.

    Actually there is no such file in the Encodings directory and there is no need for one. ISO-8859-1 is understood by expat natively:

    From XML::Parser doc:

    ProtocolEncoding
                   This is an Expat option. This sets the protocol encoding name.
                   It defaults to none. The built-in encodings are: "UTF-8",
                   "ISO-8859-1", "UTF-16", and "US-ASCII". Other encodings may be
                   used if they have encoding maps in one of the directories in
                   the @Encoding_Path list. Check the section on "ENCODINGS" for
                   more information on encoding maps. Setting the protocol
                   encoding overrides any encoding in the XML declaration.
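
As a rough illustration of the option quoted above, here is a minimal sketch that passes one of the built-in names to ProtocolEncoding, so no .enc map file is involved. The sample document and the variable names are made up for the example, and the parse is guarded so the script still runs where XML::Parser is not installed.

```perl
use strict;
use warnings;

# Illustrative input: ISO-8859-1 bytes with no XML declaration.
my $bytes = "<name>M\xFCller</name>";

my $text;
if (eval { require XML::Parser; 1 }) {
    my $buf = '';
    my $parser = XML::Parser->new(
        # A built-in expat encoding name, so no encoding map is looked up.
        ProtocolEncoding => 'ISO-8859-1',
        # Character data may arrive in several chunks, so accumulate it.
        Handlers => { Char => sub { $buf .= $_[1] } },
    );
    $parser->parse($bytes);
    $text = $buf;    # the element text as a Perl character string
}
```

Note that this forces the encoding from outside the document, which is exactly what the option's last sentence warns about: it overrides whatever the XML declaration says.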
    
Re: Re: XML Simple Charset Q?
by grantm (Parson) on Nov 25, 2002 at 22:25 UTC

    Please, please, please do not use the ProtocolEncoding option. As mirod said, if your source XML document a) does not declare an encoding and b) is not UTF-8 (or UTF-16) encoded, then it is not XML! The two preferred options are:

    • If you are generating the XML, then you need to include an XML declaration which specifies the encoding.
    • If the XML is being generated by someone else, then you need to reject it, since it is not well-formed.
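
The first option above could look something like the following sketch. The helper name latin1_xml and the <name> element are hypothetical, and the helper assumes the text contains no markup characters such as & or <. Encode is a core module; the round trip through XML::Parser only runs if that module is installed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: wrap text in a <name> element and serialize the
# document as ISO-8859-1 bytes, with a declaration naming that encoding.
# Assumes $text needs no XML escaping.
sub latin1_xml {
    my ($text) = @_;
    my $doc = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n}
            . "<name>$text</name>\n";
    return encode('ISO-8859-1', $doc);
}

my $bytes = latin1_xml("M\x{FC}ller");

# Parse it back without ProtocolEncoding: the declaration tells expat
# how to read the bytes.
my $parsed;
if (eval { require XML::Parser; 1 }) {
    my $buf = '';
    my $parser = XML::Parser->new(
        Handlers => { Char => sub { $buf .= $_[1] } },
    );
    $parser->parse($bytes);
    $parsed = $buf;
}
```

Because the document declares its own encoding, any conforming parser can read it without out-of-band hints.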

    Sure, you might guess that the encoding is ISO-8859-1 and it might seem to work if you force it with ProtocolEncoding, but the encoding might actually be CP1252 and the differences haven't tripped you up - yet.
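
To make the CP1252 risk concrete, here is a small sketch using the core Encode module (the variable names are just for illustration). The two encodings agree on the umlauts but disagree in the 0x80-0x9F range, where CP1252 puts characters such as the euro sign and ISO-8859-1 has control characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $euro_byte   = "\x80";   # the euro sign in CP1252
my $umlaut_byte = "\xFC";   # u-umlaut in both encodings

my $as_cp1252 = decode('cp1252',     $euro_byte);
my $as_latin1 = decode('iso-8859-1', $euro_byte);

printf "CP1252:     U+%04X\n", ord $as_cp1252;   # U+20AC
printf "ISO-8859-1: U+%04X\n", ord $as_latin1;   # U+0080

# The umlaut decodes identically, which is why a wrong guess can go
# unnoticed until a byte from the 0x80-0x9F range turns up.
my $umlaut_same = decode('cp1252', $umlaut_byte)
               eq decode('iso-8859-1', $umlaut_byte);
```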

    The encodings section of the Perl XML FAQ may be useful.

Re: Re: XML Simple Charset Q?
by dingus (Friar) on Nov 25, 2002 at 18:29 UTC
    1. Where the heck do I find a latin-1.enc file? Google is not my friend right now :(

    2. Does this end up with UTF-8 output anyway? - see my update to my reply to mirod above.

    Dingus


    Enter any 47-digit prime number to continue.
Re: Re: XML Simple Charset Q?
by jkahn (Friar) on Nov 25, 2002 at 18:31 UTC
    Any advice on where to find these protocol/encoding sections, or how they should look?

    I spend a lot of time tacking on the headers as suggested earlier in the thread, and I'd like to learn a little more about how expat and XML::Parser deal with encodings -- specifically, how they're mapped.

    Suggestions?
