If you don't have a file called Latin-1.enc under your XML/Parser/Encodings directory, get it from somewhere or make one for yourself. If you already have it, you are ready to go now.
Actually there is no such file in the Encodings directory and there is no need for one. ISO-8859-1 is understood by expat natively:
From the XML::Parser documentation:
ProtocolEncoding
This is an Expat option. This sets the protocol encoding name.
It defaults to none. The built-in encodings are: "UTF-8",
"ISO-8859-1", "UTF-16", and "US-ASCII". Other encodings may be
used if they have encoding maps in one of the directories in
the @Encoding_Path list. Check the section on "ENCODINGS" for
more information on encoding maps. Setting the protocol encoding
overrides any encoding in the XML declaration.
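For anyone who wants to see this without touching their Perl setup, here is a minimal sketch in Python, whose standard xml.parsers.expat module wraps the same expat library. It is an analogue, not XML::Parser itself: the encoding argument to ParserCreate plays roughly the role that ProtocolEncoding plays in the quote above, and parse_text and the two sample documents are made up for illustration.

```python
import xml.parsers.expat


def parse_text(xml_bytes, encoding=None):
    """Return the character data of a document (hypothetical helper).

    `encoding` is passed to expat and, when given, overrides whatever the
    document itself declares, much like ProtocolEncoding.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate(encoding)
    # expat may deliver text in several pieces, so collect and join them
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)
    return "".join(chunks)


# Declared ISO-8859-1: expat decodes it natively, no encoding map needed
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\xe9</name>'
print(parse_text(declared))

# Same bytes without a declaration, with the encoding forced from outside
undeclared = b"<name>caf\xe9</name>"
print(parse_text(undeclared, "ISO-8859-1"))
```

Both calls should give the same text; the second one only works because the caller told expat what to assume.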
Please, please, please do not use the ProtocolEncoding option. As mirod said, if your source XML document a) does not declare an encoding and b) is not UTF-8 (or UTF-16) encoded, then it is not XML! The two preferred options are:
- If you are generating the XML, then you need to include an XML declaration which specifies the encoding.
- If the XML is being generated by someone else, then you need to reject it since it is not well formed.
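To make the "not well formed" point concrete, here is a rough sketch, again using Python's xml.parsers.expat as a stand-in for XML::Parser since both sit on top of expat. The is_well_formed helper and the sample documents are hypothetical.

```python
import xml.parsers.expat


def is_well_formed(xml_bytes):
    """Return True if expat parses the bytes without error (hypothetical helper)."""
    parser = xml.parsers.expat.ParserCreate()  # no encoding override
    try:
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Latin-1 byte 0xE9 with no declaration: expat assumes UTF-8 and rejects it
undeclared_latin1 = b"<name>caf\xe9</name>"

# The same content with a declaration, or encoded as UTF-8, is accepted
declared_latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\xe9</name>'
plain_utf8 = "<name>caf\u00e9</name>".encode("utf-8")

for doc in (undeclared_latin1, declared_latin1, plain_utf8):
    print(is_well_formed(doc))
```

A check like this is one way to reject bad input at the door instead of guessing its encoding.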
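Sure, you might guess that the encoding is ISO-8859-1 and it might seem to work if you force it with ProtocolEncoding, but the encoding might actually be CP1252. The two only differ in the byte range 0x80-0x9F, where CP1252 puts characters such as curly quotes and the euro sign, so the differences may simply not have tripped you up - yet.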
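A quick way to see where the two encodings part ways is to decode the same bytes both ways. This sketch uses only Python's built-in codecs; the sample bytes are invented, standing in for text a Windows editor might have saved.

```python
# Invented sample: curly quotes (0x93, 0x94), a euro sign (0x80) and an e-acute (0xE9)
raw = b"\x93price\x94 \x805 caf\xe9"

as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode("cp1252")

# Ordinary accented letters decode identically, so the guess looks right...
print(as_latin1[-4:] == as_cp1252[-4:])

# ...but 0x80-0x9F turn into invisible C1 control characters under ISO-8859-1
print([hex(ord(c)) for c in as_latin1 if 0x80 <= ord(c) <= 0x9F])

# while CP1252 maps the same bytes to real punctuation and the euro sign
print([hex(ord(c)) for c in as_cp1252 if ord(c) > 0xFF])
```

Neither decode raises an error, which is exactly why a wrong guess can go unnoticed for a long time.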
The encodings section of the Perl XML FAQ may be useful.
Any advice on where to find these protocol/encoding sections, or how they should look?
I spend a lot of time tacking on the headers as suggested earlier in the thread, and I'd like to learn a little more about how expat and XML::Parser deal with encodings -- specifically, how they're mapped.
Suggestions?