lucdewav has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I can't manage to build an XML::DOM object containing French characters. The XML::DOM::Parser documentation says that characters are encoded in UTF-8 during parsing. The problem is that the module doesn't seem to support non-ASCII characters: when I convert my XML::DOM::Document object to a string, only the attributes are encoded in UTF-8, and they are not always encoded correctly. Moreover, when I pass the ProtocolEncoding => 'ISO-8859-1' option to the XML::DOM::Parser constructor, my output is not the same as my input. Has anybody managed to make the XML::Parser module work with non-ASCII characters? Am I missing something? I really need your help. Here is a code example:
use XML::DOM;

my $parser = new XML::DOM::Parser;
my $xmlstring = "<?xml version='1.0'?>
<ACTION>
<INPUT LABEL=\"Radio Button\"/>
<INPUT LABEL=\"t\"/>
<RADIO ID=\"List\">
</RADIO>
</ACTION>";

my $doc;
eval { $doc = $parser->parse($xmlstring); };
if ($@) {
    die "ERROR : $@\n";
}
print $doc->toString;
Thanks, Luc

Replies are listed 'Best First'.
Re: XML::Parser multilanguage support
by OeufMayo (Curate) on Jul 07, 2001 at 13:45 UTC

    You need to declare the encoding used in the document at the very start of it: <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>. The entities are then converted to the proper numerical entities.
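
    To illustrate why the output can differ from the input once the parser has converted everything to UTF-8, here is a minimal sketch that uses only the Encode module. It is not taken from the original post: it assumes a Perl recent enough to ship Encode, and it assumes the input really is ISO-8859-1. It does not call XML::DOM at all; it only shows the byte-level difference between the two encodings for a single character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Helper (name chosen for this sketch): render a byte string as hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# "e acute" as a single ISO-8859-1 byte, as it would sit in a Latin-1 document.
my $latin1_bytes = "\xE9";

# Decode the byte into a Perl character string, then encode it as UTF-8.
my $chars      = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $chars);

# The same character now takes two bytes, so a byte-for-byte comparison
# of input and output fails even though the text itself is unchanged.
print "Latin-1: ", hex_bytes($latin1_bytes), "\n";
print "UTF-8:   ", hex_bytes($utf8_bytes),   "\n";

# Converting back to ISO-8859-1 restores the original byte.
my $round_trip = encode('ISO-8859-1', decode('UTF-8', $utf8_bytes));
print "Round trip matches input: ",
    ($round_trip eq $latin1_bytes ? "yes" : "no"), "\n";
```

    Running it prints E9 for the Latin-1 form and C3 A9 for the UTF-8 form. If the output has to be ISO-8859-1 again, a conversion back like the last step is needed somewhere after serialization.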

    You may also want to use mirod's XML::Twig, which, in the latest version (3.0) is able to keep the original encoding. And besides this cool feature, I find Twig easier to use than DOM to process XML docs. Here's what your code would look like with Twig:

    #!/usr/bin/perl -w
    use strict;
    use XML::Twig 3.0;

    my $parser = new XML::Twig( keep_encoding => 1 );

    my $xmlstring=<<"XMLEND";
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <ACTION>
    <INPUT LABEL="Radio Button"/>
    <INPUT LABEL="t"/>
    <RADIO ID="List">
    </RADIO>
    </ACTION>
    XMLEND

    $parser->parse($xmlstring);
    $parser->print;

    Hope this helps!

    update: version 3.0 of XML::Twig can be found here

    my $OeufMayo = new PerlMonger::Paris({http => ''});
      Thanks, I also hope your answer will help me ;-) I have another question: I also have to transform my XML strings with XSL files. Until now I have done this with the XML::XSLT module (v0.24). It integrates well with XML::DOM, because you can easily pass in a DOM object and get a string back using the $XSLParser->transform_document($DOMobject,"DOM") method. Example:
      #!/usr/bin/perl -w
      use strict;
      use XML::XSLT;
      use XML::DOM;
      ...
      sub applyXSLDOM {
          my $self    = shift;
          my $request = shift;
          my $xmldom  = shift;
          my $xsldoc  = "user.xsl";

          # Declared outside the eval so it is still in scope for the final return.
          my $xslparser;
          eval {
              $xslparser = XML::XSLT->new($xsldoc, "FILE");
              $xslparser->transform_document($xmldom, "DOM");
          };
          if ($@) {
              return $request->error("XSL Parsing failed: $@");
          }
          return $xslparser->result_string;
      }
      Do you know of another Perl XSL processor? Thanks a lot. Luc