I can't reproduce the behaviour you describe.
$ perl -MEncode -e'
print encode "UTF-8",
qq{<?xml version="1.0" encoding="UTF-8"?>\n} .
qq{<Name>Issu\x{E9}T\x{E9}st</Name>\n};
' >in.xml
$ perl -e'
use open ":std", ":encoding(UTF-8)"; # I have a UTF-8 terminal
use XML::DOM;
my $parser = XML::DOM::Parser->new();
my $doc = $parser->parsefile("in.xml");
print $doc->toString();
'
<?xml version="1.0" encoding="UTF-8"?>
<Name>IssuéTést</Name>
I also tried deliberately mis-encoding the XML to see if that would trigger it, but even then I get a parse error rather than the behaviour you describe.
$ perl -MEncode -e'
print encode "iso-8859-1", # Wrong!
qq{<?xml version="1.0" encoding="UTF-8"?>\n} .
qq{<Name>Issu\x{E9}T\x{E9}st</Name>\n};
' >in.xml
$ perl -e'
use open ":std", ":encoding(UTF-8)"; # I have a UTF-8 terminal
use XML::DOM;
my $parser = XML::DOM::Parser->new();
my $doc = $parser->parsefile("in.xml");
print $doc->toString();
'
not well-formed (invalid token) at line 2, column 10, byte 49 at .../XML/Parser.pm line 187
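That error is what the parser reports when the bytes in the file don't match the declared encoding. As a rough way to check whether your own in.xml really is valid UTF-8, something like the following sketch might help. The function name check_utf8 is made up for illustration; it only uses the core Encode module.

```shell
# Hypothetical helper (not part of XML::DOM): prints "valid UTF-8" and
# returns 0 if the file decodes as strict UTF-8, otherwise prints
# "not valid UTF-8" and returns 1.
check_utf8() {
    perl -MEncode -e '
        local $/;                               # slurp the whole file
        open(my $fh, "<:raw", $ARGV[0])
            or die "Cannot open $ARGV[0]: $!\n";
        my $bytes = <$fh>;
        # FB_CROAK makes decode() die on the first malformed sequence.
        eval { decode("UTF-8", $bytes, Encode::FB_CROAK()); 1 }
            or do { print "not valid UTF-8\n"; exit 1 };
        print "valid UTF-8\n";
    ' "$1"
}
```

Run it as `check_utf8 in.xml`. A file written as iso-8859-1, like the second one above, should be reported as not valid UTF-8.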
Either your file doesn't contain what you say it does, or there was a bug that's been fixed. Try upgrading XML::DOM and its dependencies. Versions I used:
- XML::DOM 1.44
- XML::RegExp 0.02
- XML::Parser 2.41
- XML::Parser::Expat 2.41