Yes, I am aware of XML::Twig, but it does not suit my needs (or at least I did not see how I could use it): I need to "patch" an already parsed element to adjust its value while parsing and splitting a big block of elements that I prefer not to keep in memory.
As you mention yourself in your results, the different SAX parsers are not consistent with regard to the SAX events, at least XML::SAX::Expat, which puts the encoding into the start_document() data instead of the xml_decl() data, and XML::SAX::PurePerl, which does not notify xml_decl() at all.
Also, I do not get the same results as you with my test program and data. Could you check for which file XML::LibXML::SAX manages to give you an encoding? You can see it does not with my UTF-8 sample.
data.xml
<?xml version="1.0" encoding="UTF-8" ?>
<root>
<foo>
<bar attr="baz">héhé mes 2 €</bar>
<baz other="dummy"/>
</foo>
</root>
test_sax.pl
use strict;
use warnings;
use feature 'say';
#~ use Say; #portability trick for 5.8.8
use XML::SAX::ParserFactory;
use XML::SAX::Writer;
my $input = $ARGV[0] or die "usage: $0 <file.xml> [parser_package]";
$XML::SAX::ParserPackage = $ARGV[1] if $ARGV[1];
my $output; #just for not outputting to STDOUT
my $writer = XML::SAX::Writer->new( Output => \$output );
my $handler = SaxHandler->new( Handler => $writer );
my $parser = XML::SAX::ParserFactory->parser( Handler => $handler );
say sprintf "parser is %s (%s)", ref $parser, $parser->VERSION ;
$parser->parse_file($input);
{
    package SaxHandler;
    use base 'XML::SAX::Base';
    use Data::Printer {indent=>2};
    use feature 'say';
    #~ use Say; #portability trick for 5.8.8

    # dump the XML declaration data, then pass the event downstream
    sub xml_decl {
        my ($self, $decl) = @_;
        say "decl ", np $decl;
        $self->SUPER::xml_decl($decl);
    }

    # dump the start_document data, then pass the event downstream
    sub start_document {
        my ($self, $doc) = @_;
        say "document ", np $doc;
        $self->SUPER::start_document($doc);
    }

    sub start_element {
        my ($self, $el) = @_;
        #~ say "start element " . $el->{LocalName};
        $self->SUPER::start_element($el);
    }
}
my results:
macbookseb:perl seb$ perl -v
This is perl 5, version 22, subversion 1 (v5.22.1) built for darwin-thread-multi-2level[...]
macbookseb:perl seb$ perl test_sax.pl data.xml XML::SAX::PurePerl
parser is XML::SAX::PurePerl (0.99)
document \ {}
macbookseb:perl seb$ perl test_sax.pl data.xml XML::SAX::Expat
parser is XML::SAX::Expat (0.51)
document \ {
Encoding "UTF-8",
Standalone "",
Version 1.0
}
macbookseb:perl seb$ perl test_sax.pl data.xml XML::LibXML::SAX
parser is XML::LibXML::SAX (2.0124)
document \ {}
decl \ {
Version 1.0
}
macbookseb:perl seb$ perl test_sax.pl data.xml XML::LibXML::SAX::Parser
parser is XML::LibXML::SAX::Parser (2.0124)
document \ {}
decl \ {
Encoding "UTF-8",
Version 1.0
}
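Given the results above, a handler that needs the encoding cannot rely on a single event. Here is a minimal sketch of one possible workaround, assuming only the hash keys seen in the dumps above (Encoding, Version, Standalone): record the encoding from whichever of start_document() or xml_decl() carries it. The package name EncodingSniffer is hypothetical, and the events are replayed by hand from the dumps rather than produced by a real parser, so the sketch runs without any XML module installed.

```perl
use strict;
use warnings;

{
    package EncodingSniffer;    # hypothetical name, for illustration only

    sub new { my $class = shift; return bless { encoding => undef }, $class }

    # XML::SAX::Expat put the encoding here in the runs above
    sub start_document {
        my ($self, $doc) = @_;
        $self->{encoding} = $doc->{Encoding} if defined $doc->{Encoding};
    }

    # XML::LibXML::SAX::Parser put it here instead
    sub xml_decl {
        my ($self, $decl) = @_;
        $self->{encoding} = $decl->{Encoding} if defined $decl->{Encoding};
    }

    # undef means that no event carried an encoding
    sub encoding { return $_[0]{encoding} }
}

# Replay the event data exactly as dumped in the results above
my $expat_style = EncodingSniffer->new;
$expat_style->start_document({ Encoding => 'UTF-8', Standalone => '', Version => '1.0' });

my $parser_style = EncodingSniffer->new;
$parser_style->start_document({});
$parser_style->xml_decl({ Encoding => 'UTF-8', Version => '1.0' });

my $libxml_sax_style = EncodingSniffer->new;
$libxml_sax_style->start_document({});
$libxml_sax_style->xml_decl({ Version => '1.0' });

for my $case ( [ 'Expat-style', $expat_style ],
               [ 'LibXML::SAX::Parser-style', $parser_style ],
               [ 'LibXML::SAX-style', $libxml_sax_style ] ) {
    my ($name, $sniffer) = @$case;
    my $enc = $sniffer->encoding;
    printf "%s: %s\n", $name, defined $enc ? $enc : '(no encoding reported)';
}
```

In a real filter the class would inherit from XML::SAX::Base and forward each event with SUPER::, as SaxHandler does above. This does not help when no event carries the encoding at all, as with XML::LibXML::SAX and XML::SAX::PurePerl in my runs.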