Hi,
Background:
I'm currently researching the default behaviour of several XML parsers, including XML::Twig, and possible countermeasures to DTD attacks such as XXE and parameter entity XXE.
Technical Background:
I have installed Strawberry Perl v.5.22.0.1 on Windows 7 and therefore use
$XML::Parser::VERSION -> 2.44
$XML::Twig::VERSION -> 3.49
The output of running /t/zz_dump_config.t from the tar.gz downloaded from http://www.xmltwig.org/xmltwig/ is:
Configuration:
perl: 5.022000
OS: MSWin32 - MSWin32
required
XML::Parser : 2.44
The command "xmlwf" is either misspelled or could not be found.
expat : <no version information found>
Strongly Recommended
Scalar::Util : 1.42 (for improved memory management)
Encode : 2.73 (for encoding conversions)
Modules providing additional features
XML::XPathEngine : <not available> (to use XML::Twig::XPath)
XML::XPath : <not available> (to use XML::Twig::XPath if Tree::XPathEngine not available)
LWP : 6.13 (for the parseurl method)
HTML::TreeBuilder : 5.03 (to use parse_html and parsefile_html)
HTML::Entities::Numbered : <not available> (to allow parsing of HTML containing named entities)
HTML::Tidy : <not available> (to use parse_html and parsefile_html with the use_tidy option)
HTML::Entities : 3.69 (for the html_encode filter)
Tie::IxHash : <not available> (for the keep_atts_order option)
Text::Wrap : 2013.0523 (to use the "wrapped" option for pretty_print)
Modules used only by the auto tests
Test : 1.26
Test::Pod : 1.50
XML::Simple : 2.20
XML::Handler::YAWriter : <not available>
XML::SAX::Writer : <not available>
XML::Filter::BufferText : <not available>
IO::Scalar : 2.111
IO::CaptureOutput : 1.1104
Please add this information to bug reports (you can run t\zz_dump_config.t to get it)
if you are upgrading the module from a previous version, make sure you read the
Changes file for bug fixes, new features and the occasional COMPATIBILITY WARNING
1..1
ok 1
Problem description:
Here is the XML Document:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE data [
<!ENTITY % start "<![CDATA[">
<!ENTITY % goodies SYSTEM "C:/code/xml_files_windows/xxe.txt">
<!ENTITY % end "]]>">
<!ENTITY % dtd SYSTEM "http://127.0.0.1:5000/parameterEntity_core.dtd">
%dtd;
]>
<data>&all;</data>
The file parameterEntity_core.dtd:
<!ENTITY all '%start;%goodies;%end;'>
The file xxe.txt
it_works
I have a local webserver running on port 5000 where the corresponding files are hosted.
I parse the XML document using the ParseParamEnt feature.
#!/usr/bin/perl
use strict;
use XML::Twig;
my $t= XML::Twig->new();
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);
my $root = $t->root;
my $content = $root->first_child->text;
print $content;
However the parser quits with an exception stating
cannot expand %dtd; - cannot load 'http://127.0.0.1:5000/parameterEntity_core.dtd' at C:/Strawberry/perl/vendor/lib/XML/Parser/Expat.pm line 474.
Investigation
I have investigated the problem by trial and error. These are my results:
I searched the Twig.pm source code for that specific message and found it in the method "_twig_extern_ent" in line 2863.
I include the source code here for convenience.
sub _twig_extern_ent
  { # warn " in _twig_extern_ent...I (", $_[0]->original_string, ")\n"; # DEBUG handler
    my( $p, $base, $sysid, $pubid)= @_;
    my $t= $p->{twig};
    if( $t->{twig_no_expand})
      { my $ent_name= $t->{twig_keep_encoding} ? $p->original_string : $p->recognized_string;
        _twig_insert_ent( $t, $ent_name);
        return '';
      }
    my $ent_content= eval { $t->{twig_ext_ent_handler}->( $p, $base, $sysid) };
    if( ! defined $ent_content)
      {
        my $ent_name = $p->recognized_string;
        my $file = _based_filename( $sysid, $base);
        my $error_message= "cannot expand $ent_name - cannot load '$file'";
        if( $t->{twig_extern_ent_nofail}) { return "<!-- $error_message -->"; }
        else { _croak( $error_message); }
      }
    return $ent_content;
  }
It seems to me that the error message is generated whenever the variable "$ent_content" is undefined, that is, whenever the handler stored in "twig_ext_ent_handler" dies inside the eval or returns undef.
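To illustrate that reading, here is a minimal sketch of the eval pattern on its own. The two handlers are hypothetical stand-ins I made up for this example; they are not part of XML::Twig or XML::Parser. The point is only that a handler which dies inside eval { } leaves the result undefined, which is exactly the case the error branch tests for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an external entity handler that fails.
sub failing_handler { die "cannot fetch resource\n" }

# Hypothetical stand-in for a handler that succeeds.
sub working_handler { return "it_works" }

# Same shape as the eval { ... } call in _twig_extern_ent: an exception
# thrown by the handler is swallowed and the result is simply undef.
sub call_handler {
    my ($handler) = @_;
    my $content = eval { $handler->() };
    return $content;
}

for my $handler ( \&failing_handler, \&working_handler ) {
    my $content = call_handler($handler);
    print defined($content) ? "defined: $content\n" : "undefined\n";
}
```

If this reading is right, the real reason for the failure is discarded by the eval and only the generic "cannot load" message remains.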
A search within Twig.pm revealed that "twig_ext_ent_handler" is assigned in only three other places, all within the method "new" in lines 538 - 545.
The code follows:
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
  { $self->{twig_ext_ent_handler}= \&XML::Parser::initial_ext_ent_handler }
elsif( $args{NoXxe})
  { $self->{twig_ext_ent_handler}=
      sub { my($xp, $base, $path) = @_; $xp->{ErrorMessage}.= "cannot use entities in document when the no_xxe option is on"; return undef; };
  }
else
  { $self->{twig_ext_ent_handler}= \&XML::Parser::file_ext_ent_handler }
Next I looked at the handlers "initial_ext_ent_handler" and "file_ext_ent_handler" in Parser.pm.
As its name indicates, "file_ext_ent_handler" does not seem to be able to handle resources provided via http.
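Since the choice of handler depends on the three _use calls, a quick way to see what they might return on a given system is to try loading the same module names directly. The helper can_load below is my own and only approximates what I assume _use does (true if the module loads); it is not XML::Twig's implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise.
# This is NOT XML::Twig's _use, only an approximation of what I assume it does.
sub can_load {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# The module names exactly as they appear in the condition in Twig.pm.
for my $module (qw(URI URI::File LWP)) {
    printf "%-10s %s\n", $module, can_load($module) ? "loadable" : "not loadable";
}
```

The output depends on what is installed, so I give none here.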
Here follow the results of my testing:
Test 1
1) I call XML::Twig using the following code.
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml');
At first I use no optional parameters/features.
2) I change line 538 in Twig.pm from
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
to
if( !$args{NoLWP} )
Result:
I get no error message and the output for my element is "&all", which is ok, because parameter entities are not enabled yet.
Test 2
1) I call XML::Twig using the ParseParamEnt feature
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);
2) Line 538 is left in its modified form.
if( !$args{NoLWP} )
Result:
The parameter entities are resolved and the replacement text of the internal general entity, "it_works", is inserted.
Note: I also did some tests modifying the NoLWP parameter.
Test 3:
1) I call XML::Twig using the ParseParamEnt feature
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);
2) I change the condition to one of a), b) or c), one variant per test run.
a)
if( !$args{NoLWP} && ! _use( 'URI') )
b)
if( !$args{NoLWP} && ! _use( 'URI::File') )
c)
if( !$args{NoLWP} && ! _use( 'LWP'))
Result:
Regardless of which variant I choose (a, b or c), the exception "cannot expand %dtd; - cannot load 'http://127.0.0.1:5000/parameterEntity_core.dtd'" is raised.
Test 4:
1) I change the modified condition back to its original state
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
2) I change the handler in the first branch to "file_ext_ent_handler" as well, so that both the first and the last branch use it.
Modified code:
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
  { $self->{twig_ext_ent_handler}= \&XML::Parser::file_ext_ent_handler }
3) I call XML::Twig using the ParseParamEnt feature
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);
Result:
The exception "cannot expand %dtd; - cannot load 'http://127.0.0.1:5000/parameterEntity_core.dtd'" is raised.
Assumption:
The problem seems to be the file_ext_ent_handler. Therefore I suspect that the final "else" branch is always the one executed.
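A minimal sketch of why I suspect this, with the results of the three _use calls replaced by plain booleans. I am assuming here that _use returns a true value when the module loads; I have not verified that in Twig.pm. The function name picks_first_branch is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Evaluates the condition quoted from Twig.pm, with NoLWP and the three _use
# results passed in as booleans (assumption: _use is true when a module loads).
sub picks_first_branch {
    my ( $no_lwp, $uri, $uri_file, $lwp ) = @_;
    return ( !$no_lwp && !$uri && !$uri_file && !$lwp ) ? 1 : 0;
}

# All three modules load, NoLWP unset: the first branch is skipped.
print picks_first_branch( 0, 1, 1, 1 ), "\n";    # prints 0

# Only one module loads: a single true _use is enough to skip the first branch.
print picks_first_branch( 0, 1, 0, 0 ), "\n";    # prints 0

# None of the modules load, NoLWP unset: the first branch is taken.
print picks_first_branch( 0, 0, 0, 0 ), "\n";    # prints 1
```

Under that assumption the first branch is only reached when none of the three modules can be loaded. That would match Test 3, where keeping any single "! _use(...)" term in the condition was enough to end up with the failing handler.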
Test 5 (Verify Assumption)
1) I swap the handlers
Modified code:
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
  { $self->{twig_ext_ent_handler}= \&XML::Parser::file_ext_ent_handler }
elsif( $args{NoXxe})
  { $self->{twig_ext_ent_handler}=
      sub { my($xp, $base, $path) = @_; $xp->{ErrorMessage}.= "cannot use entities in document when the no_xxe option is on"; return undef; };
  }
else
  { $self->{twig_ext_ent_handler}= \&XML::Parser::initial_ext_ent_handler }
2) I call XML::Twig using the ParseParamEnt feature
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);
Result:
The replacement text "it_works" is inserted in the element
This verifies my assumption.
Conclusion:
I conclude that XML::Twig always uses the file handler from XML::Parser on my system, so no URLs can be loaded.
Question
Why is it that this condition
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
is never true on my system, so that a handler is used which does not support http?
How can I make XML::Twig use the "initial_ext_ent_handler"?
Thanks for your patience and I'm looking forward to your replies and suggestions.
Chris