
XML::Twig Support for Parameter Entities

by dtdattacks (Initiate)
on Aug 12, 2015 at 18:30 UTC

dtdattacks has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Background:

I'm currently researching the default behaviour of several XML Parsers, including XML::Twig, and possible countermeasures to DTD Attacks, like XXE and Parameter Entity XXE.

Technical Background:

I have installed Strawberry Perl v5.22.0.1 on Windows 7 and therefore use
$XML::Parser::VERSION -> 2.44
$XML::Twig::VERSION -> 3.49
The output of running /t/zz_dump_config.t from the downloaded tar.gz from http://www.xmltwig.org/xmltwig/ is
Configuration:

perl: 5.022000
OS: MSWin32 - MSWin32

required
  XML::Parser               : 2.44
The command "xmlwf" is either misspelled or could not be found.
  expat                     : <no version information found>

Strongly Recommended
  Scalar::Util              : 1.42              (for improved memory management)
  Encode                    : 2.73              (for encoding conversions)

Modules providing additional features
  XML::XPathEngine          : <not available>   (to use XML::Twig::XPath)
  XML::XPath                : <not available>   (to use XML::Twig::XPath if Tree::XPathEngine not available)
  LWP                       : 6.13              (for the parseurl method)
  HTML::TreeBuilder         : 5.03              (to use parse_html and parsefile_html)
  HTML::Entities::Numbered  : <not available>   (to allow parsing of HTML containing named entities)
  HTML::Tidy                : <not available>   (to use parse_html and parsefile_html with the use_tidy option)
  HTML::Entities            : 3.69              (for the html_encode filter)
  Tie::IxHash               : <not available>   (for the keep_atts_order option)
  Text::Wrap                : 2013.0523         (to use the "wrapped" option for pretty_print)

Modules used only by the auto tests
  Test                      : 1.26
  Test::Pod                 : 1.50
  XML::Simple               : 2.20
  XML::Handler::YAWriter    : <not available>
  XML::SAX::Writer          : <not available>
  XML::Filter::BufferText   : <not available>
  IO::Scalar                : 2.111
  IO::CaptureOutput         : 1.1104

Please add this information to bug reports (you can run t\zz_dump_config.t to get it)

if you are upgrading the module from a previous version, make sure you read the
Changes file for bug fixes, new features and the occasional COMPATIBILITY WARNING

1..1
ok 1

Problem description:

Here is the XML Document:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE data [
<!ENTITY % start "<![CDATA[">
<!ENTITY % goodies SYSTEM "C:/code/xml_files_windows/xxe.txt">
<!ENTITY % end "]]>">
<!ENTITY % dtd SYSTEM "http://127.0.0.1:5000/parameterEntity_core.dtd">
%dtd;
]>
<data>&all;</data>
The file parameterEntity_core.dtd:
<!ENTITY all '%start;%goodies;%end;'>
The file xxe.txt
it_works
I have a local webserver running on port 5000 where the corresponding files are hosted.
I parse the XML document using the ParseParamEnt feature.
#!/usr/bin/perl
use strict;
use XML::Twig;

my $t= XML::Twig->new();
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);

my $root = $t->root;
my $content = $root->first_child->text;
print $content;
However the parser quits with an exception stating
cannot expand %dtd; - cannot load 'http://127.0.0.1:5000/parameterEntity_core.dtd' at C:/Strawberry/perl/vendor/lib/XML/Parser/Expat.pm line 474.

Investigation

I have investigated the problem by trial and error. My results follow.
I searched the Twig.pm source code for that specific message and found it in the method "_twig_extern_ent" at line 2863.
I reproduce the source code here for convenience.
sub _twig_extern_ent
  { # warn " in _twig_extern_ent...I (", $_[0]->original_string, ")\n"; # DEBUG handler
    my( $p, $base, $sysid, $pubid)= @_;
    my $t= $p->{twig};
    if( $t->{twig_no_expand})
      { my $ent_name= $t->{twig_keep_encoding} ? $p->original_string : $p->recognized_string;
        _twig_insert_ent( $t, $ent_name);
        return '';
      }
    my $ent_content= eval { $t->{twig_ext_ent_handler}->( $p, $base, $sysid) };
    if( ! defined $ent_content)
      {
        my $ent_name = $p->recognized_string;
        my $file     = _based_filename( $sysid, $base);
        my $error_message= "cannot expand $ent_name - cannot load '$file'";
        if( $t->{twig_extern_ent_nofail}) { return "<!-- $error_message -->"; }
        else                              { _croak( $error_message); }
      }
    return $ent_content;
  }
It seems to me that the error message is generated whenever the variable "$ent_content" is undefined, that is, whenever the handler stored in "twig_ext_ent_handler" returns undef or dies inside the eval.
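One side effect of this eval/defined pattern is that a handler that dies and a handler that simply returns undef look identical to the caller, so the reason for the failure never reaches the error message. The following is a minimal, self-contained sketch of that behaviour. The handler names and the helper `call_handler` are made up for illustration and are not part of XML::Twig.

```perl
use strict;
use warnings;

# Hypothetical handlers, for illustration only (not part of XML::Twig).
sub dying_handler { die "connection refused\n" }   # fails with an exception
sub undef_handler { return undef }                 # fails without an exception
sub ok_handler    { return 'it_works' }            # succeeds

# Mirrors the eval/defined pattern from _twig_extern_ent, but also keeps $@
# so the reason for a failure is not lost.
sub call_handler {
    my ($handler) = @_;
    my $content = eval { $handler->() };
    my $reason  = $@;                      # empty string if nothing died
    return ($content, $reason);
}

for my $h (\&dying_handler, \&undef_handler, \&ok_handler) {
    my ($content, $reason) = call_handler($h);
    if (defined $content) {
        print "content: $content\n";
    }
    else {
        chomp $reason;
        print "no content",
            ($reason ne '' ? " (reason: $reason)" : " (no exception raised)"), "\n";
    }
}
```

Keeping the value of `$@` next to the undefined result is one way an error message could say why the entity could not be loaded, rather than only that it could not.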
A search within Twig.pm revealed that "twig_ext_ent_handler" is assigned in only three more places, all within the method "new" in lines 538 - 545.
The code follows:
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
  { $self->{twig_ext_ent_handler}= \&XML::Parser::initial_ext_ent_handler }
elsif( $args{NoXxe})
  { $self->{twig_ext_ent_handler}=
      sub { my($xp, $base, $path) = @_; $xp->{ErrorMessage}.= "cannot use entities in document when the no_xxe option is on"; return undef; };
  }
else
  { $self->{twig_ext_ent_handler}= \&XML::Parser::file_ext_ent_handler }
Next I looked at the handlers "initial_ext_ent_handler" and "file_ext_ent_handler" in Parser.pm. As its name indicates, "file_ext_ent_handler" does not seem to be able to handle resources served over http.
The results of my testing follow:

Test 1

1) I call XML::Twig using the following code.
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml');
At first I use no optional parameters/features.
2) I change line 538 in Twig.pm from
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
to
if( !$args{NoLWP} )

Result:

I get no error message and the output of my element is "&all", which is ok, because Parameter Entities are obviously not enabled yet.

Test 2

1) I call XML::Twig using the ParseParamEnt feature
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);
2) The code line 538 is left in the modified version.
if( !$args{NoLWP} )

Result:

The parameter entities are resolved and the replacement text of the internal general entity is inserted "it_works"
Note: I also did some tests modifying the NoLWP parameter.

Test 3:

1) I call XML::Twig using the ParseParamEnt feature
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);
2) For each test run I change the condition to a), b) and c).

a)
if( !$args{NoLWP} && ! _use( 'URI') )

b)
if( !$args{NoLWP} && ! _use( 'URI::File') )

c)
if( !$args{NoLWP} && ! _use( 'LWP'))

Result:

Regardless of which option I choose (a, b or c), the exception "cannot expand %dtd; - cannot load 'http://127.0.0.1:5000/parameterEntity_core.dtd'" is raised.

Test 4:

1) I change the modified condition back to its original state
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
2) I change the handler in both cases to "file_ext_ent_handler".
Modified code:
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
  { $self->{twig_ext_ent_handler}= \&XML::Parser::file_ext_ent_handler }
3) I call XML::Twig using the ParseParamEnt feature
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);

Result:

The exception "cannot expand %dtd; - cannot load 'http://127.0.0.1:5000/parameterEntity_core.dtd'" is raised.
Assumption:
The problem seems to be the file_ext_ent_handler. I therefore suspect that the final "else" branch is always the one executed.

Test 5 (Verify Assumption)

1) I swap the handlers
Modified code:
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
  { $self->{twig_ext_ent_handler}= \&XML::Parser::file_ext_ent_handler }
elsif( $args{NoXxe})
  { $self->{twig_ext_ent_handler}=
      sub { my($xp, $base, $path) = @_; $xp->{ErrorMessage}.= "cannot use entities in document when the no_xxe option is on"; return undef; };
  }
else
  { $self->{twig_ext_ent_handler}= \&XML::Parser::initial_ext_ent_handler }
2) I call XML::Twig using the ParseParamEnt feature
$t->parsefile('../../xml_files_windows/parameterEntity_core.xml', ParseParamEnt => 1);

Result:

The replacement text "it_works" is inserted in the element
This verifies my assumption.
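
The branch selection can also be reproduced in isolation, without patching Twig.pm. The sketch below copies the shape of the condition quoted from "new" but makes two simplifying assumptions: `_use` is replaced by a stub (`fake_use`) that reads module availability from a hash, and the handlers are returned as plain names rather than code references. Under those assumptions it shows that the first branch is only reached when none of the three modules loads.

```perl
use strict;
use warnings;

# Stub standing in for XML::Twig's _use(): true if the module is "available".
# %available is illustrative data, not a probe of the real system.
my %available;
sub fake_use { my ($module) = @_; return $available{$module} ? 1 : 0 }

# Same shape as the condition in XML::Twig->new, returning names
# instead of code references so the result can be printed.
sub pick_handler {
    my (%args) = @_;
    if ( !$args{NoLWP} && !fake_use('URI') && !fake_use('URI::File') && !fake_use('LWP') ) {
        return 'initial_ext_ent_handler';
    }
    elsif ( $args{NoXxe} ) {
        return 'no_xxe handler';
    }
    else {
        return 'file_ext_ent_handler';
    }
}

%available = ( URI => 1, 'URI::File' => 1, LWP => 1 );
print "all modules load: ", pick_handler(), "\n";

%available = ();
print "no module loads:  ", pick_handler(), "\n";

%available = ( LWP => 1 );
print "only LWP loads:   ", pick_handler(), "\n";
```

With every `_use` call negated and joined by `&&`, a single module that loads successfully is enough to make the whole condition false, which is consistent with the results of Tests 3 and 4.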

Conclusion:

I conclude that XML::Twig always uses a file handler from XML::Parser on my system and no URLs can be parsed.

Question

Why is it that this condition
if( !$args{NoLWP} && ! _use( 'URI') && ! _use( 'URI::File') && ! _use( 'LWP'))
is never true on my system and a handler is used which does not support http?
How can I make XML::Twig use the "initial_ext_ent_handler"?
Thanks for your patience and I'm looking forward to your replies and suggestions.
Chris

Replies are listed 'Best First'.
Re: XML::Twig Support for Parameter Entities (installed?)
by tye (Sage) on Aug 13, 2015 at 03:45 UTC

    Do you have the module URI or URI::File or LWP installed? The code appears to say that you don't have any of them. I tried to search for where you said whether or not you had these modules installed, but my searches failed (and I didn't feel like reading every single word of what you wrote -- so I apologize in advance if you did actually say that somewhere).

    - tye        

      Do you have the module URI or URI::File or LWP installed?...

      :) I thought of that too, but the OP said "I have installed Strawberry Perl v.5.22.0.1 on Windows 7 and therefore use", so see http://strawberryperl.com/release-notes/5.22.0.1-32bit.html

      they are installed

      XML::Twig docs warn that these parts are not well tested, heck they're not even documented

        I did look at Corelist and got the impression that none of these are "core" modules. But I didn't look at what non-"core" modules Strawberry includes by default.

        Looking at the code for _use():

        sub _use  ## no critic (Subroutines::ProhibitNestedSubs);
          { my( $module, $version)= @_;
            ...
            if( eval "require $module")
              { ...

        Perhaps they have something loaded that is changing the return value from 'require'? It is pretty darn rare for people to care about the return value from a 'require', which is why I'd write such code as:

        if( eval "require $module; 1" ) { ...

        I'd also try "require LWP" and see what Perl says in response. It would be good to change Twig so that $@ from one of the _use() calls would be appended to the "cannot expand %dtd; - cannot load 'http://127.0.0.1:5000/parameterEntity_core.dtd'" error messages. [ People often seem to underrate the value of error messages. :) ]

        - tye        
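
A small sketch of the check suggested above: load each module with `eval "require ...; 1"`, so that the result does not depend on what the module file itself returns, and print `$@` when loading fails. The helper name `try_load` is made up for this example. The module list is the one from the Twig.pm condition.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (1, '') if the module loads, (0, error) otherwise.
# The trailing "; 1" makes the eval true whenever require succeeds.
sub try_load {
    my ($module) = @_;
    die "suspicious module name: $module\n"
        unless $module =~ /\A\w+(?:::\w+)*\z/;
    my $ok = eval "require $module; 1";    ## no critic (ProhibitStringyEval)
    return $ok ? ( 1, '' ) : ( 0, $@ );
}

for my $module (qw( URI URI::File LWP )) {
    my ( $ok, $error ) = try_load($module);
    if ($ok) {
        # VERSION may be undefined, so guard the lookup and the output.
        my $version = eval { $module->VERSION };
        print "$module loaded",
            ( defined $version ? " (version $version)" : '' ), "\n";
    }
    else {
        print "$module failed to load: $error";
    }
}
```

Running this with the same perl binary that runs the failing script shows which of the three `_use` calls can succeed on that installation, and why any of them fails.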

Re: XML::Twig Support for Parameter Entities
by Anonymous Monk on Aug 13, 2015 at 00:59 UTC
      I did a quick check but did not find any matches.
      To which one are you referring to?

        I did a quick check but did not find any matches. To which one are you referring to?

        :) send a bug report to the author, he's quite responsive
