PerlMonks  

Re: Best XML library to validate XML from untrusted source

by ikegami (Pope)
on Oct 19, 2014 at 17:54 UTC (#1104326)


in reply to Best XML library to validate XML from untrusted source

I don't see how XXE relates to doing XML validation that couldn't be addressed by limiting memory and CPU usage (which you would need to do either way), but you can nullify it by turning off XML::LibXML's load_ext_dtd and expand_entities options, as shown below (and as mentioned in the document you linked, under the heading "libxml2").

Rather than loading the entire document into memory, you'd want to use XML::LibXML's pull interface, XML::LibXML::Reader.

$ cat a.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]><foo>&xxe;</foo>

$ perl -MXML::LibXML::Reader -e'
   my $reader = XML::LibXML::Reader->new(
      location        => $ARGV[0],
      load_ext_dtd    => 0,
      expand_entities => 0,
   );
   while ($reader->read) {
      printf("%d %d %s\n",
         $reader->depth,
         $reader->nodeType,
         $reader->name,
      );
   }
' a.xml
0 10 foo
0 1 foo
1 5 xxe
0 15 foo

$ cat bad.xml
<foo><bar></foo>

$ perl -MXML::LibXML::Reader -e'
   exit 1 if !eval {
      my $reader = XML::LibXML::Reader->new(
         location        => $ARGV[0],
         load_ext_dtd    => 0,
         expand_entities => 0,
      );
      1 while $reader->read;
      1
   };
' bad.xml
$ if [ $? -eq 0 ]; then echo "well-formed" ; else echo "error" ; fi
error

Re^2: Best XML library to validate XML from untrusted source
by Jenda (Abbot) on Oct 20, 2014 at 15:07 UTC

    XML::LibXML::Reader is way too low-level, and while the pull style tends to lead to (very slightly) more readable code than ordinary node-level push, it's still nothing I would dare to recommend ... to anyone.

    XML::Rules and XML::Twig give you the file in bite-sized chunks, which IMNSHO works much better than forcing a decomposition into individual atoms.

    Speaking of XML::Rules ... it's based on XML::Parser::Expat and allows setting its handlers so I think setting the Expat's ExternEnt to your handler should provide vsespb with the protection he's after.
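
The ExternEnt idea above can be sketched roughly as follows. This uses XML::Parser directly rather than XML::Rules, since how XML::Rules passes extra handlers through is not shown in this thread. The function name parses_without_external_entities is hypothetical. The sketch relies on the documented XML::Parser behaviour that an ExternEnt handler returning undef makes the parse fail.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns 1 if $xml parses without referencing any
# external entity, 0 otherwise (including ordinary well-formedness errors).
sub parses_without_external_entities {
    my ($xml) = @_;

    my $parser = XML::Parser->new(
        Handlers => {
            # Expat calls this for every external entity reference.
            # Returning undef tells it the entity could not be resolved,
            # which turns the reference into a parse error instead of a
            # file or network fetch.
            ExternEnt => sub {
                my ($expat, $base, $sysid, $pubid) = @_;
                return undef;
            },
        },
    );

    # parse() dies on any error, so eval turns that into a boolean.
    return eval { $parser->parse($xml); 1 } ? 1 : 0;
}
```

A caller would pass the raw request body, for example parses_without_external_entities($body), and refuse the request when it returns 0.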

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      I have no idea why you wouldn't recommend

      use XML::LibXML::Reader qw( );

      my $reader = XML::LibXML::Reader->new(
         location        => $file_or_url,
         load_ext_dtd    => 0,
         expand_entities => 0,
      );
      1 while $reader->read;

      Wrapping this up just so you get something you can call higher-level is simply pure waste.

        Say, because it doesn't do anything? I mean, yes, it does some kind of basic format validation, but once you actually need to extract some data out of the file, things start getting complicated very quickly.

        Jenda

Re^2: Best XML library to validate XML from untrusted source
by vsespb (Chaplain) on Oct 23, 2014 at 12:14 UTC
    Thank you for all your replies! Very useful. One note:
    "I don't see how XXE relates to doing XML validation that couldn't be addressed by limiting memory and CPU usage (which you would need to do either way),"
    XXE is not just about DoS. For example, I have an API which accepts requests as XML.
    There is an API function to create an object (with a user-supplied name), and another function to list all objects with their names.
    So an attacker can create an object whose name is the content of /etc/passwd, then list the objects and thereby receive the content of /etc/passwd.
    IMHO, a pretty common case...
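
For an API like the one described, one possible mitigation is to refuse any request that carries a DOCTYPE or an entity reference at all. Below is a minimal sketch built on the XML::LibXML::Reader loop from earlier in the thread. The helper name is_acceptable_request is hypothetical, and rejecting every DOCTYPE is an assumption about what such an API can tolerate, not something proposed in the thread.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;   # exports the XML_READER_TYPE_* constants

# Hypothetical helper: returns 1 if $xml is well-formed and contains no
# DOCTYPE and no unexpanded entity reference, 0 otherwise.
sub is_acceptable_request {
    my ($xml) = @_;

    return eval {
        my $reader = XML::LibXML::Reader->new(
            string          => $xml,
            load_ext_dtd    => 0,   # never fetch an external DTD
            expand_entities => 0,   # leave entity references unexpanded
        );

        while (1) {
            my $rc = $reader->read;
            die "parse error\n" if $rc < 0;   # in case read reports -1 instead of dying
            last if $rc == 0;                 # end of document

            my $type = $reader->nodeType;
            die "DOCTYPE not allowed\n"
                if $type == XML_READER_TYPE_DOCUMENT_TYPE;
            die "entity reference not allowed\n"
                if $type == XML_READER_TYPE_ENTITY_REFERENCE;
        }
        1;
    } ? 1 : 0;
}
```

With this check, the a.xml example above is rejected at its DOCTYPE node (node type 10 in the output shown earlier), before any object name could be taken from it.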
