http://www.perlmonks.org?node_id=1104450


in reply to Re: Best XML library to validate XML from untrusted source
in thread Best XML library to validate XML from untrusted source

XML::LibXML::Reader is way too low-level, and while the pull style tends to lead to (very slightly) more readable code than ordinary node-level push, it's still nothing I would dare to recommend ... to anyone.

XML::Rules and XML::Twig give you the file in bite-sized chunks, which IMNSHO works much better than forcing a decomposition into individual atoms.

Speaking of XML::Rules ... it's based on XML::Parser::Expat and allows setting its handlers, so I think setting Expat's ExternEnt handler to your own should give vsespb the protection he's after.
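A minimal sketch of that idea, using plain XML::Parser (the same Expat layer) rather than XML::Rules itself, since how XML::Rules passes handlers through is not shown here. The helper name parses_without_external_entities and the sample documents are made up for illustration; the only library feature relied on is the ExternEnt handler slot of XML::Parser.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns 1 if $xml parses without ever asking for
# an external entity, 0 if parsing fails for any reason (including our
# handler refusing an external entity).
sub parses_without_external_entities {
    my ($xml) = @_;

    my $parser = XML::Parser->new(
        Handlers => {
            # Replaces the default ExternEnt handler, which would try to
            # fetch the entity. Nothing is opened or downloaded here.
            ExternEnt => sub {
                my ($expat, $base, $sysid, $pubid) = @_;
                my $id = defined $sysid ? $sysid : '(no system id)';
                die "external entity refused: $id\n";
            },
        },
    );

    # parse() dies on any error, so wrap it in eval.
    return eval { $parser->parse($xml); 1 } ? 1 : 0;
}

# Illustrative inputs only.
my $plain = '<doc>hello</doc>';
my $xxe   = <<'XML';
<!DOCTYPE doc [ <!ENTITY ext SYSTEM "file:///nonexistent/secret.txt"> ]>
<doc>&ext;</doc>
XML

for my $case ([plain => $plain], [xxe => $xxe]) {
    my ($name, $xml) = @$case;
    printf "%s: %s\n", $name,
        parses_without_external_entities($xml) ? 'accepted' : 'rejected';
}
```

Dying in the handler treats any external entity as a reason to reject the whole document. If you would rather keep parsing and just neutralize the reference, returning an empty string from the handler is the other obvious choice.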

Jenda
Enoch was right!
Enjoy the last years of Rome.

Re^3: Best XML library to validate XML from untrusted source
by ikegami (Pope) on Oct 20, 2014 at 15:16 UTC

    I have no idea why you wouldn't recommend

    use XML::LibXML::Reader qw( );

    my $reader = XML::LibXML::Reader->new(
        location        => $file_or_url,
        load_ext_dtd    => 0,
        expand_entities => 0,
    );

    1 while $reader->read;

    Wrapping this up just so you get something you can call higher-level is simply pure waste.

      Say, because it doesn't do anything? I mean, yes, it does some kind of basic format validation, but once you actually need to extract some data out of the file, things start getting complicated very quickly.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

        Nothing? What are you talking about? It validates the file as the OP requested.

        ...Actually, that's not what he requested. He requested

        use XML::LibXML::Reader qw( XML_READER_TYPE_ENTITY_REFERENCE );

        my $reader = XML::LibXML::Reader->new(
            location        => $file_or_url,
            load_ext_dtd    => 0,
            expand_entities => 0,
        );

        while ($reader->read) {
            die if $reader->nodeType == XML_READER_TYPE_ENTITY_REFERENCE;
        }

        What exactly is the problem???