PerlMonks  

XML::LibXML complains

by Skeeve (Parson)
on Jun 06, 2014 at 13:16 UTC ( [id://1089023]=perlquestion )

Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

A long, long time ago I used XML::Twig, or XML::Simple, for parsing XML.

Now I read that XML::Simple shouldn't be used in new code and so I wanted to try the named alternative XML::LibXML.

But I already failed with the simplest code:

    use strict;
    use warnings;
    use XML::LibXML;

    my $doc = XML::LibXML->load_xml(
        location   => shift,
        validation => 0,
    );
    print "Done\n";

    use Data::Dumper;
    print Dumper $doc;

When I leave out the DOCTYPE, it seems to parse, but as soon as I have

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

my script stops with:

    http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd:1: parser error : Content error in the external subset
    HTTP/1.0 500 Server Error

I don't want to remove the doctype, so I'm wondering what is wrong here. Also, is there a way to make the parser NOT go out and retrieve the DTD?


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Replies are listed 'Best First'.
Re: XML::LibXML complains
by ikegami (Patriarch) on Jun 06, 2014 at 16:51 UTC

    Better yet, the following provides a local copy of the HTML/XHTML DTDs and has XML::LibXML use them instead of trying to download them.

    use XML::Catalogs::HTML -libxml;

    no_network => 1 is a good option to pass to XML::LibXML.
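As a minimal sketch of how `no_network => 1` might be passed, the snippet below parses an in-memory document that carries the DOCTYPE from the question. The sample XHTML and the variable names are assumptions made for illustration; `load_ext_dtd => 0` is set alongside `no_network` so that no attempt to fetch the DTD is made at all.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document; the DOCTYPE is the one from the question.
my $xhtml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>
XML

my $doc = XML::LibXML->load_xml(
    string       => $xhtml,
    validation   => 0,
    load_ext_dtd => 0,    # do not load the external DTD subset
    no_network   => 1,    # forbid network access while parsing
);

# Report the root element to show that parsing succeeded.
my $root_name = $doc->documentElement->nodeName;
print "Root element: $root_name\n";
```

To parse a file instead, `string => $xhtml` could be swapped for `location => shift` as in the original script.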

Re: XML::LibXML complains
by Skeeve (Parson) on Jun 06, 2014 at 13:52 UTC

    Found it…

    my $doc = XML::LibXML->load_xml(
        location     => shift,
        validation   => 0,
        load_ext_dtd => 0,    # <- That's the key
    );

      Thanks so much. That indeed worked for me.
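Since the question asked to keep the doctype, here is a small sketch suggesting that the DOCTYPE declaration should still be present in the parsed document when `load_ext_dtd => 0` is used. The sample input and variable names are hypothetical; only the DOCTYPE line and the parser options come from the thread.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input using the DOCTYPE from the question.
my $xhtml = <<'XML';
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>
XML

my $doc = XML::LibXML->load_xml(
    string       => $xhtml,
    validation   => 0,
    load_ext_dtd => 0,    # the option from the post above
);

# The DOCTYPE declaration is recorded even though the DTD was not loaded.
my $dtd = $doc->internalSubset;
print "System ID: ", $dtd->systemId, "\n" if defined $dtd;

# Serializing the document writes the DOCTYPE back out.
my $serialized = $doc->toString;
print $serialized;
```

One caveat: without the external DTD, entities that the DTD defines (for example `&nbsp;` in XHTML) are not available to the parser, so documents that rely on them may need the local-catalog approach mentioned in the other reply.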
Re: XML::LibXML complains
by taint (Chaplain) on Jun 06, 2014 at 14:07 UTC
    Greetings, Skeeve.

    I should preface this by saying I haven't used XML::LibXML. That said, regarding retrieval of the DTD: as I read the POD and your example code, load_xml is attempting to load a file handle, so I think that's what it's choking on.

    In other words, it's looking for the file you want to parse/manipulate.

    FWIW, where the DTD is concerned, it's also possible to modify the location:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    could be written as
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://path/to/my/personal/local/DTD/my-strict-xhtml.dtd">

    Best wishes.

    --Chris

    UPDATE: Oops, looks like the solution was found while I composed my response. :P

    ¡λɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH
