
HTML::Treebuilder - DOCTYPE

by smaida (Initiate)
on Jul 11, 2005 at 11:40 UTC ( #473902=perlquestion )

smaida has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I am working on a script that parses content from web pages and inserts the data into a new page. The process is as follows:

  • Retrieve content from a web site (source) with WWW::Mechanize and parse table content into a data structure.
  • Download the target web page via FTP.
  • Insert desired table content from the data structure into the target page.
  • Upload the target web page via FTP.

The problem is that I'm using HTML::TreeBuilder both to parse the source page and to insert data into the target page. When it generates the target page, the DOCTYPE ends up after the closing body tag (a documented bug in HTML::TreeBuilder). This is a problem for IE...

Has anyone used HTML::TreeBuilder before and found a solution to this problem?

The DOCTYPE on the target page may vary, so I need to retain the original and not hardcode the DOCTYPE into my script.



Replies are listed 'Best First'.
Re: HTML::Treebuilder - DOCTYPE
by gellyfish (Monsignor) on Jul 11, 2005 at 12:33 UTC

    See the description of the store_declarations method in the HTML::TreeBuilder documentation.


      Thanks for the quick reply gellyfish.

      Unfortunately, it's that very method that causes the problem.

      The default in TreeBuilder, $tree->store_declarations(0);, is to not store the declarations. When I run the program this way, the declaration is dropped altogether. When I use $tree->store_declarations(1); the doctype is inserted after the closing body tag.

      Do you have any other suggestions?


        Yes, my point was that the documentation indicates that what you are seeing is a bug and invites patches. To be honest, you are probably best off stripping the declaration from the returned HTML and sticking it back on the front yourself.
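
The strip-and-prepend workaround suggested above could look roughly like the sketch below. It is not taken from the thread or from the HTML::TreeBuilder documentation: extract_doctype and put_doctype_first are hypothetical helper names, and the regex assumes a simple DOCTYPE with no internal subset (no '>' before the end of the declaration). Reading the DOCTYPE from the original page source keeps whatever declaration the target page already had, so nothing is hardcoded.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first DOCTYPE declaration in $html,
# or undef if there is none.
# Assumption: the declaration has no internal subset, so it ends at the
# first '>' character.
sub extract_doctype {
    my ($html) = @_;
    return $html =~ /(<!DOCTYPE\b[^>]*>)/i ? $1 : undef;
}

# Hypothetical helper: remove any DOCTYPE found anywhere in $html and
# put $doctype on the first line instead. With no $doctype to restore,
# $html is returned unchanged.
sub put_doctype_first {
    my ($doctype, $html) = @_;
    return $html unless defined $doctype;
    $html =~ s/<!DOCTYPE\b[^>]*>\s*//gi;
    return "$doctype\n$html";
}

# Example with made-up strings standing in for the downloaded page and
# for generated output with a misplaced declaration.
my $original  = qq{<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">\n}
              . qq{<html><body><p>old</p></body></html>};
my $generated = qq{<html><body><p>new</p></body>}
              . qq{<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"></html>};

print put_doctype_first( extract_doctype($original), $generated ), "\n";
```

In the real script the second argument would be the string returned by $tree->as_HTML, and the first would come from the raw page text as downloaded via FTP, before it is handed to the parser. Because the helper strips any declaration it finds in the generated output, it should behave the same whether store_declarations is on or off.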


Node Type: perlquestion [id://473902]
Approved by Joost