PerlMonks  

HTML::Treebuilder - DOCTYPE

by smaida (Initiate)
on Jul 11, 2005 at 11:40 UTC (#473902)

smaida has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I am working on a script that parses content from web pages and inserts the data into a new page. The process is as follows:

  • Retrieve content from a web site (source) with WWW::Mechanize and parse table content into a data structure.
  • Download the target web page via FTP.
  • Insert desired table content from the data structure into the target page.
  • Upload the target web page via FTP.

The problem is that I'm using HTML::TreeBuilder both to parse the source page and to insert data into the target page. When it writes out the target page, the DOCTYPE ends up after the closing body tag (a documented bug in HTML::TreeBuilder). This is a problem for IE...

Has anyone used HTML::TreeBuilder before and found a solution to this problem?

The DOCTYPE on the target page may vary, so I need to retain the original rather than hardcode it into my script.

Thanks.

-Shawn

Replies are listed 'Best First'.
Re: HTML::Treebuilder - DOCTYPE
by gellyfish (Monsignor) on Jul 11, 2005 at 12:33 UTC

    See the description of the store_declarations method in the HTML::TreeBuilder documentation.

    /J\

      Thanks for the quick reply, gellyfish.

      Unfortunately, it's that very method that causes the problem.

      The default in HTML::TreeBuilder, $tree->store_declarations(0);, is not to store declarations. When I run the program this way, the declaration is dropped altogether. When I use $tree->store_declarations(1); the DOCTYPE is inserted after the closing body tag.

      Do you have any other suggestions?

      -Shawn

        Yes, my point was that the documentation indicates that what you are seeing is a bug and invites patches. To be honest, you are probably best off stripping the declaration from the returned HTML and sticking it back on the front yourself.

        /J\
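
The suggestion above (strip the declaration and put it back at the front) could be sketched roughly as follows. This is only an illustration under stated assumptions: extract_doctype and restore_doctype are hypothetical helper names, not part of HTML::TreeBuilder, and the regex assumes a DOCTYPE without an internal subset (that is, no '>' inside the declaration). The misplaced output is simulated with a plain string so the sketch runs without any modules; in a real script it would come from something like $tree->as_HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the first <!DOCTYPE ...> out of an HTML string.
# Returns (doctype, html_without_doctype); doctype is '' if none is found.
# Assumes the declaration contains no '>' (no internal DTD subset).
sub extract_doctype {
    my ($html) = @_;
    my $doctype = '';
    if ($html =~ s/(<!DOCTYPE\b[^>]*>)\s*//i) {
        $doctype = $1;
    }
    return ($doctype, $html);
}

# Hypothetical helper: drop any DOCTYPE found anywhere in the serialized
# output, then put the saved original declaration on the first line.
sub restore_doctype {
    my ($doctype, $output) = @_;
    $output =~ s/<!DOCTYPE\b[^>]*>\s*//gi;
    return $output unless length $doctype;
    return "$doctype\n$output";
}

# Target page as downloaded, with whatever DOCTYPE it happens to carry.
my $original = qq{<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">\n}
             . qq{<html><head><title>t</title></head><body><p>x</p></body></html>};

# Save the declaration before the page goes anywhere near the parser.
my ($doctype, $page) = extract_doctype($original);

# Stand-in for the rebuilt page: the declaration sits after </body>,
# as described in the question.
my $serialized = qq{<html><head><title>t</title></head><body><p>x</p></body>}
               . qq{<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"></html>};

print restore_doctype($doctype, $serialized), "\n";
```

Because the declaration is captured from the downloaded page itself, nothing is hardcoded, so a target page with a different DOCTYPE keeps its own. Removing every declaration from the serialized output before prepending means the same code should work whether store_declarations is on or off.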
