http://www.perlmonks.org?node_id=473902

smaida has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I am working on a script that parses content from web pages and inserts the data into a new page. The process is as follows:

The problem is that I'm using HTML::TreeBuilder both to parse the source page and to insert data into the target page. When it creates the target page, the DOCTYPE ends up after the closing body tag (a documented bug in HTML::TreeBuilder). This is a problem for IE...

Has anyone used HTML::TreeBuilder before and found a solution to this problem?

The DOCTYPE on the target page may vary, so I need to retain the original rather than hardcode it into my script.

Thanks.

-Shawn

Replies are listed 'Best First'.
Re: HTML::TreeBuilder - DOCTYPE
by gellyfish (Monsignor) on Jul 11, 2005 at 12:33 UTC

    See the description of the store_declarations method in the HTML::TreeBuilder documentation.

    /J\

      Thanks for the quick reply gellyfish.

      Unfortunately, it's that very method that causes the problem.

      The default in HTML::TreeBuilder, $tree->store_declarations(0);, is to not store the declarations. When I run the program this way, the declaration is dropped altogether. When I use $tree->store_declarations(1); the DOCTYPE is inserted after the closing body tag.

      Do you have any other suggestions?

      -Shawn

        Yes, my point was that the documentation indicates that what you are seeing is a bug and invites patches. To be honest, you are probably best off stripping the declaration from the returned HTML and sticking it back on the front yourself.

        /J\
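
A minimal sketch of the "strip it off and stick it back on the front" workaround, in case it helps. It works on the serialized HTML string rather than on the tree. The helper name move_doctype_to_front is made up for illustration and is not part of HTML::TreeBuilder. It assumes there is a single DOCTYPE with no internal subset, so the declaration contains no ">" before its end.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TreeBuilder): take serialized HTML,
# remove the first <!DOCTYPE ...> declaration wherever it ended up, and
# put it back at the very top. Returns the input unchanged if none is found.
sub move_doctype_to_front {
    my ($html) = @_;

    # Cut out the first DOCTYPE declaration (case-insensitive) together
    # with any whitespace that follows it; the declaration lands in $1.
    if ( $html =~ s/(<!DOCTYPE\b[^>]*>)\s*//i ) {
        my $doctype = $1;
        $html =~ s/\A\s+//;    # drop leading whitespace left behind
        return "$doctype\n$html";
    }

    return $html;
}

# Intended use (assumes $tree was built with store_declarations(1)):
#   my $fixed = move_doctype_to_front( $tree->as_HTML );
```

Because the declaration is read back from the generated output, whatever DOCTYPE the target page had is retained and nothing is hardcoded in the script.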