PerlMonks  

HTML::TreeBuilder dropping end tags

by admiral_grinder (Pilgrim)
on Jul 13, 2005 at 17:41 UTC [id://474632]

admiral_grinder has asked for the wisdom of the Perl Monks concerning the following question:

I'm having an issue with HTML::TreeBuilder. After my data hits the XSLT transformer, I have to do some post-processing on the HTML to finish out some project requirements. In this case I have two functions that use HTML::TreeBuilder in series (I don't think it would be a good idea to combine them yet).

The first TreeBuilder pass works great. The second pass through TreeBuilder causes some, but not all, of the HTML elements to loose their ending tags, notably 'li' and 'p'.

For the second pass, I cut and pasted the tree setup and output/teardown code from the first pass. I have tried using the second pass just to read in the HTML and write it straight back out without changing anything, yet it still drops the end tags.

Is this a known issue with HTML::TreeBuilder? Any workarounds or other advice?

-- Brian


*** Edit ***
Sorry about this, but it appears that I had my passes mixed up. Either way, jdporter gave me the advice I needed to fix it (thanks man).

My first pass was messing up the markup by not writing out the end tags. When the second tree got hold of it (I had given it options to preserve the data as received, since it was a section of an HTML page), it built a goofy tree, whereas the other pass needed a proper tree to do a page-order tree walk.

Thanks for the quick turnaround on this. It would have taken forever on Google Groups.

Re: HTML::TreeBuilder dropping end tags
by jdporter (Paladin) on Jul 13, 2005 at 17:48 UTC
    Are you talking about the HTML generated by the as_HTML method of your TreeBuilder object? If so, read about as_HTML in the HTML::Element docs. It takes some optional arguments, and the one that affects you is the third, the map of tags whose end tags may be omitted. You'll probably want to call it as
    $tree->as_HTML( undef, ' ', {} );
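A minimal, self-contained sketch of the difference, written for illustration rather than taken from the thread (the sample markup and the helper name `render_both` are made up). In HTML::Element the third argument to `as_HTML` is a hashref of tag names whose end tags may be left out; when it is omitted the default map `%HTML::Element::optionalEndTag` is used, which includes `p` and `li`, while an empty hashref `{}` means no end tag is optional, so all of them are printed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not from the thread): parse a snippet, then
# serialize it once with as_HTML's defaults and once with {}.
sub render_both {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Default third argument: %HTML::Element::optionalEndTag, so end
    # tags such as </li> and </p> may be left out of the output.
    my $default = $tree->as_HTML;

    # undef => default entity encoding, ' ' => indent with a space,
    # {}    => no end tag is treated as optional, so all are printed.
    my $full = $tree->as_HTML( undef, ' ', {} );

    $tree->delete;    # release the tree once we are done with it
    return ( $default, $full );
}

my ( $default, $full ) =
    render_both('<ul><li>one</li><li>two</li></ul><p>para</p>');
print "default:\n$default\n\nwith {}:\n$full\n";
```

Comparing the two printed strings is a quick way to check whether a given pass is losing end tags at parse time or only at serialization time.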
      Thanks for pointing that out to me. I updated the original post with a section explaining why it didn't work right.
Re: HTML::TreeBuilder dropping end tags
by halley (Prior) on Jul 13, 2005 at 18:23 UTC
    The <li> and <p> tags are historically and traditionally special, in that they represent the beginning of a span but do not require any special closing tag. The next tag that closes the list or starts another item will be the implied end of the span.

    Strictly speaking, HTML 4.01 still marks these end tags as optional; it is XHTML, being XML, that requires them, and I'd expect things to trend toward strict XML compliance. However, any browser worth anything will have to put up with this historical usage. Be flexible in what you accept, and strict in what you produce, as the saying goes.
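To see the "implied end" behaviour from the Perl side, here is a small sketch (my own, with made-up markup) that feeds HTML::TreeBuilder a list and paragraphs with no closing tags at all and then inspects the resulting tree. It assumes TreeBuilder's usual tag-soup handling, where a new `<li>` closes the previous item and a new `<p>` closes the open paragraph.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up markup with no </li> or </p> end tags at all.
my $soup = '<ul><li>one<li>two</ul><p>first<p>second';

my $tree = HTML::TreeBuilder->new_from_content($soup);

# While parsing, TreeBuilder supplies the implied end tags: a new <li>
# closes the previous one, and a new <p> closes the open paragraph,
# so the elements end up as siblings rather than nested.
my @items = $tree->look_down( _tag => 'li' );
my @paras = $tree->look_down( _tag => 'p' );
printf "%d li elements, %d p elements\n", scalar @items, scalar @paras;

# Print an indented view of the element tree to STDOUT.
$tree->dump;
```

This is the "flexible in what you accept" half; being strict in what you produce is a matter of how the tree is written back out.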

    (Oh, and lose the extra "o" in loose. If your belt is loose, you might lose your pants.)

    --
    [ e d @ h a l l e y . c c ]

      In this case I am using XHTML (generated from some XML data) and setting options to preserve the XHTML as much as possible. I suck at spelling too; the backspace key gets a really healthy workout.
Re: HTML::TreeBuilder dropping end tags
by gellyfish (Monsignor) on Jul 13, 2005 at 19:56 UTC

    Although you have already found a solution, it occurs to me that if you are producing XHTML you might be better off treating it as XML and using something like XML::Twig; you might find that gives you a good deal more flexibility.
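No code was posted with this suggestion, so the following is only a minimal sketch of what the XML::Twig route might look like, assuming the XSLT output is well-formed XHTML. The fragment and the class-adding handler are invented stand-ins for the real post-processing.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical XHTML fragment, standing in for the XSLT output.
my $xhtml = '<div><ul><li>one</li><li>two</li></ul><p>para</p></div>';

# Example post-processing step (made up): give every <li> a class.
# twig_handlers are called with ($twig, $element) once the element
# has been completely parsed.
my $twig = XML::Twig->new(
    twig_handlers => {
        li => sub { my ( $t, $li ) = @_; $li->set_att( class => 'item' ); 1 },
    },
);

# parse() dies on malformed input, which doubles as a well-formedness check.
$twig->parse($xhtml);

# Serialized as XML, so every non-empty element keeps its end tag.
my $out = $twig->sprint;
print "$out\n";
```

Because the output side is XML, there is no optional-end-tag map to worry about.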

    /J\

Re: HTML::TreeBuilder dropping end tags
by jptxs (Curate) on Jul 13, 2005 at 18:04 UTC

Node Type: perlquestion [id://474632]
Approved by jdporter
Front-paged by jdporter