Re: HTML::TreeBuilder: sort a Definition List (<dl>)

I'm not sure I understand your question. The code you already have seems to do most of the tricky stuff, i.e. getting the data out of the HTML.

If I were doing this (but I'm not fast enough to just whip out code right now), I think I would delete the <dt> and <dd> objects as I read them (there are a couple of methods for doing this, IIRC). Then I would sort them as HTML::Element objects, using a big Schwartzian Transform. Once you get an array of sorted HTML::Element objects, you can reattach the whole thing into the dl.
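For anyone who has not met the Schwartzian Transform, here is a minimal sketch of the idea using core Perl only. The `@entries` hashes and their `term`/`data` keys are made-up stand-ins for the `<dt>`/`<dd>` pairs; with HTML::TreeBuilder the items being sorted would be HTML::Element objects instead.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for (term, definition) pairs pulled out of a <dl>.
my @entries = (
    { term => 'E Definition', data => 'E - data' },
    { term => 'B Definition', data => 'B - data' },
    { term => 'A_definition', data => 'A data.'  },
    { term => 'C definition', data => 'C - data' },
);

# Schwartzian Transform, read from the bottom up:
#   1. pair each entry with a precomputed sort key,
#   2. sort the pairs on that key,
#   3. strip the key off again, leaving the original entries in order.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ lc $_->{term}, $_ ] }
    @entries;

print "$_->{term}: $_->{data}\n" for @sorted;
```

The point of the extra `map` steps is that the sort key is computed once per item rather than once per comparison, which matters when the key is expensive to produce (for example, rendering an element to text).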

Assuming that is what you wanted to do... ;-)

Good luck.

sub jf { print substr($_[0], -1);
               jf( substr($_[0], 0, length($_[0])-1))
                    if length $_[0] > 1; }
jf('gro.alubaf@yehaf');


Re^2: HTML::TreeBuilder: sort a Definition List (<dl>) by Util (Priest) on Sep 13, 2005 at 01:54 UTC
++skillet-thief, I agree with your design; the code at the bottom implements it. It is slightly more complex, to handle the tags other than DT and DD that can exist in the DL.

Notable issues in the OP code:

`$tree->destroy` should be `$tree->delete`.

You use `$tree->parse` without using `$tree->eof`! From the HTML::TreeBuilder docs:

    $root->eof()
    This signals that you're finished parsing content into this tree; this runs various kinds of crucial cleanup on the tree. This is called for you when you call $root->parse_file(...), but not when you call $root->parse(...). So if you call $root->parse(...), then you must call $root->eof() once you've finished feeding all the chunks to parse(...), and before you actually start doing anything else with the tree in $root.

Using `new_from_content` or `new_from_file` would also prevent the problem.

You say:

    my ($dl) = $tree->look_down('_tag', 'dl');

This means "scan everywhere in `$tree` to find all the DL tags, and put the first DL tag found into `$dl`". Why ask for them all and take the first? Instead, ask for only the first DL, by calling `look_down` in scalar context.

    my $dl = $tree->look_down('_tag', 'dl');

Working, tested code:

    #!/usr/bin/perl -W
    use strict;
    use HTML::TreeBuilder;

    my $tree = HTML::TreeBuilder->new_from_content(<<'END') or die;
    <html>
    <head>
    <title>Glossary</title>
    <h1>Glossary</h1>
    <dl>
    <dt><b>E Definition</b></dt>
    <dd>E - data</dd>
    <p></p>
    <dt><b>B Definition</b></dt>
    <dd>B - data</dd>
    <p></p>
    <dt><b>A_definition</b></dt>
    <dd>A data.</dd>
    <p></p>
    <dt><b>C definition</b></dt>
    <dd>C - data</dd>
    <p></p>
    </dl>
    </body>
    </html>
    END

    my $dl = $tree->look_down( _tag => 'dl' );

    # Unlink all of $dl's children from $dl, and return them.
    my @dl_content = $dl->detach_content();

    # Group the tags into an AoA on the DT tag.
    my @dt_tag_clusters;
    foreach (@dl_content) {
        push @dt_tag_clusters, [] if $_->tag() eq 'dt';
        die "Tags occurred before first DT" unless @dt_tag_clusters;
        push @{ $dt_tag_clusters[-1] }, $_;
    }

    # Sort the clusters
    @dt_tag_clusters =
        map  { $_->[1] }
        sort { $a->[0] cmp $b->[0] }
        map  { [ $_->[0]->as_HTML, $_ ] }
        @dt_tag_clusters;

    # Un-cluster the tags.
    @dl_content = map { @$_ } @dt_tag_clusters;

    # Replace the DL's content with the sorted tags.
    $dl->push_content( @dl_content );

    print $tree->as_HTML; # or use HTML::PrettyPrinter

    $tree = $tree->delete();
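The clustering and un-clustering steps can be tried in isolation, without HTML::TreeBuilder. The sketch below is only an illustration: the `[tag, text]` pairs in `@children` are hypothetical stand-ins for HTML::Element objects, and the sort compares the text of the leading `dt` directly rather than going through `as_HTML`.

```perl
use strict;
use warnings;

# Hypothetical flat child list of a <dl>, as [tag, text] pairs.
my @children = (
    [ dt => 'E' ], [ dd => 'E - data' ], [ p => '' ],
    [ dt => 'B' ], [ dd => 'B - data' ], [ p => '' ],
    [ dt => 'A' ], [ dd => 'A data.'  ], [ p => '' ],
);

# Cluster: every dt opens a new group; anything else joins the latest group.
my @clusters;
for my $child (@children) {
    push @clusters, [] if $child->[0] eq 'dt';
    die "Tags occurred before first DT" unless @clusters;
    push @{ $clusters[-1] }, $child;
}

# Sort the groups by the text of their leading dt.
@clusters = sort { $a->[0][1] cmp $b->[0][1] } @clusters;

# Un-cluster: @$_ dereferences one group into a list of its members,
# and map concatenates those lists into a single flat list.
my @flat = map { @$_ } @clusters;

print join( ' ', map { "$_->[0]($_->[1])" } @flat ), "\n";
```

Because each group travels through the sort as a single array reference, the `dd` and `p` elements that follow a `dt` stay attached to it; flattening afterwards restores the one-level child list that `push_content` expects.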
Re^3: HTML::TreeBuilder: sort a Definition List (<dl>) by svenXY (Deacon) on Sep 13, 2005 at 09:03 UTC
Thanks everybody!

@Util - perfect! Exactly what I was looking for! I really liked the way you created the clusters. Then it took me some time to understand the map-sort-map (until I found it in the Cookbook) and the un-clustering (well, I didn't really understand that one, but can take it as given). Not only did you solve my problem, but you also greatly enhanced my understanding of Perl and added to my toolbox of solutions to common problems!

One small note though: mapping like this: `map { [ $_->[0]->as_HTML, $_ ] }` leads to problems when you have more tags in the dt element (some are links as well), thus it's better to `map { [ $_->[0]->as_text, $_ ] }` or even to apply some more calculations on the text, like lc and (at least in Germany) Umlaut considerations.

More than happy,
svenXY
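A rough sketch of what such a normalised sort key might look like, using core Perl only. The `sort_key` helper and the sample `@terms` are hypothetical, and folding each umlaut to its base vowel is just one possible convention (an assumption here, not the only correct German ordering). In the real code the input to `sort_key` would be the result of `$_->[0]->as_text`.

```perl
use strict;
use warnings;
use utf8;                              # this source contains literal umlauts
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: fold umlauts to their base vowels, expand ß,
# then lowercase, so that comparison ignores case and diacritics.
sub sort_key {
    my ($text) = @_;
    my $key = $text;
    $key =~ tr/ÄÖÜäöü/AOUaou/;
    $key =~ s/ß/ss/g;
    return lc $key;
}

# Made-up sample terms standing in for the text of each <dt>.
my @terms = ( 'Zebra', 'Äpfel', 'Birne', 'Öl', 'Oder' );

# Same map-sort-map shape as before, with the normalised key in slot 0.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ sort_key($_), $_ ] }
    @terms;

print join( ', ', @sorted ), "\n";
```

With a plain `cmp` on the raw strings, words starting with an umlaut would sort after "Zebra"; computing the key once inside the transform keeps that normalisation out of the comparison itself. For full locale-aware ordering, the core module Unicode::Collate is another option.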

