++skillet-thief, I agree with your design; the code at the bottom implements it. It is slightly more complex because it also handles tags other than DT and DD that can appear inside the DL.
Notable issues in the OP's code:
- $tree->destroy should be $tree->delete.
- You call $tree->parse without ever calling $tree->eof! From the HTML::TreeBuilder docs:
$root->eof()
This signals that you're finished parsing content into this tree; this runs various kinds of crucial cleanup on the tree. This is called for you when you call $root->parse_file(...), but not when you call $root->parse(...). So if you call $root->parse(...), then you must call $root->eof() once you've finished feeding all the chunks to parse(...), and before you actually start doing anything else with the tree in $root.
Using new_from_content or new_from_file would also avoid the problem, because both constructors finish the parse, including the eof step, for you.
- You say:
my ($dl) = $tree->look_down('_tag', 'dl');
This means "scan *everywhere* in $tree, find all the DL tags, and put the first one found into $dl". Why ask for all of them and keep only the first? Instead, ask for *only* the first DL by calling look_down in scalar context, which stops searching at the first match:
my $dl = $tree->look_down('_tag', 'dl');
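Scalar context helps because look_down checks wantarray and returns as soon as it finds a match when the caller only wants one result. The pure-Perl sketch below mimics that pattern on a plain list of tag names so it runs without HTML::TreeBuilder; `find_tags` is a hypothetical helper, not part of any module, and the `$examined` counter exists only to show how much of the list each call looked at.

```perl
use strict;
use warnings;

my $examined = 0;    # how many tags the most recent search looked at

# Hypothetical stand-in for look_down: scan a flat list of tag names and
# let the caller's context decide whether to stop at the first match.
sub find_tags {
    my ($wanted, @tags) = @_;
    my @found;
    for my $tag (@tags) {
        $examined++;
        next unless $tag eq $wanted;
        return $tag unless wantarray;    # scalar context: first match, stop now
        push @found, $tag;               # list context: keep collecting
    }
    return wantarray ? @found : undef;
}

my @tags = qw(html head title body h1 dl dt dd dl dt dd);

$examined = 0;
my ($first_from_list) = find_tags('dl', @tags);    # list context, like my ($dl) = ...
my $list_examined = $examined;

$examined = 0;
my $first_from_scalar = find_tags('dl', @tags);    # scalar context, like my $dl = ...
my $scalar_examined = $examined;

print "list context: $list_examined tags examined\n";
print "scalar context: $scalar_examined tags examined\n";
```

Both calls end up with the same first 'dl', but only the scalar-context call avoids walking the rest of the list.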
Working, tested code:
#!/usr/bin/perl -W
use strict;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_content(<<'END') or die;
<html>
<head>
<title>Glossary</title>
</head>
<body>
<h1>Glossary</h1>
<dl>
<dt><b>E Definition</b></dt>
<dd>E - data</dd>
<p></p>
<dt><b>B Definition</b></dt>
<dd>B - data</dd>
<p></p>
<dt><b>A_definition</b></dt>
<dd>A data.</dd>
<p></p>
<dt><b>C definition</b></dt>
<dd>C - data</dd>
<p></p>
</dl>
</body>
</html>
END
my $dl = $tree->look_down( _tag => 'dl' );
# Unlink all of $dl's children from $dl, and return them.
my @dl_content = $dl->detach_content();
# Group the tags into an AoA on the DT tag.
my @dt_tag_clusters;
foreach (@dl_content) {
# Text nodes are plain strings, not elements, so check ref before calling ->tag().
push @dt_tag_clusters, [] if ref $_ && $_->tag() eq 'dt';
die "Tags occurred before first DT" unless @dt_tag_clusters;
push @{ $dt_tag_clusters[-1] }, $_;
}
# Sort the clusters
@dt_tag_clusters = map { $_->[1] }
sort { $a->[0] cmp $b->[0] }
map { [ $_->[0]->as_HTML, $_ ] }
@dt_tag_clusters;
# Un-cluster the tags.
@dl_content = map { @$_ } @dt_tag_clusters;
# Replace the DL's content with the sorted tags.
$dl->push_content( @dl_content );
print $tree->as_HTML; # or use HTML::PrettyPrinter
$tree = $tree->delete();
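The three list manipulations in the middle of that script (grouping on DT, the map-sort-map, and the un-clustering) can be tried out in isolation. The sketch below replaces the HTML::Element nodes with plain `[tag, text]` array refs, an assumption made only so it runs without HTML::TreeBuilder; the map-sort-map step is the idiom usually called the Schwartzian transform.

```perl
use strict;
use warnings;

# Plain [tag, text] pairs stand in for the DL's child elements (assumption).
my @dl_content = (
    [ dt => 'E Definition' ], [ dd => 'E - data' ], [ p => '' ],
    [ dt => 'B Definition' ], [ dd => 'B - data' ], [ p => '' ],
    [ dt => 'A_definition' ], [ dd => 'A data.' ],  [ p => '' ],
);

# 1. Cluster: open a new group at every dt; everything else joins the current group.
my @clusters;
for my $node (@dl_content) {
    push @clusters, [] if $node->[0] eq 'dt';
    die "Tags occurred before first DT" unless @clusters;
    push @{ $clusters[-1] }, $node;
}

# 2. Schwartzian transform: compute each cluster's sort key once
#    (the text of its leading dt), sort on it, then discard the key.
@clusters = map  { $_->[1] }
            sort { $a->[0] cmp $b->[0] }
            map  { [ $_->[0][1], $_ ] }
            @clusters;

# 3. Un-cluster: @$_ turns each group back into a list, and map
#    concatenates those lists into one flat list.
my @sorted = map { @$_ } @clusters;

print join(' ', map { $_->[0] } @sorted), "\n";
print join(' | ', map { $_->[1] } grep { $_->[0] eq 'dt' } @sorted), "\n";
```

The un-clustering works because map builds its result by concatenating whatever list each evaluation of its block returns, so returning a whole group flattens it in place.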
Thanks everybody!
@Util - perfect! Exactly what I was looking for!
I really liked the way you created the clusters. It then took me some time to understand the map-sort-map (until I found it in the Cookbook) and the un-clustering (well, I didn't really understand that one, but I can take it as given).
Not only did you solve my problem, but you also greatly enhanced my understanding of Perl and added to my toolbox of solutions to common problems!
One small note though:
Mapping like this:
map { [ $_->[0]->as_HTML, $_ ] }
leads to problems when the dt element contains further tags (some of mine are links), because that markup then becomes part of the sort key. It's better to use
map { [ $_->[0]->as_text, $_ ] }
or even to normalize the text further, e.g. with lc and (at least in Germany) some handling for umlauts.
More than happy,
svenXY
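On the lc and umlaut point: lc only folds case, and plain cmp still compares code points, so 'Ä' sorts after 'z'. A sketch using the core module Unicode::Collate (an addition here, not something used in the thread) sorts umlauted words next to their base letters; German-specific tailorings are available through the related Unicode::Collate::Locale module.

```perl
use strict;
use warnings;
use utf8;                  # the string literals below contain non-ASCII letters
use Unicode::Collate;      # core module implementing the Unicode Collation Algorithm

binmode STDOUT, ':encoding(UTF-8)';

my @terms = ('Zebra', 'Äpfel', 'apfel', 'Birne', 'Öl');

# Plain cmp compares code points: uppercase ASCII < lowercase ASCII < 'Ä', 'Ö'.
my @by_codepoint = sort @terms;

# lc folds case, but the umlauted letters still come after 'z'.
my @by_lc = sort { lc($a) cmp lc($b) } @terms;

# The collator treats 'Ä' as a variant of 'A' and 'Ö' as a variant of 'O'.
my $collator = Unicode::Collate->new();
my @collated = $collator->sort(@terms);

print "cmp:     @by_codepoint\n";
print "lc cmp:  @by_lc\n";
print "collate: @collated\n";
```

In the map-sort-map from the script above, the key could become `$collator->getSortKey($_->[0]->as_text)`; getSortKey returns a string that is still compared with plain cmp.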