Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Do not reinvent the wheel: real-world example using XML::Twig

by ambrus (Abbot)
on Jun 05, 2005 at 09:15 UTC ( [id://463681] : perlmeditation . print w/replies, xml ) Need Help??

This meditation is just an example to show why you shouldn't reinvent the wheel.

I wrote a script that gets its input as XML. First, I wrote a prototype in XML::Parser. Later, I reimplemented the code using XML::Twig. The output structure is not the same in the two code, as the rest of the code has changed a lot. The second code is slightly shorter and much simpler, even though also collects a bit more data too: the "value" elements.

Update. This meditation doesn't make sense this way, so let me clarify. Also, I've changed the readmore tags a bit, and added some comments as noted below. (Update: also changing the original misleading title)

XML::Parser is a low-level XML module, it's not very usable for actually working with XML, but it's good for writing other XML modules on it. All XML::Parser does is give you the tokens of an XML, and verify that it is well-formed. When I wrote the program with XML::Parser, I had to implement some higher-level XML functionality in complicated ways. This reimplementation can be seen in the first (un-readmored) part of the first code, which I have marked with comments now.

XML::Twig is a higher-level XML parser, which allows us to manipulate parts of an XML tree as a whole, not just tokens. (In fact, XML::Twig calls XML::Parser underneath, but this is not important here.) This code is short, so the second variant of my code is not that much shorter than the first one, but it is clearly simpler. It would be a hassle to use XML::Parser for a more complicated application, (Update:) or rather, for a more complicated XML file.

End of update

Here's the first code. Ignore the parts concerning %notroot, that part was moved down in the real code, as it's not really part of inputting the data.

use warnings; use strict; + use XML::Parser; my($filename, # ... and some more options ... ); # ... get options ... my(%child, %member, %goodness, %notroot); + {
# REIMPLEMENTING THE WHEEL STARTS HERE my(%handler, %starthandler, @endhandler, @string, $string); $handler{"Default"} = sub { }; $handler{"Start"} = sub { push @string, $string; $string = ""; my $h = $starthandler{$_[1]}; push @endhandler, ($h ? &$h(@_[2 .. @_ - 1]) : sub { } +); }; $handler{"End"} = sub { &{pop @endhandler}($_[1]); $string = pop @string; }; $handler{"Char"} = sub { $string .= $_[1]; }; + # REIMPLEMENTING THE WHEEL ENDS HERE (more or less)
my($child, $member, $goodness, $id); $starthandler{"cluster"} = sub { $child = []; $member = []; $goodness = (); $id = ""; sub { "" ne $id or die "cluster without id"; $child{$id} and die "dupe cluster id: $id"; $child{$id} = $child; $member{$id} = $member; $goodness{$id} = $goodness; }; }; $starthandler{"id"} = sub { sub { $id = $string }; }; $starthandler{"goodness"} = sub { sub { $goodness = $string }; }; $starthandler{"child"} = sub { sub { $notroot{$string}++; push @$child, $string; }; }; $starthandler{"member"} = sub { sub { push @$member, $string; }; };
XML::Parser->new("Handlers", \%handler)->parsefile($ARGV[0]);
warn "done parsing xml"; + } + # ... and here we actually do something with what we'we read, but I wo +n't show that ...

And here's the second code:

use warnings; use strict; + use XML::Twig; my($filename, @attrib, $verbose, # ... and some more options ... ); # ... get options ... my(%cluster, %element, %attrib); { my $EMPTY = []; my %handler; for my $a (@attrib) { $attrib{$a} = 1; } $handler{"element"} = sub { my($t, $e) = @_; my $id = $e->first_child_trimmed_text("id"); length($id) or die "invalid input: element with no id"; exists($element{$id}) and die qq[invalid input: duplicate element id "$i +d"]; my(%new, $a, $ae); for $ae ($e->children("value")) { $attrib{$a = $ae->att("id")} and $new{$a} = $ae->text; } $element{$id} = \%new; $t->purge; 1; }; $handler{"cluster"} = sub { my($t, $e) = @_; my $id = $e->first_child_trimmed_text("id"); my %c; length($id) or die "invalid input: cluster without id"; exists($cluster{$id}) and die qq[invalid input: duplicate cluster id "$i +d"]; my $g = $e->first_child_trimmed_text("goodness"); length($g) and $c{"goodness"} = 0 + $g; my @m = map { $_->trimmed_text } $e->children("member" +); $c{"members"} = @m ? \@m : $EMPTY; my @c = map { $_->trimmed_text } $e->children("child") +; $c{"children"} = @c ? \@c : $EMPTY; $cluster{$id} = \%c; $t->purge; 1; }; my $twig = XML::Twig->new("twig_handlers", \%handler); $verbose and warn "starting to parse xml file"; $twig->parsefile($filename); $verbose and warn "finished parsing xml file"; my $root = $twig->root; + } + # ... and I omit the rest of the code again ...