Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

This meditation is just an example to show why you shouldn't reinvent the wheel.

I wrote a script that gets its input as XML. First, I wrote a prototype in XML::Parser. Later, I reimplemented the code using XML::Twig. The output structure is not the same in the two code, as the rest of the code has changed a lot. The second code is slightly shorter and much simpler, even though also collects a bit more data too: the "value" elements.

Update. This meditation doesn't make sense this way, so let me clarify. Also, I've changed the readmore tags a bit, and added some comments as noted below. (Update: also changing the original misleading title)

XML::Parser is a low-level XML module, it's not very usable for actually working with XML, but it's good for writing other XML modules on it. All XML::Parser does is give you the tokens of an XML, and verify that it is well-formed. When I wrote the program with XML::Parser, I had to implement some higher-level XML functionality in complicated ways. This reimplementation can be seen in the first (un-readmored) part of the first code, which I have marked with comments now.

XML::Twig is a higher-level XML parser, which allows us to manipulate parts of an XML tree as a whole, not just tokens. (In fact, XML::Twig calls XML::Parser underneath, but this is not important here.) This code is short, so the second variant of my code is not that much shorter than the first one, but it is clearly simpler. It would be a hassle to use XML::Parser for a more complicated application, (Update:) or rather, for a more complicated XML file.

End of update

Here's the first code. Ignore the parts concerning %notroot, that part was moved down in the real code, as it's not really part of inputting the data.

use warnings; use strict; + use XML::Parser; my($filename, # ... and some more options ... ); # ... get options ... my(%child, %member, %goodness, %notroot); + { +
# REIMPLEMENTING THE WHEEL STARTS HERE my(%handler, %starthandler, @endhandler, @string, $string); $handler{"Default"} = sub { }; $handler{"Start"} = sub { push @string, $string; $string = ""; my $h = $starthandler{$_[1]}; push @endhandler, ($h ? &$h(@_[2 .. @_ - 1]) : sub { } +); }; $handler{"End"} = sub { &{pop @endhandler}($_[1]); $string = pop @string; }; $handler{"Char"} = sub { $string .= $_[1]; }; + # REIMPLEMENTING THE WHEEL ENDS HERE (more or less)
my($child, $member, $goodness, $id); $starthandler{"cluster"} = sub { $child = []; $member = []; $goodness = (); $id = ""; sub { "" ne $id or die "cluster without id"; $child{$id} and die "dupe cluster id: $id"; $child{$id} = $child; $member{$id} = $member; $goodness{$id} = $goodness; }; }; $starthandler{"id"} = sub { sub { $id = $string }; }; $starthandler{"goodness"} = sub { sub { $goodness = $string }; }; $starthandler{"child"} = sub { sub { $notroot{$string}++; push @$child, $string; }; }; $starthandler{"member"} = sub { sub { push @$member, $string; }; }; +
XML::Parser->new("Handlers", \%handler)->parsefile($ARGV[0]);
warn "done parsing xml"; + } + # ... and here we actually do something with what we'we read, but I wo +n't show that ...

And here's the second code:

use warnings; use strict; + use XML::Twig; my($filename, @attrib, $verbose, # ... and some more options ... ); # ... get options ... my(%cluster, %element, %attrib); { my $EMPTY = []; my %handler; for my $a (@attrib) { $attrib{$a} = 1; } $handler{"element"} = sub { my($t, $e) = @_; my $id = $e->first_child_trimmed_text("id"); length($id) or die "invalid input: element with no id"; exists($element{$id}) and die qq[invalid input: duplicate element id "$i +d"]; my(%new, $a, $ae); for $ae ($e->children("value")) { $attrib{$a = $ae->att("id")} and $new{$a} = $ae->text; } $element{$id} = \%new; $t->purge; 1; }; $handler{"cluster"} = sub { my($t, $e) = @_; my $id = $e->first_child_trimmed_text("id"); my %c; length($id) or die "invalid input: cluster without id"; exists($cluster{$id}) and die qq[invalid input: duplicate cluster id "$i +d"]; my $g = $e->first_child_trimmed_text("goodness"); length($g) and $c{"goodness"} = 0 + $g; my @m = map { $_->trimmed_text } $e->children("member" +); $c{"members"} = @m ? \@m : $EMPTY; my @c = map { $_->trimmed_text } $e->children("child") +; $c{"children"} = @c ? \@c : $EMPTY; $cluster{$id} = \%c; $t->purge; 1; }; my $twig = XML::Twig->new("twig_handlers", \%handler); $verbose and warn "starting to parse xml file"; $twig->parsefile($filename); $verbose and warn "finished parsing xml file"; my $root = $twig->root; + } + # ... and I omit the rest of the code again ...

In reply to Do not reinvent the wheel: real-world example using XML::Twig by ambrus

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others studying the Monastery: (5)
    As of 2014-10-02 06:03 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      What is your favourite meta-syntactic variable name?














      Results (49 votes), past polls