PerlMonks  

XML::LibXML - WHAR HASH TREES WHAR?!

by SineSwiper (Beadle)
on Jul 15, 2011 at 04:05 UTC ( [id://914502]=perlquestion )

SineSwiper has asked for the wisdom of the Perl Monks concerning the following question:

For a project or two, I've been using XML::Smart for my XML needs. However, the module was last updated 7 years ago, and development appears to have stopped.  It's so old that it doesn't even seem to mention XSD support, something I was looking to adopt for this new project.  There's also an annoying DESTROY bug that seems to pollute die() calls.

Since most agree that XML::LibXML is the way to go for a powerful XML interface (and supports XSD), I went that route and looked into converting my XML::Smart code into XML::LibXML equivalents.  What I found was absolutely appalling, especially compared to XML::Smart:

my $sXML = XML::Smart->new('file.xml', 'SMART');
my $pXML = XML::LibXML->load_xml(location => 'file.xml');

### Basic Traversing ###

# Smart = hash trees
$sXML = $sXML->cut_root;
my $book = $sXML->{book}[3];
print $book->{title};

# LibXML = no hash trees
$pXML = $pXML->documentElement();
my $book = $pXML->childNodes()->[3];
print $book->findvalue('title');

# (though, XPaths makes it a little bit better...)
print $pXML->findvalue('book[4]/title');

##############################

### Array of values ###

# Smart = simple and elegant
$sXML = $sXML->cut_root;
my @books = $sXML->{book}('@');

# LibXML = ugly, ugly, ugly
$pXML = $pXML->documentElement();
my @books = map { $_->textContent } $pXML->find('book')->get_nodelist();

##############################

### Creating a new XML doc ###

# Smart = light on the code, and doesn't require searching for 500 methods
my $sXML = XML::Smart->new();
$sXML->{root}->{'xmlns:xsi'} = "http://www.w3.org/2001/XMLSchema-instance";
$sXML->{root}->{'xsi:schemaLocation'} = "test.xsd";
$sXML->{root}->{'data'} = "Hello World!";
$sXML->{root}->{'data'}->set_node();

# LibXML = WTF?!?
my $pXML = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = XML::LibXML::Element->new('root');
$root->setNamespace("http://www.w3.org/2001/XMLSchema-instance", 'xsi', 0);
$root->setAttributeNS("http://www.w3.org/2001/XMLSchema-instance", 'xsi:schemaLocation', "test.xsd");
$pXML->setDocumentElement($root);
$root->addNewChild(undef, 'data')->setData("Hello World!");

##############################

### Adding a new row of elements ###

# Smart = blends in with Perl hashes
my $node = {
    'title'     => 'XML Schema',
    'author'    => 'Eric Van der Vlist',
    'publisher' => "O'Reilly Media Inc.",
    'phystype'  => 'Paperback',
    'year'      => 2002,
};
push(@{$sXML->{book}}, $node);  # (though, this is a little ugly...)

# LibXML = Bats**t crazy!
my $node = XML::LibXML::Element->new('book');
$node->addNewChild(undef, 'title')    ->setData('XML Schema');
$node->addNewChild(undef, 'author')   ->setData('Eric Van der Vlist');
$node->addNewChild(undef, 'publisher')->setData("O'Reilly Media Inc.");
$node->addNewChild(undef, 'phystype') ->setData('Paperback');
$node->addNewChild(undef, 'year')     ->setData(2002);
$pXML->appendChild($node);

### if exists OR define ###

# Smart = a simple one-liner
$row->{author} ||= "Freddy Krueger";

# LibXML = crappy
$row->findvalue('author') || $row->addNewChild(undef, 'author')->setData("Freddy Krueger");

##############################

### Searching ###

# Smart = meh...
my $book = $sXML->{book}('title', 'eq', 'XML Schema');

# LibXML = better, but...
my $book = $pXML->find('book[title="XML Schema"]');

# ...this would be really cool
my $book = $sXML->{book}('[title="XML Schema"]');
my $book = $sXML->('book[title="XML Schema"]');

##############################

Plus, the documentation for XML::LibXML is horrid!  The method reference is spread all over creation, there are too many methods, and there's no XML::LibXML::Tutorial that would actually tell you exactly what you should be doing.

Isn't there a module to help link these together?  Give me both hash trees and the methods from LibXML.  Allow me to do something like $XML->{book}('$%').  Provide me with shorter sub names than "getElementsByLocalName".  (For god's sake, jQuery was created because people were sick of typing "document.getElementById" all the damn time.)  Make it so I don't get error messages like "Undefined subroutine &XML::LibXML::NodeList::value called at xml_parse.pl line 263".  (And this was the OTHER reason why jQuery was invented...)

Where is XML::LibXML::Smart || XML::LibXML::Tree || XML::LibXML::jQuery?

Replies are listed 'Best First'.
Re: XML::LibXML - WHAR HASH TREES WHAR?!
by Anonymous Monk on Jul 15, 2011 at 08:14 UTC

    What I found was absolutely appalling, especially compared to XML::Smart

    Yes, gmpassos is M.I.A., this means you can take over maintenance/development of XML::Smart -- you can also fork the codebase at any time

    Plus, the documentation for XML::LibXML is horrid!

    Actually, no, it really isn't. It is quite comprehensive and complete; in any case, as they say, patches welcome

    (For god's sake, jQuery was created because people were sick of typing "document.getElementById" all the damn time.)

    Actually, no it wasn't. jQuery was created because html/browsers did not expose xpath to javascript. XML::LibXML gives you xpath.

    Any idiot can define a one char shortcut for document.getElementById

    Where is XML::LibXML::Smart || XML::LibXML::Tree || XML::LibXML::jQuery?

    Eh?

      Yes, gmpassos is M.I.A., this means you can take over maintenance/development of XML::Smart -- you can also fork the codebase at any time.

      That's not my point, and something I don't have the time for right now. XML::Smart is mostly dead and XML::LibXML is now the currently maintained XML module. My point is that with all of the development and talk about how great this module is, why haven't there been some efforts to make it more elegant and "sugary"?

      Actually, no, it really isn't. It is quite comprehensive and complete; in any case, as they say, patches welcome

      A typical RTFM-like response, hmm?

      And again, the docs are spread all over creation. Quick, tell me how to create an XML tree of nodes. Well, that's going to mean diving into 4-5 different module docs to look up the appendChild, setText, etc. methods. Not to mention that there are about 10 different ways to create these things and there is no tutorial saying "Okay, this is the RIGHT way to do it that uses the least code."

      For that matter, can anybody tell me how XML::LibXML::SAX works? Because the documentation there doesn't give me any methods to use. Even the other sections, like this one, literally say "Usage is as above", which means "I'm too lazy to actually list out the methods for this module." Meanwhile, XML::LibXML::SAX::Parser, the class that isn't "DEPRECATED", has absolutely no documentation.

      Actually, no it wasn't. jQuery was created because html/browsers did not expose xpath to javascript. XML::LibXML gives you xpath.

      And did they implement XPath? No. They used the CSS selector language to grab nodes. Why? Because everybody was familiar with it and we don't need Yet Another Selector Language. Just stick with one and be done with it.

      Any idiot can define a one char shortcut for document.getElementById

      Yes, and when millions of "idiots" are defining one-char shortcuts for a function, there is something wrong with the function.

      Eh?

      Okay, that's just a list of random modules.

        That's not my point, and something I don't have the time for right now. XML::Smart is mostly dead and XML::LibXML is now the currently maintained XML module.

        If it's not your point, why dwell on it? If you find it useful and convenient, why not keep it updated? Software only dies if you stop using it.

        My point is that with all of the development and talk about how great this module is, why haven't there been some efforts to make it more elegant and "sugary"?

        Hmm, why does an interface to a standards based library not come with sugar?

        Perhaps, because diabetes is an epidemic? I don't know -- seems like too much work for one man

        A typical RTFM-like response, hmm?

        Is it? I found it quite civil in comparison to your post. I feel the same as Re: XML::LibXML - WHAR HASH TREES WHAR?!

        And again, the docs are spread all over creation.

        Why is that a problem? Perl documentation doesn't come all on one page either -- you could always make one page with all the docs :)

        Quick, tell me how to create an XML tree of nodes. Well, that's going to mean diving into 4-5 different module docs to look up the appendChild, setText, etc. methods.

        http://cpansearch.perl.org/src/SHLOMIF/XML-LibXML-1.80/t/04node.t
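For readers who do not want to dig through the test file, here is a minimal sketch of building a small tree with XML::LibXML. The element names and values are made up for illustration; the calls used (createElement, setDocumentElement, addNewChild, setAttribute, appendTextChild, toString) are all documented in XML::LibXML::Document and XML::LibXML::Element. It is one reasonable way to do it, not necessarily the shortest.

```perl
use strict;
use warnings;
use XML::LibXML;

# Create an empty document and give it a root element.
my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('bookstore');
$doc->setDocumentElement($root);

# addNewChild returns the element it created, so it can be filled in directly.
my $book = $root->addNewChild(undef, 'book');
$book->setAttribute(year => 2002);

# appendTextChild adds a child element that contains only text.
$book->appendTextChild(title  => 'XML Schema');
$book->appendTextChild(author => 'Eric Van der Vlist');

print $doc->toString(1);   # 1 = pretty-print with indentation
```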

        Not to mention that there are about 10 different ways to create these things and there is no tutorial saying "Okay, this is the RIGHT way to do it that uses the least code."

        Interfacing to libxml is a big enough job -- the docs aren't there to teach you programming, oo concepts, patterns, paradigms (like event-based programming ), xml ...

        For that matter, can anybody tell me how XML::LibXML::SAX works? Because the documentation there doesn't give me any methods to use. Even the other sections, like this one, literally say "Usage is as above", which means "I'm too lazy to actually list out the methods for this module." Meanwhile, XML::LibXML::SAX::Parser, the class that isn't "DEPRECATED", has absolutely no documentation.

        The docs say

        XML::LibXML provides an interface to libxml2 direct SAX interface

        The SAX interface of XML::LibXML is based on the famous XML::SAX interface. It uses the generic interface as provided by XML::SAX::Base.

        So XML::SAX, XML::SAX::Base, http://cpansearch.perl.org/src/SHLOMIF/XML-LibXML-1.80/t/14sax.t, SAX is the Simple API for XML, SAX
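Since the SAX parser follows the generic XML::SAX interface, the usual pattern is to subclass XML::SAX::Base and hand an instance to the parser as its Handler. The sketch below assumes XML::SAX is installed alongside XML::LibXML; the ElementCounter class is hypothetical and exists only for this example.

```perl
use strict;
use warnings;

# Hypothetical handler: counts how often each element name occurs.
package ElementCounter;
use parent 'XML::SAX::Base';

sub start_element {
    my ($self, $el) = @_;
    $self->{count}{ $el->{Name} }++;      # Name is the qualified element name
    $self->SUPER::start_element($el);     # forward the event down the chain
}

package main;
use XML::LibXML::SAX;

my $handler = ElementCounter->new;
my $parser  = XML::LibXML::SAX->new(Handler => $handler);
$parser->parse_string('<books><book/><book/></books>');

printf "%s: %d\n", $_, $handler->{count}{$_}
    for sort keys %{ $handler->{count} };
```

The methods a handler may implement (start_element, end_element, characters, and so on) are listed in the XML::SAX::Base documentation rather than in XML::LibXML::SAX itself.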

        Yes, and when millions of "idiots" are defining one-char shortcuts for a function, there is something wrong with the function.

        No there isn't. You still use print don't you? or say? You're not using p or s are you? Or in javascript terms, you're still using $('a[href]').each( aren't you? You're not using $('a[href]').e(?

        And did they implement XPath? No. They used the CSS selector language to grab nodes.

        $('a[href$="pdf"]') -- looks like xpath to me -- sure, it's called "css2", but it's xpath -- in any case, xpath is the standard for xml
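To make the comparison concrete, here is a small sketch of how the two selectors quoted in this exchange could be written as XPath 1.0 and run through XML::LibXML. The sample markup is invented. XPath 1.0 has no ends-with() function, so the suffix test is spelled out with substring() and string-length().

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => <<'XML');
<body>
  <a href="report.pdf">Report</a>
  <a href="index.html">Home</a>
  <a>No link</a>
</body>
XML

# CSS: a[href]           XPath: //a[@href]
my @linked = $doc->findnodes('//a[@href]');

# CSS: a[href$="pdf"]    XPath 1.0: compare the last three characters of @href
my @pdfs = $doc->findnodes(
    q{//a[substring(@href, string-length(@href) - 2) = 'pdf']}
);

print scalar(@linked), " linked, ", scalar(@pdfs), " pdf\n";
```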

        Okay, why did nobody mention XML::Compile? (Well, besides stuffing it in a list above.) It uses XML::LibXML directly. It's a mature and well-supported set of modules. It creates hash trees, and it uses those to create XML from schema files.

        Why isn't, like, 100% of the Perl community using this?

Re: XML::LibXML - WHAR HASH TREES WHAR?!
by ikegami (Patriarch) on Jul 15, 2011 at 17:31 UTC

    Ignoring your attempts at using an XML parser as an XML generator, one is left with:

    # XML::Smart
    print $sXML->{book}[3]{title};

    # XML::LibXML
    print $root->find('book[4]/title');

    # XML::Smart
    my @books = $sXML->{book}('@');

    # XML::LibXML
    my @books = $root->findnodes('book');

    So XML::LibXML is a little bit wordier, but there's a good reason for that. XML::Smart chose to sacrifice some critical functionality to offer its interface.

    • XML::LibXML supports namespaces. XML::Smart doesn't (except when used to generate documents).

    • XML::LibXML preserves the order of the children of elements. XML::Smart can't even list the children of an element.

    • XML::Smart drops comments.

    • One can only distinguish between elements and attributes when using XML::LibXML.

    Any of the first three would make XML::Smart incapable of dealing with the most commonly used XML format (XHTML). The second means it's incapable of handling just about every XML format out there.

    Update: Simplified an XML::Smart example that wasn't optimal.
    Update: Added third and fourth bullet.
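The points in the list above can be checked with a short script. This sketch uses an invented XHTML fragment with mixed content and shows that XML::LibXML returns text, element and comment children in document order, and that elements keep their namespace (so XPath needs a registered prefix; the prefix "x" is an arbitrary choice).

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => <<'XML');
<p xmlns="http://www.w3.org/1999/xhtml">Hello <b>big</b><!-- note --> <i>world</i>!</p>
XML

my $p = $doc->documentElement;

# childNodes yields text, element and comment nodes in document order.
my @kinds = map { $_->nodeName } $p->childNodes;
print join(', ', @kinds), "\n";

# Namespaced elements are matched through a prefix registered on an XPathContext.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');
print $xpc->findvalue('/x:p/x:b'), "\n";
```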

      Ignoring your attempts at using an XML parser as an XML generator

      Who said LibXML is just a parser?

      XML::LibXML supports namespaces. XML::Smart doesn't (except when used to generate documents).

      Yes, probably because it's too old for it. It doesn't support XSD, either, which is part of the reason I'm switching to LibXML.
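For reference, XSD validation in XML::LibXML goes through XML::LibXML::Schema. The following is a minimal sketch with an invented inline schema and two invented documents; validate() throws an exception when the document does not conform, so the call is wrapped in eval.

```perl
use strict;
use warnings;
use XML::LibXML;

my $schema = XML::LibXML::Schema->new(string => <<'XSD');
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="data" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD

my @results;
for my $xml ('<root><data>Hello World!</data></root>', '<root><oops/></root>') {
    my $doc = XML::LibXML->load_xml(string => $xml);
    # validate() dies on failure, so trap the exception and record the outcome.
    my $ok = eval { $schema->validate($doc); 1 };
    push @results, $ok ? 'valid' : 'invalid';
}
print "$_\n" for @results;
```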

      I'm not debating that Smart is missing critical functionality. I'm frustrated that LibXML is this mature, and is a Perl module, and yet is this clunky at navigation. Nor are there any modules for XML::LibXML that enhance it to fix its navigation.

      I mean seriously, this is what I have to do to check for a tag and fill data for it:

      # Before
      $sn->{'SourceDBType'} ||= 'XML';

      # After
      $sn->findvalue('SourceDBType')
          || ($sn->find('SourceDBType')
              ? $sn->find('SourceDBType')->setData('XML')
              : $sn->addNewChild(undef, 'SourceDBType')->setData('XML'));

      Are you kidding me?!?!  That's a horrible mess!  And god forbid I do something like $sn->find('SourceDBType')->setData('XML') without checking the existence of SourceDBType first.  That'll give me a fatal error.
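One way to tame the "check for a tag and fill it" case is a small helper that wraps the find-or-create steps. The sketch below is an assumption about what such a helper could look like; default_child is a hypothetical name, not part of XML::LibXML. It treats an element with empty text as missing, which is close to (but not identical with) the truthiness test that ||= performs.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: ensure $parent has a child element $name with
# non-empty text, creating or filling it with $default when needed.
sub default_child {
    my ($parent, $name, $default) = @_;
    my ($child) = $parent->findnodes($name);            # first match, or undef
    $child = $parent->addNewChild(undef, $name) unless $child;
    $child->appendText($default) unless length $child->textContent;
    return $child;
}

my $sn = XML::LibXML->load_xml(string => '<source><Name>db1</Name></source>')
                    ->documentElement;

default_child($sn, 'SourceDBType', 'XML');    # missing, so it is created
default_child($sn, 'Name',         'none');   # present, so it is left alone

print $sn->toString, "\n";
```

Taking the first node from findnodes in list context also avoids calling a node method on an empty node list, which is the fatal error described above.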

      XML::LibXML preserves the order of the children of elements. XML::Smart can't even list the children of an element.

      Incorrect. Smart does both: $XML->{server}{address}[1]('@');  # there's your children of that <address> tag

        Who said LibXML is just a parser?

        You, for one.

        Yes, probably because it's too old for it.

        I meant it can't support namespaces.

        I'm not debating that Smart is missing critical functionality.

        Neither was I. I was referring to what its design allows.

        Are you kidding me?!?! That's a horrible mess!

        Are you kidding me? The equivalent would be

        $sn->setAttribute(SourceDBType => 'XML') if !$sn->getAttribute('SourceDBType');

        Incorrect. Smart does both:

        Sorry, but that's not working.

        use strict;
        use warnings;
        use feature qw( say );

        use XML::Smart qw( );

        my $doc = XML::Smart->new(<<'__EOI__');
        <root>
          <node>
            a
            <foo/>
            b
            <bar/>
            c
          </node>
        </root>
        __EOI__

        my @children = $doc->{root}{node}[0]('@');
        say @children == 5 ? "ok" : "XXX";
        1 XXX
Re: XML::LibXML - WHAR HASH TREES WHAR?!
by hardburn (Abbot) on Jul 15, 2011 at 15:31 UTC

    DOM may be terrible in some ways, but it's not LibXML's fault for following it. People expect that nearly any widely used language out there will have a DOM parser. That way, we can all suffer equally.

    I suspect what you want is XML::Simple.


    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      All I want is the most powerful and best XML parser available. XML::LibXML is fast, linked to a very common lib, and regularly updated. However, its bedside manner leaves something to be desired. XML::Simple has been regarded as the easier of the two, but not the best.

      Why do both have to suck in equal but opposite ways? What's wrong with having both the power of the methods and tying it to a hash tree? You can put a hash tree on top of the methods. What's wrong with something like this?

      $pXML->{bookstore}{book}[3]->appendText(", The");
      my @books = $pXML->('//book');
      $node->('/bookstore/book[price>35]/price')->setData(24.95);
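Part of that wish list, the `$obj->('xpath')` call syntax, can be layered on top of XML::LibXML with Perl's overload pragma. The following is only a sketch under stated assumptions: My::XPathSugar is a hypothetical class, not an existing CPAN module, it covers only the code-reference syntax (not the hash-tree access), and it forwards every other method call to the wrapped node through AUTOLOAD.

```perl
use strict;
use warnings;

# Hypothetical wrapper class, written for this example only.
package My::XPathSugar;
use XML::LibXML;
use overload
    '&{}'    => sub { my $self = shift; sub { $self->find(@_) } },
    fallback => 1;

our $AUTOLOAD;

sub new { my ($class, $node) = @_; return bless { node => $node }, $class }

# $obj->('xpath') returns the matching nodes, each wrapped again.
sub find {
    my ($self, $xpath) = @_;
    return map { ref($self)->new($_) } $self->{node}->findnodes($xpath);
}

# Any other method call is passed straight to the wrapped XML::LibXML node.
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';
    return $self->{node}->$method(@_);
}

package main;

my $doc = XML::LibXML->load_xml(string =>
    '<bookstore><book><price>40.00</price></book>'
  . '<book><price>9.99</price></book></bookstore>');
my $xml = My::XPathSugar->new($doc);

my @books = $xml->('//book');
for my $price ($xml->('/bookstore/book[price>35]/price')) {
    $price->removeChildNodes;      # forwarded to the underlying element
    $price->appendText('24.95');
}
print $doc->toString, "\n";
```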

        XML::Simple has been regarded as the easier of the two.

        XML::Simple is the hardest XML parser to use.

        $smart->{book}('@')
        is
        ref($simple->{book}) && ref($simple->{book}) eq 'ARRAY' ? $simple->{book} : [ $simple->{book} ]
        or
        ForceArray => [qw( book )]   # In constructor
        $simple->{book}

        XML::Smart appears to be an attempt to fix this problem of XML::Simple. (It still suffers from other problems of XML::Simple.)
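The XML::Simple behaviour described above is easy to reproduce. In this sketch the XML snippet is invented, and KeyAttr is set to an empty list so that no key folding interferes with the result; a lone book element comes back as a hash reference unless ForceArray names it.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

my $xml = '<library><book><title>XML Schema</title></book></library>';

# Default: a single <book> becomes a hash reference, not a one-element array.
my $plain = XMLin($xml, KeyAttr => []);
print ref($plain->{book}), "\n";

# ForceArray => ['book'] makes <book> an array reference however many there are.
my $forced = XMLin($xml, KeyAttr => [], ForceArray => ['book']);
print ref($forced->{book}), "\n";
print $forced->{book}[0]{title}, "\n";
```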

        LibXML is doing things that way for perfectly defensible reasons. It's a thin wrapper around a C library.

        If you'd like to simplify things on top of that, then by all means, start a project to fix it. Complaining about it in SoPW, though, is unproductive.


        "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: XML::LibXML - WHAR HASH TREES WHAR?!
by Anonymous Monk on Jul 15, 2011 at 05:34 UTC
    I found this post absolutely appalling. If you want help, ask for it. If you want to file a legitimate bug or feature request for the module, the documentation mentions where to go. This is especially relevant since the module has recently come under active maintenance after a long while.

    If you just want to vent steam, use the Chatterbox or IRC.
