PerlMonks  

Re^5: Is there any XML reader like this? (XML::Simple beats LibXML hands down in the speed stakes!)

by ikegami (Pope)
on Jan 16, 2012 at 07:58 UTC (#948085)


in reply to Re^4: Is there any XML reader like this? (XML::Simple beats LibXML hands down in the speed stakes!)
in thread Is there any XML reader like this?

It is easy to see which one wins in the speed stakes.

Yeah, LibXML. My tests *included* the time it took to extract the data from the tree. The test was done with real-world data of various sizes from three different providers.

We use XML::Bare with a thin layer to compensate for its awful interface (XML::Simple without ForceArray or any other option), its expectation of getting decoded text, and its lack of namespace support. It's slightly faster when you factor in the time it takes to extract data. It's not nearly as capable as libxml, and we had to create an interface just to be able to use it.


Re^6: Is there any XML reader like this? (XML::Simple beats LibXML hands down in the speed stakes!)
by BrowserUk (Pope) on Jan 16, 2012 at 09:14 UTC
    Yeah, LibXML. My tests *included* the time it took to extract the data from the tree.

    Hm. So did mine. But I believe mine.

    We use XML::Bare with a thin layer to compensate for its awful interface (XML::Simple without ForceArray or any other option)

    Hm. XML::Bare::forcearray( [noderef] )

    S'funny, innit. It took less than a minute to disprove that. And after 5 minutes, I'm pretty sure I could use XML::Bare to read a file and get access to its content.

    Conversely, when I tried to look up getDocumentElement, I completely crapped out after about an hour. You applied it to the return from load_xml() which is labelled $dom. So look in DOM. Nada. Maybe a Node. Nada. How about a parser, or a nodelist or a namespace? Nada, nada, nada!

    Your idea of an "awful interface" is weird.

    For me:

    • The best interface is the one I don't have to look up more than once.

      That means small.

    • The second best interface is one that makes it easy to look up what I need to know.

      That means the first page shows me enough to get something working.

      Details, refinements and esoterica can be deferred to secondary pages if that cannot be avoided.

    • The third best interface is one that, if it has to be large, is logically grouped.

      That means it starts by splitting the documentation along vertical lines, i.e. the way people need to use the interface: read an XML document, or write one, or edit one, etc. Not horizontally according to some arbitrary way the author decided to structure his code.

      And it means starting with the basics in the root document, in the form of simple -- but complete -- worked examples of the main modes of use. And leaving the esoteric details for (preferably linked (and links that actually work)) secondary pages.

      Not hitting the user in the face with a top-level synopsis that contains every possible variation of the constructor and no indication of where to go from there.

    XML::LibXML fails on every count.

    Can we stop now, because we are once again doing nothing to help the OP; nor each other.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      I really don't see how the documentation for XML::LibXML could be especially difficult to follow for anyone who is vaguely familiar with Perl OO programming. Just follow the usual rule:

      If $obj->isa($class) then consult perldoc $class.

      The documentation for the parser says of load_xml that "the return value [...] is a XML::LibXML::Document object". So you turn to the XML::LibXML::Document documentation.

      The method is called documentElement. It's shown in the SYNOPSIS for XML::LibXML::Document, and documented further down in the METHODS section.

      getDocumentElement is just an alias for documentElement, so it is documented much less prominently, and I can understand how that could have been harder to find. But most clients that you'd view documentation in (e.g. browser, "man", "perldoc") allow you to search for strings quite easily.
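
      A minimal sketch of that lookup chain, assuming XML::LibXML is installed. The XML string and its element names are made up for illustration; load_xml, documentElement, getDocumentElement, nodeName and isSameNode are the library's own methods.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input, just enough to have a root element.
my $doc = XML::LibXML->load_xml( string => '<config><name>demo</name></config>' );

# load_xml hands back an XML::LibXML::Document, which tells you
# which perldoc page to open next.
print ref($doc), "\n";    # XML::LibXML::Document

# documentElement is the method shown in that page's SYNOPSIS.
my $root = $doc->documentElement;
print $root->nodeName, "\n";    # config

# getDocumentElement is the alias; both return the same root node.
my $same = $doc->getDocumentElement->isSameNode($root) ? 'yes' : 'no';
print "alias returns the same node: $same\n";
```

      Printing ref($obj) is the quickest way to apply the "consult perldoc $class" rule when you don't know what a call returned.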

      But anyway, some of your statements on XML::LibXML reveal what I think is a fundamental difference between what you want to do with XML, and what XML::LibXML is designed for.

      You just want to get data out of XML and handle it as some sort of native data structure. XML::LibXML is for people who want to keep their data in as XMLish a form as possible (short of loading it into memory as a single XML formatted string and manipulating it with regexps!) - for people who care (not just at loading and saving time) about the difference between:

      <html> <head><title>Foo</title></head> </html>

      and:

      <html> <head title="Foo" /> </html>

      For people who, given a node $x in some deeply nested data structure, sometimes need to do $x->parentNode.

      If you don't need to do that sort of stuff, then perhaps there's a mismatch between your needs and XML::LibXML's aims.
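
      To make that concrete, here is a rough sketch (again assuming XML::LibXML is installed) that tells the two documents above apart and walks back up from a node. The helper title_style is a hypothetical name invented for this example; findnodes, hasAttribute, parentNode and nodeName are real XML::LibXML methods.

```perl
use strict;
use warnings;
use XML::LibXML;

my $as_element   = XML::LibXML->load_xml(
    string => '<html> <head><title>Foo</title></head> </html>' );
my $as_attribute = XML::LibXML->load_xml(
    string => '<html> <head title="Foo" /> </html>' );

# Hypothetical helper: report how the title is stored under <head>.
sub title_style {
    my ($doc) = @_;
    my ($head) = $doc->findnodes('/html/head');
    return 'attribute' if $head->hasAttribute('title');
    my @titles = $head->findnodes('title');
    return @titles ? 'element' : 'none';
}

print title_style($as_element),   "\n";    # element
print title_style($as_attribute), "\n";    # attribute

# Given a node somewhere in the tree, step back up to its parent.
my ($title) = $as_element->findnodes('/html/head/title');
my $parent_name = $title->parentNode->nodeName;
print "$parent_name\n";                    # head
```

      A plain hash of the same data would typically give you "Foo" either way and has no notion of a parent, which is the distinction being drawn here.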

      I can tell you that XML::Simple would have been quite useless for something like XML::Atom::OWL.

        If you don't need to do that sort of stuff, then perhaps there's a mismatch between your needs and XML::LibXML's aims.

        Finally, we agree on something.

        Most, not all, but most, of my XML needs are for gaining read-only access to relatively small volumes of simple data. Unencoded, with no namespaces or CDATA, and with no need to look at comments. Mostly configuration data, RDF feeds and similar. Things for which XML::Simple is ideal and XML::LibXML is simply overkill.

        I.e. exactly the sort of simple XML shown by the OP above. The same data for which you suggested he abandon the simple, easy to use, clearly documented tool he was using in favour of the over-engineered, horribly documented, behemoth of a chainsaw that is XML::LibXML.

        Just as I don't use a chainsaw to trim my toenails, I use the XML tool that suits my purpose.



      Your idea of an "awful interface" is weird.

      Usable, readable. If I have to use a gazillion defined and ref checks to extract one value, it's not usable, given that one function call is all that's needed.
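
      A sketch of the contrast, under stated assumptions: the two hashes are hand-written stand-ins for what a hash-based reader such as XML::Simple without ForceArray would be expected to return for one versus two <server> elements, and the element names and both helper subs are hypothetical. Only load_xml and findvalue are real XML::LibXML calls.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hand-written stand-ins (an assumption, not captured output):
# one child collapses to a hash, repeated children become an array.
my $one = { server => { host => 'alpha' } };
my $two = { server => [ { host => 'alpha' }, { host => 'beta' } ] };

# Hash route: the shape depends on the input, so check before each step.
sub first_host_from_hash {
    my ($data) = @_;
    my $server = $data->{server};
    return unless defined $server;
    $server = $server->[0] if ref $server eq 'ARRAY';
    return unless ref $server eq 'HASH';
    return $server->{host};
}

# DOM route: one XPath call, same expression for either shape.
sub first_host_from_dom {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml );
    return $doc->findvalue('/config/server[1]/host');
}

my $xml_one = '<config><server><host>alpha</host></server></config>';
my $xml_two = '<config><server><host>alpha</host></server>'
            . '<server><host>beta</host></server></config>';

print first_host_from_hash($_), "\n" for $one, $two;
print first_host_from_dom($_),  "\n" for $xml_one, $xml_two;
```

      All four lines print alpha; the difference is in how much shape-checking the caller has to write to get there.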

      I've used XML::Simple and XML::LibXML, yet only the former gives me trouble.

      Can we stop now, because we are once again doing nothing to help the OP; nor each other.

      Then stop using the word "you" and stick to the subject.
