PerlMonks  

Re^3: Is there any XML reader like this?

by ikegami (Pope)
on Jan 13, 2012 at 23:24 UTC


in reply to Re^2: Is there any XML reader like this?
in thread Is there any XML reader like this?

I have no idea why you call XML::LibXML a monster compared to XML::Simple.

use XML::Simple qw( :strict XMLin );
local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';
my $stations = XMLin( \*DATA, ForceArray => 1, KeyAttr => [] );
for my $station_name (keys %$stations) {
    say $station_name;
    my $station = $stations->{$station_name}[0];
    for my $ip (@{ $station->{ips} // [] }) {
        say " $ip";
    }
}

use XML::LibXML qw( );
my $root = XML::LibXML->load_xml( IO => \*DATA )->documentElement;
for my $station ($root->findnodes('*')) {
    say $station->getName;
    for my $ip ($station->findnodes('ip')) {
        say " ".$ip->textContent;
    }
}

And that's not even mentioning the fact that XML::LibXML is 20x faster* and able to handle so much more stuff than XML::Simple (including everyday stuff).

* — That assumes XML::Parser is used as XML::Simple's backend. XML::LibXML is 10,000x faster than XML::Simple's common default of XML::SAX::PurePerl (which handles encodings really badly).

Update: Fixed an error in XML::Simple code.
Update: Fixed an error in XML::LibXML code. ("IO" was misspelled, and the XPath was wrong.)


Re^4: Is there any XML reader like this?
by BrowserUk (Pope) on Jan 13, 2012 at 23:59 UTC
    I have no idea why you call XML::LibXML a monster compared to XML::Simple.

    Here's one reason:

    XML::LibXML->load: specify location, string, or IO at C:\test\xml1.pl line 7

    This is line 7:

    my $root = XML::LibXML->load_xml( fh => \*DATA )->documentElement;

    So now you've got to wade through the 32 separate pages of XML::LibXML POD to work out why!

    I never have that problem with XML::Simple.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

Re^4: Is there any XML reader like this?
by BrowserUk (Pope) on Jan 14, 2012 at 00:24 UTC

    And here's another reason. Once you've fixed your first error, your code prints nothing at all:

    my $root = XML::LibXML->load_xml( IO => \*DATA )->documentElement;
    for my $station ($root->findnodes('servers/*')) {
        say $station->name;
        for my $ip ( $station->findnodes('ip') ) {
            say " ".$ip->textContent;
        }
    }

    No values. No errors. Nothing! Nada! Zitch! Zip! Not a jot!

    Why? You'll have to go back and wade through those 32 pages again to work that out!



      The documentation for the Parser is all in one page, not 32. The second error was an XPath error. Fixed. At least they were documented and easy to find. Note that I had as many mistakes in the XML::Simple version first.
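
For readers following along, here is a minimal sketch of why the `servers/*` path printed nothing. It assumes a document shaped like the one discussed in this thread (a `<servers>` root holding station elements with `<ip>` children); the sample data is made up. A relative XPath given to findnodes is evaluated from the node it is called on, so from the root element `servers/*` asks for a child named `servers`, and there is none.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data in the shape discussed in the thread.
my $xml = <<'XML';
<servers>
  <station1><ip>10.0.0.1</ip><ip>10.0.0.2</ip></station1>
  <station2><ip>10.0.1.1</ip></station2>
</servers>
XML

my $root = XML::LibXML->load_xml( string => $xml )->documentElement;

# Relative to <servers>: asks for a <servers> child of the root, so no match.
my @wrong = $root->findnodes('servers/*');

# Relative to <servers>: every child element of the root.
my @right = $root->findnodes('*');

# An absolute path also works, whichever node it is called on.
my @absolute = $root->findnodes('/servers/*');

printf "servers/*  matched %d node(s)\n", scalar @wrong;
printf "*          matched %d node(s)\n", scalar @right;
printf "/servers/* matched %d node(s)\n", scalar @absolute;

for my $station (@right) {
    print $station->nodeName, "\n";
    print "  ", $_->textContent, "\n" for $station->findnodes('ip');
}
```

An empty result from findnodes is not an error, which is why the loop in the quoted code ran zero times without any message.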

        Documentation for the parser? Which one is that if I may ask? perldoc XML::LibXML prints something that definitely doesn't fit a single page and ... doesn't tell me anything of use. perldoc XML::LibXML::Parser doesn't fit on a single page either and ... OK, it explains how I turn the XML into a maze of interconnected objects. It says that I can use it to parse not only XML, but also HTML and SGML and god knows what else. It doesn't say anything about what the heck I am supposed to do with the maze of objects. It doesn't even care to tell me what the $doc is that all those tons of methods return. Nada, nil, nothing. So back to the XML::LibXML docs ... OK, XML::LibXML::Document looks promising, perldoc XML::LibXML::Document. 43 methods (if I did not lose count somewhere in the muddle). Maybe more because maybe the object inherits methods from some other but ... no info about that in the docs either ... basically, looking at the docs, it seems the only way to get at the data is to call the ->documentElement() and then ... OK, maybe what I get is an XML::LibXML::Element or something. Or maybe I could call ->getElementsByTagName() but ... "Implements the DOM Level 2 function" now what the fsck is that? You call that documentation?

        And XPath? Where's that even just mentioned in the docs? Oh, I see. The start of the description:

        This module is an interface to the gnome libxml2 DOM and SAX parser and the DOM tree. It also provides an XML::XPath-like findnodes() interface, providing access to the XPath API in libxml2.
        So what docs am I supposed to find and read? If anything I just installed a Perl module XML::LibXML, not some, what was it, gnomio&juliet libxml2. And a XML::XPath-like? How much alike? Where do I find the docs? So am I supposed to use XML::XPath instead or what?

        No guys, really, I did look at XML::LibXML a few years ago and tried to learn how to use it, but hey ... I had some work to do, I did not have time to search docs hardly mentioned in these and then other docs and then god only knows what else. Some time later I tried to see what would it take to change XML::Rules to use the universally hailed SAX instead of the good old XML::Parser::Expat, but ... maybe the docs make sense to someone, they did not make any to me.

        For someone who had to learn all that XAnythingYouCanThinkOf and DOM nonsense for other languages, the docs of XML::LibXML may contain all the info they need. Or at least enough to get them started. For others it's some of the least helpful documentation found for a Perl module.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.

      Yes!
      Nothing is printing!

      Thanks,
      Ashok
Re^4: Is there any XML reader like this? (XML::Simple beats LibXML hands down in the speed stakes!)
by BrowserUk (Pope) on Jan 15, 2012 at 13:23 UTC
    And that's not even mentioning the fact that XML::LibXML is 20x faster

    BTW. Even that factually correct claim only tells half the story. Generate a simple and fairly modest XML file using this:

    #! perl -slw
    use strict;
    $|++;

    our $S //= '999';
    our $I //= 10;

    open O, '>', 'junk.xml';
    print O '<servers>';
    for my $s ( '0001' .. $S ) {
        printf "\r%s", $s;
        print O "<station$s>";
        print O '<ip>', join('.', unpack 'C4', pack 'N', int( rand 2**32 ) ), '</ip>' for 1 .. $I;
        print O "</station$s>";
    };
    print O '</servers>';
    close O;

    Like this:

    C:\test>xmlgen -S=9999
    9999
    C:\test>dir junk.xml
    15/01/2012  12:40  2,424,205 junk.xml

    Now run XML::Simple & XML::LibXML scripts that parse that file and iterate the contents and time them:

    C:\test>xmllib junk.xml
    Parsing took 0.290895 seconds
    Iteration took 171.657306 seconds
    Total took 171.959000 seconds
    Check mem:63.6MB

    C:\test>xmlsimple junk.xml
    Parsing took 38.202000 seconds
    Iteration took 0.059186 seconds
    Total took 38.262577 seconds
    Check mem:142MB

    All the time you gained during parsing, you throw away four-fold when accessing the data through the nightmare interface of OO baloney.

    And if you double the file size:

    C:\test>xmlgen -S=19999
    19999
    C:\test>dir junk.xml
    15/01/2012  12:58  4,868,440 junk.xml

    And now LibXML takes 8 times as long:

    C:\test>xmllib junk.xml
    Parsing took 0.560000 seconds
    Iteration took 676.238758 seconds
    Total took 676.802000 seconds
    Check mem:107MB

    C:\test>xmlsimple junk.xml
    Parsing took 75.078000 seconds
    Iteration took 0.124583 seconds
    Total took 75.209615 seconds
    Check mem:254MB

    Increase the file size 10-fold and LibXML will take 100 times longer.

    Now look carefully at the split times. XML::Simple's parsing time is slow, but linear with the file size. Its traversal time is extremely fast and also linear.

    Conversely, LibXML's parsing time is very fast and linear; but its traversal time is horribly slow and quadratic with the file size.
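
The linear-versus-quadratic reading can be checked against the numbers posted above. This is only a rough sketch: it assumes each phase's time grows as size**k and estimates k from the two runs shown, which is far too few data points to be conclusive. The helper name scaling_exponent is made up for this illustration.

```perl
use strict;
use warnings;

# File sizes in bytes and timings in seconds, copied from the two runs above.
my @size = ( 2_424_205, 4_868_440 );
my %time = (
    'LibXML parse'     => [ 0.290895,   0.560000   ],
    'LibXML iteration' => [ 171.657306, 676.238758 ],
    'Simple parse'     => [ 38.202000,  75.078000  ],
    'Simple iteration' => [ 0.059186,   0.124583   ],
);

# If time grows as size**k, then k = log(t2/t1) / log(s2/s1).
# (Hypothetical helper, not part of either module.)
sub scaling_exponent {
    my ( $t1, $t2, $s1, $s2 ) = @_;
    return log( $t2 / $t1 ) / log( $s2 / $s1 );
}

my %exponent;
for my $phase ( sort keys %time ) {
    $exponent{$phase} = scaling_exponent( @{ $time{$phase} }, @size );
    printf "%-18s k = %.2f\n", $phase, $exponent{$phase};
}
```

A value of k near 1 means the phase scales linearly with file size, and a value near 2 means quadratically. With k close to 2, a 10-fold larger file would cost roughly 100 times as much, which is where the figure quoted above comes from.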

    It is easy to see which one wins in the speed stakes.



      Not an especially compelling case without posting the source code for the "XML::Simple & XML::LibXML scripts that parse that file and iterate the contents".

        Sorry, they are the same scripts as published earlier in the thread with the addition of a couple of timing points.

        But here ya go. Using LibXML:

        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];
        use Time::HiRes qw[ time ];
        use XML::LibXML;

        open XML, '<', $ARGV[0] or die $!;

        my $start = time;
        my $root = XML::LibXML->load_xml( IO => \*XML )->documentElement;
        printf "Parsing took %.6f seconds\n", time - $start;

        my $start2 = time;
        for my $station ($root->findnodes('*')) {
            my $x = $station->nodeName;
            for my $ip ( $station->findnodes('ip') ) {
                $x = $ip->textContent;
            }
        }
        printf "Iteration took %.6f seconds\n", time - $start2;
        printf "Total took %.6f seconds\n", time - $start;
        printf 'Check mem:'; <STDIN>;

        And XML::Simple:

        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];
        use Time::HiRes qw[ time ];
        use XML::Simple;

        open XML, '<', $ARGV[0] or die $!;

        my $start = time;
        my $stations = XMLin( \*XML, ForceArray => [ 'ip'], NoAttr => 1 );
        printf "Parsing took %.6f seconds\n", time - $start;

        my $start2 = time;
        for my $station ( keys %$stations ) {
            my $x = $station;
            for my $ip ( @{ $stations->{ $station }{ip} } ) {
                $x = $ip;
            }
        }
        printf "Iteration took %.6f seconds\n", time - $start2;
        printf "Total took %.6f seconds\n", time - $start;
        printf 'Check mem:'; <STDIN>;
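
As a side note on the ForceArray => ['ip'] option used in the XML::Simple script, here is a small sketch of what it changes. The sample XML is made up; the generated test file always has ten ip elements per station, so the difference would only show up on data where a station has a single ip.

```perl
use strict;
use warnings;
use XML::Simple qw( XMLin );

# Hypothetical sample: one station with a single <ip>, one with two.
my $xml = '<servers>'
        . '<station1><ip>10.0.0.1</ip></station1>'
        . '<station2><ip>10.0.1.1</ip><ip>10.0.1.2</ip></station2>'
        . '</servers>';

my $plain  = XMLin( $xml, NoAttr => 1 );
my $forced = XMLin( $xml, NoAttr => 1, ForceArray => ['ip'] );

# Without ForceArray, a lone <ip> folds to a plain string ...
print 'station1 ip is a ', ( ref( $plain->{station1}{ip} ) || 'string' ), "\n";
# ... while repeated <ip> elements become an array reference.
print 'station2 ip is a ', ( ref( $plain->{station2}{ip} ) || 'string' ), "\n";

# With ForceArray, both are array references, so one loop handles both.
for my $station ( sort keys %$forced ) {
    print "$station: @{ $forced->{$station}{ip} }\n";
}
```

Without the option, the inner loop in the script would try to dereference a string for any station that has exactly one ip.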


      It is easy to see which one wins in the speed stakes.

      Yeah, LibXML. My tests *included* the time it took to extract the data from the tree. The test was done with real world data of various size from three different providers.

      We use XML::Bare with a thin layer to compensate for its awful interface (XML::Simple without ForceArray or any other option), its expectation of getting decoded text, and its lack of namespace support. It's slightly faster when you factor in the time it takes to extract data. Not nearly as capable as libxml, and we had to create an interface just to be able to use it.

        Yeah, LibXML. My tests *included* the time it took to extract the data from the tree.

        Hm. So did mine. But I believe mine.

        We use XML::Bare with a thin layer to compensate for its awful interface (XML::Simple without ForceArray or any other option)

        Hm. XML::Bare::forcearray( [noderef] )

        S'funny init. It took less than a minute to disprove that. And after 5 minutes, I'm pretty sure I could use XML::Bare to read a file and get access to its content.

        Conversely, when I tried to look up getDocumentElement, I completely crapped out after about an hour. You applied it to the return from load_xml() which is labelled $dom. So look in DOM. Nada. Maybe a Node. Nada. How about a parser, or a nodelist or a namespace? Nada, nada, nada!

        Your idea of an "awful interface" is weird.

        For me:

        • The best interface is the one I don't have to look up more than once.

          That means small.

        • The second best interface is one that makes it easy to look up what I need to know.

          That means the first page shows me enough to get something working.

          Details, refinements and esoterica can be deferred to secondary pages if that cannot be avoided.

        • The third best interface is one that if it has to be large, is logically grouped.

          That means it starts by splitting the documentation along vertical lines, i.e. the way people need to use the interface: read an XML; or write an XML; or edit an XML; etc. Not horizontally, according to some arbitrary way the author decided to structure his code.

          And it means starting with the basics in the root document, in the form of simple -- but complete -- worked examples of the main modes of use. And leaving the esoteric details for (preferably linked (and links that actually work)) secondary pages.

          Not hitting the user in the face with a top level synopsis that contains every possible variation of the constructor and no indication of where to go from there.

        XML::LibXML fails on every count.

        Can we stop now, because we are once again doing nothing to help the OP; nor each other.


