Re: Is there any XML reader like this?

by tobyink (Abbot)
on Jan 13, 2012 at 22:32 UTC (#947845)


in reply to Is there any XML reader like this?

As always, I strongly recommend against XML::Simple. XML::Simple might seem simple until you end up in a situation where one of your stations has only a single IP address, and you end up with:

 {
   servers => {
     station19 => {ip=>['10.10.10.1','10.10.10.2']},
     station20 => {ip=>['10.10.10.3','10.10.10.4']},
     station21 => {ip=>'10.10.10.5'}, # D'oh!
   }
 }

Notice that $hash->{servers}{station21}{ip} is not an arrayref, whereas the IP list is an arrayref for every other station.
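
To make the cost of that inconsistency concrete, here is a minimal sketch of the defensive code every consumer of such a structure ends up writing. The helper name as_list is hypothetical (it is not part of XML::Simple), and the hash is just the structure shown above typed in by hand, not parser output.

```perl
use strict;
use warnings;

# The structure from above, typed in by hand (not produced by a parser here).
my $hash = {
    servers => {
        station19 => { ip => [ '10.10.10.1', '10.10.10.2' ] },
        station20 => { ip => [ '10.10.10.3', '10.10.10.4' ] },
        station21 => { ip => '10.10.10.5' },    # plain string, not an arrayref
    },
};

# Hypothetical helper: return a list whether given an arrayref, a plain
# scalar, or undef (a station with no <ip> elements at all).
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

for my $name ( sort keys %{ $hash->{servers} } ) {
    my @ips = as_list( $hash->{servers}{$name}{ip} );
    printf "%s: %s\n", $name, join( ', ', @ips );
}
```

Every place that reads an IP list has to remember to go through a wrapper like this, which is the kind of bookkeeping that makes the "simple" interface less simple in practice.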

OK, so you can configure XML::Simple to force the IP addresses to always be arrayrefs, but by the time you've thought through every possible permutation of your data, XML::Simple becomes not so simple any more.

Better to use a more powerful XML module, like XML::LibXML, which might seem more complicated to begin with, but is at least consistent.

 use XML::LibXML;

 my $xml = XML::LibXML->new->parse_fh(\*DATA);
 foreach my $station ($xml->findnodes('/servers/*')) {
     printf("Station: %s\n", $station->tagName);
     foreach my $ip ($station->findnodes('./ip')) {
         printf("\tIP: %s\n", $ip->textContent);
     }
 }
 __DATA__
 <servers>
   <station18>
     <ip>10.0.0.101</ip>
     <ip>10.0.1.101</ip>
     <ip>10.0.0.102</ip>
     <ip>10.0.0.103</ip>
     <ip>10.0.1.103</ip>
   </station18>
   <station19>
     <ip>10.0.0.111</ip>
     <ip>10.0.1.111</ip>
     <ip>10.0.0.112</ip>
     <ip>10.0.0.113</ip>
     <ip>10.0.1.113</ip>
   </station19>
   <station17>
     <ip>10.0.0.121</ip>
     <ip>10.0.1.121</ip>
     <ip>10.0.0.122</ip>
     <ip>10.0.0.123</ip>
     <ip>10.0.1.123</ip>
   </station17>
   <station20>
     <!-- no IP addresses -->
   </station20>
   <station21>
     <!-- just one IP address -->
     <ip>10.2.1.123</ip>
   </station21>
 </servers>


Re^2: Is there any XML reader like this?
by BrowserUk (Pope) on Jan 13, 2012 at 23:06 UTC

    Sorry, but that is just so much BS. You simply need to add one simple option:

        C:\test>junk44
        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];
        use XML::Simple;

        my $xml = XMLin( \*DATA, ForceArray => 1 );
        pp $xml;
        __DATA__
        <servers>
          <station18>
            <ip>10.0.0.101</ip>
            <ip>10.0.1.101</ip>
            <ip>10.0.0.102</ip>
            <ip>10.0.0.103</ip>
            <ip>10.0.1.103</ip>
          </station18>
          <station19>
            <ip>10.0.0.111</ip>
            <ip>10.0.1.111</ip>
            <ip>10.0.0.112</ip>
            <ip>10.0.0.113</ip>
            <ip>10.0.1.113</ip>
          </station19>
          <station17>
            <ip>10.0.0.121</ip>
          </station17>
        </servers>

    Produces:

        {
          station17 => [{ ip => ["10.0.0.121"] }],
          station18 => [
            {
              ip => [
                "10.0.0.101",
                "10.0.1.101",
                "10.0.0.102",
                "10.0.0.103",
                "10.0.1.103",
              ],
            },
          ],
          station19 => [
            {
              ip => [
                "10.0.0.111",
                "10.0.1.111",
                "10.0.0.112",
                "10.0.0.113",
                "10.0.1.113",
              ],
            },
          ],
        }

    Which is still far simpler than wasting your time trying to figure out how to use those complex monsters.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      I have no idea why you call XML::LibXML a monster compared to XML::Simple.

          use 5.010;    # for say and //
          use XML::Simple qw( :strict XMLin );

          local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';

          my $stations = XMLin( \*DATA, ForceArray => 1, KeyAttr => [] );
          for my $station_name (keys %$stations) {
              say $station_name;
              my $station = $stations->{$station_name}[0];
              # The elements in the sample data are named "ip".
              for my $ip (@{ $station->{ip} // [] }) {
                  say "   $ip";
              }
          }

          use 5.010;    # for say
          use XML::LibXML qw( );

          my $root = XML::LibXML->load_xml( IO => \*DATA )->documentElement;
          for my $station ($root->findnodes('*')) {
              say $station->getName;
              for my $ip ($station->findnodes('ip')) {
                  say "   ".$ip->textContent;
              }
          }

      And that's not even mentioning the fact that XML::LibXML is 20x faster* and able to handle so much more than XML::Simple (including everyday stuff).

      * That assumes XML::Parser is used as XML::Simple's backend. XML::LibXML is 10,000x faster than XML::Simple's common default of XML::SAX::PurePerl (which handles encodings really badly).

      Update: Fixed an error in XML::Simple code.
      Update: Fixed an error in XML::LibXML code. ("IO" was misspelled, and the XPath was wrong.)

        I have no idea why you call XML::LibXML a monster compared to XML::Simple.

        Here's one reason:

            XML::LibXML->load: specify location, string, or IO at C:\test\xml1.pl line 7

        This is line 7:

        my $root = XML::LibXML->load_xml( fh => \*DATA )->documentElement;

        So now you've got to wade through the 32 separate pages of XML::LibXML POD to work out why!

        I never have that problem with XML::Simple.



        And here's another reason. Once you've fixed your first error, your code prints nothing at all:

            my $root = XML::LibXML->load_xml( IO => \*DATA )->documentElement;
            for my $station ($root->findnodes('servers/*')) {
                say $station->name;
                for my $ip ( $station->findnodes('ip') ) {
                    say "   ".$ip->textContent;
                }
            }

        No values. No errors. Nothing! Nada! Zilch! Zip! Not a jot!

        Why? You'll have to go back and wade through those 32 pages again to work that out!



        And that's not even mentioning the fact that XML::LibXML is 20x faster

        BTW. Even that factually correct claim only tells half the story. Generate a simple and fairly modest XML file using this:

            #! perl -slw
            use strict;
            $|++;

            our $S //= '999';
            our $I //= 10;

            open O, '>', 'junk.xml';
            print O '<servers>';
            for my $s ( '0001' .. $S ) {
                printf "\r%s", $s;
                print O "<station$s>";
                print O '<ip>', join('.', unpack 'C4', pack 'N', int( rand 2**32 ) ), '</ip>' for 1 .. $I;
                print O "</station$s>";
            };
            print O '</servers>';
            close O;

        Like this:

            C:\test>xmlgen -S=9999
            9999
            C:\test>dir junk.xml
            15/01/2012  12:40         2,424,205 junk.xml

        Now run XML::Simple & XML::LibXML scripts that parse that file and iterate the contents and time them:

            C:\test>xmllib junk.xml
            Parsing took 0.290895 seconds
            Iteration took 171.657306 seconds
            Total took 171.959000 seconds
            Check mem:63.6MB

            C:\test>xmlsimple junk.xml
            Parsing took 38.202000 seconds
            Iteration took 0.059186 seconds
            Total took 38.262577 seconds
            Check mem:142MB

        All the time you gained during parsing, you throw away four-fold when accessing the data through the nightmare interface of OO baloney.

        And if you double the file size:

            C:\test>xmlgen -S=19999
            19999
            C:\test>dir junk.xml
            15/01/2012  12:58         4,868,440 junk.xml

        And now LibXML takes 8 times as long:

            C:\test>xmllib junk.xml
            Parsing took 0.560000 seconds
            Iteration took 676.238758 seconds
            Total took 676.802000 seconds
            Check mem:107MB

            C:\test>xmlsimple junk.xml
            Parsing took 75.078000 seconds
            Iteration took 0.124583 seconds
            Total took 75.209615 seconds
            Check mem:254MB

        Increase the file size 10-fold and LibXML will take 100 times longer.

        Now look carefully at the split times. XML::Simple's parsing time is slow, but linear with the file size. Its traversal time is extremely fast and also linear.

        Conversely, LibXML's parsing time is very fast and linear, but its traversal time is horribly slow and quadratic with the file size.

        It is easy to see which one wins in the speed stakes.



      You simply need to add one simple option

      And that helps you for precisely five minutes until someone adds this to the file:

        <ip assignment="temporary">10.0.0.101</ip>
      

      And then all your code which assumes stations have IP addresses which are arrayrefs of strings breaks again.
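
For comparison, a minimal sketch of how the same change looks on the XML::LibXML side: the element's text is read the same way whether or not the attribute is present, and the attribute is available if the code wants it. The inline document and the 'unset' placeholder are assumptions made for illustration only.

```perl
use strict;
use warnings;
use XML::LibXML;

# Small inline document; one <ip> carries the new attribute, one does not.
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<servers>
  <station17>
    <ip assignment="temporary">10.0.0.101</ip>
    <ip>10.0.0.121</ip>
  </station17>
</servers>
XML

my @seen;
for my $ip ( $doc->findnodes('/servers/*/ip') ) {
    # getAttribute returns undef when the attribute is absent;
    # 'unset' is just a placeholder chosen for this sketch.
    my $assignment = $ip->getAttribute('assignment') // 'unset';
    push @seen, [ $ip->textContent, $assignment ];
    printf "%s (assignment: %s)\n", $ip->textContent, $assignment;
}
```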

        then all your code which assumes stations have IP addresses which are arrayrefs of strings breaks again.

        Nope. This:

            #! perl -slw
            use strict;
            use Data::Dump qw[ pp ];
            use XML::Simple;

            my $xml = XMLin( \*DATA, ForceArray => [ 'ip' ], NoAttr => 1 );
            pp $xml;
            __DATA__
            <servers>
              <station18>
                <ip>10.0.0.101</ip>
                <ip>10.0.1.101</ip>
                <ip>10.0.0.102</ip>
                <ip>10.0.0.103</ip>
                <ip>10.0.1.103</ip>
              </station18>
              <station19>
                <ip>10.0.0.111</ip>
                <ip>10.0.1.111</ip>
                <ip>10.0.0.112</ip>
                <ip>10.0.0.113</ip>
                <ip>10.0.1.113</ip>
              </station19>
              <station17>
                <ip assignment="temporary">10.0.0.101</ip>
                <ip>10.0.0.121</ip>
              </station17>
            </servers>

        Produces this:

            C:\test>junk44
            {
              station17 => { ip => ["10.0.0.101", "10.0.0.121"] },
              station18 => {
                ip => [
                  "10.0.0.101",
                  "10.0.1.101",
                  "10.0.0.102",
                  "10.0.0.103",
                  "10.0.1.103",
                ],
              },
              station19 => {
                ip => [
                  "10.0.0.111",
                  "10.0.1.111",
                  "10.0.0.112",
                  "10.0.0.113",
                  "10.0.1.113",
                ],
              },
            }


        Which may very well be the right thing to do. The format of the data changed; is it really safe to snip out the bits we did expect and ignore the rest?

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.

        I am getting the following error when I have a file whose stations are mixed like the ones below:

            Not an ARRAY reference at ProcessStatus.pl line 44.

            <station20>
              <user>netcool</user>
              <process assignment="temporary">some text</process>
            </station20>
            <station19>
              <user>netcool</user>
              <process>nco_objserv</process>
              <process>nco_p_mttrapd</process>
            </station19>

        Code snippet:

            my $xml = XMLin("PROCESS.CONF");
            ......
            foreach $process ( @{ $xml->{$server}{process} } )
            .....

        Is anything missing here?

        Thanks,
        Ashok
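
Going only by the earlier replies in this thread, the likely cause is that XMLin is called without ForceArray, so a station with a single <process> yields something other than an arrayref. A minimal sketch under stated assumptions: the file has a <servers> root element like the earlier examples, the assignment attribute can be discarded, and the inline string stands in for PROCESS.CONF.

```perl
use strict;
use warnings;
use XML::Simple qw( XMLin );

# Stand-in for PROCESS.CONF; the <servers> root element is an assumption.
my $conf = <<'XML';
<servers>
  <station20>
    <user>netcool</user>
    <process assignment="temporary">some text</process>
  </station20>
  <station19>
    <user>netcool</user>
    <process>nco_objserv</process>
    <process>nco_p_mttrapd</process>
  </station19>
</servers>
XML

# ForceArray => ['process'] makes every <process> an arrayref, even a lone one.
# NoAttr => 1 drops the assignment attribute so each entry stays a plain string.
my $xml = XMLin( $conf, ForceArray => ['process'], NoAttr => 1 );

for my $server ( sort keys %$xml ) {
    # The "|| []" guards against a station with no <process> at all.
    for my $process ( @{ $xml->{$server}{process} || [] } ) {
        print "$server: $process\n";
    }
}
```

If the attribute matters, NoAttr cannot be used, and the loop would then have to cope with entries that are hashes rather than strings.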
