http://www.perlmonks.org?node_id=1055248


in reply to Parse multiple xml tags with different names to an array

See my responses in The State of Web spidering in Perl, where I link to examples.

Re^2: Parse multiple xml tags with different names to an array
by rbala (Acolyte) on Sep 23, 2013 at 09:07 UTC
    Hi, thank you, but I am unable to find what I need in the large amount of information in "The State of Web spidering in Perl". Can you point me to the example that matches my requirement? Thanks, Bala.

      Personally, I would look at using XML::LibXML and XPath queries to retrieve the child nodes. XML::Twig is a similarly capable approach, especially if your XML documents do not fit into available memory.
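For the XML::Twig route, a minimal sketch might look like the following. It assumes the ROOT_TAG/CHILDn layout used in this thread and collects each direct child of ROOT_TAG into a one-key hash. The `_all_` handler with a parent check is just one way to pick out those children, and calling `purge` after each one frees the elements parsed so far, which is what keeps memory use low on large documents.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Sample document in the shape discussed in this thread (assumed input).
my $xml = '<ROOT_TAG><CHILD1>ABC</CHILD1><CHILD2>KJLK</CHILD2><CHILD3>NLLKJ</CHILD3></ROOT_TAG>';

my @nodeblocks;
my $twig = XML::Twig->new(
    twig_handlers => {
        # _all_ is called each time any element has been completely parsed
        _all_ => sub {
            my ( $t, $elt ) = @_;
            my $parent = $elt->parent;

            # keep only the direct children of ROOT_TAG
            if ( $parent && $parent->tag eq 'ROOT_TAG' ) {
                push @nodeblocks, { $elt->tag => [ $elt->text ] };
                $t->purge;    # discard everything parsed so far to save memory
            }
            return 1;
        },
    },
);
$twig->parse($xml);

for my $block (@nodeblocks) {
    my ($tag) = keys %$block;
    print "$tag => $block->{$tag}[0]\n";
}
```

Each entry in `@nodeblocks` has the same shape as the `{ CHILD1 => ["ABC"] }` structure discussed later in the thread.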

        Thanks for your comments :)
        Hi,

        With all your help, I am able to get the child nodes as an array using the code below:

            my $parser = XML::LibXML->new();
            my $doc    = $parser->parse_string($string);
            my @nodes  = $doc->findnodes("//ROOT_TAG/*");

        The elements of the resulting @nodes array are all XML::LibXML::Element objects, but I need them as hashes for my processing. Is there any way to do this?

        Thanks,

        Bala.
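The question above (turning each XML::LibXML::Element into a hash) can also be handled without XML::Simple by reading the element's name, text and attributes directly. This is a sketch under assumptions: the `element_to_hash` helper and its `name`/`text`/`attrs` keys are hypothetical choices for illustration, not something XML::LibXML defines, and the `id` attribute in the sample XML is made up to show attribute handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; CHILD1 carries an attribute to show the attrs key.
my $string = '<ROOT_TAG><CHILD1 id="1">ABC</CHILD1><CHILD2>KJLK</CHILD2></ROOT_TAG>';

my $doc   = XML::LibXML->new->parse_string($string);
my @nodes = $doc->findnodes('//ROOT_TAG/*');

# Hypothetical helper: convert one XML::LibXML::Element into a plain hash ref
sub element_to_hash {
    my ($elem) = @_;

    # attributes() in list context returns the element's XML::LibXML::Attr nodes
    my %attrs = map { $_->nodeName => $_->value } $elem->attributes;

    return {
        name  => $elem->nodeName,       # tag name, e.g. CHILD1
        text  => $elem->textContent,    # character data, e.g. ABC
        attrs => \%attrs,               # e.g. { id => 1 }
    };
}

my @nodeblocks = map { element_to_hash($_) } @nodes;
print "$_->{name}: $_->{text}\n" for @nodeblocks;
```

Attribute values are reported as strings here. Namespaced documents would need extra care, since namespace declarations can also show up in the list returned by `attributes`.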

Re^2: Parse multiple xml tags with different names to an array
by rbala (Acolyte) on Sep 26, 2013 at 07:13 UTC
    The problem is solved. The following is the code:
        use XML::LibXML;
        use XML::Simple;

        my $xml_simple = XML::Simple->new(
            SuppressEmpty => 1,
            KeyAttr       => "INDEX",
            ForceArray    => 1,
            KeepRoot      => 1,
        );
        my @nodeblocks;
        my $parser = XML::LibXML->new();
        my $doc    = $parser->parse_string($string);
        my @nodes  = $doc->findnodes("//ROOT_TAG/*");
        foreach my $node (@nodes) {
            my $str      = $node->toString();    # serialize the element back to XML text
            my $str_hash = $xml_simple->XMLin($str);
            push @nodeblocks, $str_hash;
        }
      So you're after this?
      my @nodeblocks = ( { CHILD1 => ["ABC"] }, { CHILD2 => ["KJLK"] }, { CHILD3 => ["NLLKJ"] }, );

      #!/usr/bin/perl --
      use strict;
      use warnings;
      use XML::LibXML 1.70;  ## for load_html/load_xml/location
      use Data::Dump;

      my $xml = q{<ROOT_TAG> <CHILD1>ABC</CHILD1> <CHILD2>KJLK</CHILD2> <CHILD3>NLLKJ</CHILD3> </ROOT_TAG>};
      my $dom = XML::LibXML->new( qw/ recover 2 / )->load_xml( string => $xml );
      my @nodeboks;
      for my $kid ( $dom->findnodes(q{ //ROOT_TAG/* }) ) {
          print $kid->nodePath, "\n";
          push @nodeboks, { $kid->tagName => [ $kid->textContent ], };
      }
      dd \@nodeboks;
      __END__
      /ROOT_TAG/CHILD1
      /ROOT_TAG/CHILD2
      /ROOT_TAG/CHILD3
      [
        { CHILD1 => ["ABC"] },
        { CHILD2 => ["KJLK"] },
        { CHILD3 => ["NLLKJ"] },
      ]

      If your real problem is this small, this approach isn't too bad. If it's more complex, though, it's no substitute for what XML::Simple does for you; XML::Rules is, and it's better than XML::Simple.
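For comparison, XML::Simple can also read the whole document in a single call, without XML::LibXML. This is a sketch assuming the same sample XML as above. The result is one hash keyed by child tag name, so the relative document order of differently named children is lost. Sorting the keys below only happens to match document order for these sample names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<ROOT_TAG><CHILD1>ABC</CHILD1><CHILD2>KJLK</CHILD2><CHILD3>NLLKJ</CHILD3></ROOT_TAG>';

# ForceArray => 1 wraps every element's content in an array ref;
# KeyAttr => [] turns off folding on the default name/key/id attributes.
my $ref = XMLin( $xml, ForceArray => 1, KeyAttr => [] );

# One single-key hash per child, ordered by tag name (not by document order)
my @nodeblocks = map { +{ $_ => $ref->{$_} } } sort keys %$ref;
```

If document order matters, the per-node XML::LibXML loop above is the safer choice.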

Re^2: Parse multiple xml tags with different names to an array
by rbala (Acolyte) on Sep 25, 2013 at 11:18 UTC
    Hi, I checked your examples, but could not figure out a way. My requirement is simple: just convert the XML::LibXML::Element to a plain string value. Can you help me out specifically? Thanks, Bala.
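There are two different things "a plain string value" of an XML::LibXML::Element can mean, and XML::LibXML has a method for each. `toString` serializes the element back to its XML markup, while `textContent` returns only the character data inside it. A minimal sketch, assuming the ROOT_TAG sample from this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $string = '<ROOT_TAG><CHILD1>ABC</CHILD1><CHILD2>KJLK</CHILD2></ROOT_TAG>';
my $doc    = XML::LibXML->load_xml( string => $string );

my ( @markup, @text );
for my $node ( $doc->findnodes('//ROOT_TAG/*') ) {
    push @markup, $node->toString;       # serialized XML, e.g. <CHILD1>ABC</CHILD1>
    push @text,   $node->textContent;    # character data only, e.g. ABC
}
print "$markup[$_]  ->  $text[$_]\n" for 0 .. $#markup;
```

The `toString` output is what you would feed to XML::Simple's `XMLin`. Use `textContent` when you just want the text.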