Parse multiple xml tags with different names to an array

by rbala (Acolyte)
Hi, is it possible to extract multiple tags with different names into a single array using any of the Perl modules? For example, I need the resulting array to hold the tags as strings, like this: @child_array = (<CHILD1>ABC</CHILD1>, <CHILD2>KJLK</CHILD2>, <CHILD3>NLLKJ</CHILD3>); I can already get this when all the child tags share the same name, but I want it for child tags with different names.

Thanks in Advance, Bala.

Re: Parse multiple xml tags with different names to an array
by Anonymous Monk on Sep 23, 2013 at 08:40 UTC
      Hi, thank you, but I am unable to pick out what I need from the huge pool of information in "state of web spidering in perl". Can you specify exactly which part matches my requirement? Thanks, Bala.

        Personally, I would look at using XML::LibXML and XPath queries to retrieve the child nodes. XML::Twig is also a very similar and capable approach, especially if your XML documents do not fit into available memory.
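
A minimal sketch of the XML::LibXML and XPath suggestion above, assuming the document has a ROOT_TAG root with CHILD1..CHILD3 children as in the question (the sample XML string is made up for illustration). The `*` step in the XPath matches element children regardless of their names, and `toString` serializes each matched element back to markup:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Sample input; the tag names come from the question, the document itself is assumed.
my $xml = q{<ROOT_TAG><CHILD1>ABC</CHILD1><CHILD2>KJLK</CHILD2><CHILD3>NLLKJ</CHILD3></ROOT_TAG>};

my $doc = XML::LibXML->load_xml( string => $xml );

# "/ROOT_TAG/*" selects every element child of the root, whatever its name.
# toString() turns each XML::LibXML::Element into its markup as a plain string.
my @child_array = map { $_->toString } $doc->findnodes('/ROOT_TAG/*');

print "$_\n" for @child_array;
```

Each entry of @child_array is then an ordinary Perl string such as `<CHILD1>ABC</CHILD1>`, not a node object.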

      The problem is solved. The following is the code:
      use XML::LibXML;
      use XML::Simple;

      my $xml_simple = XML::Simple->new(
          SuppressEmpty => 1,
          KeyAttr       => "INDEX",
          ForceArray    => 1,
          KeepRoot      => 1,
      );

      my @nodeblocks;
      my $parser = XML::LibXML->new();
      my $doc    = $parser->parse_string($string);    # $string holds the XML text
      my @nodes  = $doc->findnodes("//ROOT_TAG/*");   # every child element of ROOT_TAG

      foreach my $node (@nodes) {
          my $str      = $node->toString();           # serialize the element to markup
          my $str_hash = $xml_simple->XMLin($str);    # re-parse that markup into a hash
          push( @nodeblocks, $str_hash );
      }

        So you're after this?
        my @nodeblocks = (
            { CHILD1 => ["ABC"] },
            { CHILD2 => ["KJLK"] },
            { CHILD3 => ["NLLKJ"] },
        );

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use XML::LibXML 1.70; ## for load_html/load_xml/location
        use Data::Dump;

        my $xml = q{<ROOT_TAG>
        <CHILD1>ABC</CHILD1>
        <CHILD2>KJLK</CHILD2>
        <CHILD3>NLLKJ</CHILD3>
        </ROOT_TAG>};

        my $dom = XML::LibXML->new( qw/ recover 2 / )->load_xml( string => $xml );

        my @nodeboks;
        for my $kid ( $dom->findnodes(q{ //ROOT_TAG/* } ) ){
            print $kid->nodePath, "\n";
            push @nodeboks, {
                $kid->tagName => [ $kid->textContent ],
            };
        }
        dd \@nodeboks;
        __END__
        /ROOT_TAG/CHILD1
        /ROOT_TAG/CHILD2
        /ROOT_TAG/CHILD3
        [
          { CHILD1 => ["ABC"] },
          { CHILD2 => ["KJLK"] },
          { CHILD3 => ["NLLKJ"] },
        ]

        If your real problem is this short, this approach isn't too bad. If it's more complex, though, it's not a substitute for what XML::Simple does for you. XML::Rules is, and it's better than XML::Simple.

      Hi, I checked your examples but could not figure out a way. My requirement is simple: just convert the XML::LibXML::Element to a plain string value. Can you help me out specifically? Thanks, Bala.
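
For the narrower request of turning one XML::LibXML::Element into a plain string, a small sketch follows. Which string is wanted is an assumption here, so it shows three options side by side: `toString` for the full markup, `textContent` for just the text inside, and `nodeName` for the tag name. The one-element document is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical one-child document, reusing the tag names from this thread.
my $doc = XML::LibXML->load_xml(
    string => '<ROOT_TAG><CHILD1>ABC</CHILD1></ROOT_TAG>' );

# findnodes returns a list in list context; take the first match.
my ($element) = $doc->findnodes('/ROOT_TAG/CHILD1');

my $markup = $element->toString;       # the element serialized, tags included
my $text   = $element->textContent;    # only the text inside the element
my $name   = $element->nodeName;       # the tag name

print "markup: $markup\n";    # markup: <CHILD1>ABC</CHILD1>
print "text:   $text\n";      # text:   ABC
print "name:   $name\n";      # name:   CHILD1
```

Note that the method name is case-sensitive: it is `toString`, not `tostring`.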

