PerlMonks  

How can I browse & list XPATH of a XML Message?

by MDRI (Initiate)
on Oct 30, 2015 at 02:36 UTC ( [id://1146437] : perlquestion )

MDRI has asked for the wisdom of the Perl Monks concerning the following question:

Thanks for looking into this issue. I am not sure whether this is the right forum for this thread. If not, please point me to the right one.

We have a complex XML message (data in XML format). We are looking for a way to extract all the XPaths of this XML message together with its element- and attribute-level data content. We tried XMLSpy and xmltwig, but had no luck. xml_grep pulls data if we give it an XPath as input, but it has no option to list all the XPaths of an XML message.

I have a well-formed XML message. I want to produce a list/report of:

1) All XPaths of the XML message

2) Each XPath together with the data content found at that XPath

Here is an example (input XML message):

<?xml version="1.0"?>
<PARTS>
   <TITLE>Computer Parts</TITLE>
   <PART>
      <ITEM>Motherboard</ITEM>
      <MANUFACTURER>ASUS</MANUFACTURER>
      <MODEL>P3B-F</MODEL>
      <COST> 123.00</COST>
   </PART>
   <PART>
      <ITEM>Video Card</ITEM>
      <MANUFACTURER>ATI</MANUFACTURER>
      <MODEL>All-in-Wonder Pro</MODEL>
      <COST> 160.00</COST>
   </PART>
   <PART>
      <ITEM>Sound Card</ITEM>
      <MANUFACTURER>Creative Labs</MANUFACTURER>
      <MODEL>Sound Blaster Live</MODEL>
      <COST> 80.00</COST>
   </PART>
   <PART>
      <ITEM>inch Monitor</ITEM>
      <MANUFACTURER>LG Electronics</MANUFACTURER>
      <MODEL> 995E</MODEL>
      <COST> 290.00</COST>
   </PART>
</PARTS>

The desired output (I created the following list manually):

/PARTS/TITLE Computer Parts
/PARTS/PART[1]/ITEM Motherboard
/PARTS/PART[1]/MANUFACTURER ASUS
/PARTS/PART[1]/MODEL P3B-F
/PARTS/PART[1]/COST 123.00
/PARTS/PART[2]/ITEM Video Card
/PARTS/PART[2]/MANUFACTURER ATI
............ .............. .................. ...................

Is there any open source product that can produce such a report for an XML message?

What are the ways to extract the XPaths, or each XPath with its data content?

Thanks for letting me pick the brains of this forum.

Replies are listed 'Best First'.
Re: How can I browse & list XPATH of a XML Message?
by choroba (Cardinal) on Oct 30, 2015 at 03:01 UTC
    Pretty easy in XML::XSH2, a wrapper around XML::LibXML:
    open file.xml ; for ( /descendant-or-self::node() | //@* ) echo xsh:path(.) (.) ;

    To skip double reports for text() nodes and non-leaf nodes, you can add

    [not(./*)][not(self::text())]

    after the node().
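
For anyone without XSH2 installed, roughly the same query can be run from plain XML::LibXML. This is only a sketch under a few assumptions: the helper name `leaf_paths` and the inline sample document (including its `id` attributes) are invented for illustration, and the XPath is simplified to `//*[not(*)] | //@*`, so it reports leaf elements and attributes but ignores comments and processing instructions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns one "path value" string for every
# leaf element and every attribute in the document.
sub leaf_paths {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml );

    my @lines;
    # //*[not(*)] selects elements that have no child elements;
    # //@* selects every attribute.
    for my $node ( $doc->findnodes('//*[not(*)] | //@*') ) {
        my $value = $node->textContent;
        $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @lines, $node->nodePath . ' ' . $value;
    }
    return @lines;
}

# Made-up sample, loosely based on the PARTS document in the question.
my $sample = <<'XML';
<PARTS>
  <TITLE>Computer Parts</TITLE>
  <PART id="a1"><ITEM>Motherboard</ITEM><COST> 123.00</COST></PART>
  <PART id="a2"><ITEM>Video Card</ITEM><COST> 160.00</COST></PART>
</PARTS>
XML

print "$_\n" for leaf_paths($sample);
```

`nodePath` is what supplies the positional index (`PART[1]`, `PART[2]`) when an element has same-named siblings, and it renders attributes as `.../@name`.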

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

       [not(./*)][not(self::text())]

      Is that XSH syntax? What is the equivalent XPath syntax?

        No, that's standard XPath syntax. What made you think it's not?
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: How can I browse & list XPATH of a XML Message? (xpaths dumper config)
by Anonymous Monk on Oct 30, 2015 at 02:49 UTC
Re: How can I browse & list XPATH of a XML Message?
by Preceptor (Deacon) on Oct 30, 2015 at 10:45 UTC

    I got as far as this using "XML::Twig":

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Twig;

    my $twig = XML::Twig->parse( \*DATA );

    print $twig->get_xpath( '/root/fish/carrot[@colour="orange"]/pie', 0 )->text, "\n";

    foreach my $node ( $twig->get_xpath('//*') ) {
        my @path_tags;
        my @path_with_att;
        my $cursor = $node;
        while ($cursor) {
            unshift( @path_tags, $cursor->tag );
            my $att_path = "";
            if ( $cursor->atts ) {
                $att_path = join( "",
                    map { "[@" . $_ . "=\"" . $cursor->att($_) . "\"]" }
                        keys %{ $cursor->atts } );
            }
            unshift( @path_with_att, $cursor->tag . $att_path );
            $cursor = $cursor->parent;
        }
        print join( "/", @path_tags ), "\n";
        my $xpath_with_atts = "/" . join( "/", @path_with_att );
        print $xpath_with_atts, "\n";
        print "Found:", $twig->get_xpath( $xpath_with_atts, 0 )->tag, "\n";
    }

    __DATA__
    <root>
      <fish skin="scaly" home="pond">
        <carrot colour="orange">
          <pie>This value</pie>
        </carrot>
      </fish>
    </root>

    Unfortunately it doesn't quite do what you want: it doesn't give you the numeric index of the node, so whilst the XPath is valid, it's not necessarily unique.

    I'm going to try and think if there's a good way to do what you're wanting.

      I suggest that you use the ->xpath method of the elements to get the path together with the numeric indexes, e.g.:

      my $xpath_with_atts = $node->xpath;
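
Putting that suggestion into a complete script might look like the sketch below. The helper name `twig_leaf_paths` and the tiny sample document are invented for illustration. The sketch assumes only the documented XML::Twig::Elt methods `descendants_or_self`, `first_child`, `xpath` and `trimmed_text`, plus the `#ELT` condition for "real" elements.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: one "xpath text" line per element that has
# no child elements, in document order.
sub twig_leaf_paths {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml);

    my @lines;
    # '#ELT' restricts the walk to real elements (no #PCDATA, comments, ...).
    for my $elt ( $twig->root->descendants_or_self('#ELT') ) {
        next if $elt->first_child('#ELT');    # not a leaf: has child elements
        # xpath() adds a positional index only where siblings share a tag.
        push @lines, $elt->xpath . ' ' . $elt->trimmed_text;
    }
    return @lines;
}

# Made-up sample document.
my $sample = '<a><b>x</b><b> y </b><c>z</c></a>';
print "$_\n" for twig_leaf_paths($sample);
```

Note that `trimmed_text` also collapses runs of internal whitespace, so use `text` instead if the exact content matters.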
Re: How can I browse & list XPATH of a XML Message?
by tangent (Parson) on Oct 30, 2015 at 21:29 UTC
    Have a look at XML::LibXML::Iterator which, with a bit of trial and error, I got to produce your desired output:
    use XML::LibXML;
    use XML::LibXML::Iterator;

    my $string = q|<?xml version="1.0"?>
    <PARTS>
    <TITLE>Computer Parts</TITLE>
    ....
    </PARTS>|;

    my $doc = XML::LibXML->new->parse_string( $string );
    my $iterator = XML::LibXML::Iterator->new( $doc );
    my ($path,$value) = ('','');
    while ( $iterator->nextNode ) {
        if ($path and $value) {
            print "$path $value\n";
            ($path,$value) = ('','');
        }
        my $current = $iterator->current;
        my $type = $current->nodeType;
        if ( $type == XML_ELEMENT_NODE ) {
            $path = $current->nodePath;
        }
        elsif ( $type == XML_TEXT_NODE ) {
            my $text = $current->nodeValue;
            chomp $text;
            $value = $text if $text;
        }
    }

    Output:

    /PARTS/TITLE Computer Parts
    /PARTS/PART[1]/ITEM Motherboard
    /PARTS/PART[1]/MANUFACTURER ASUS
    /PARTS/PART[1]/MODEL P3B-F
    /PARTS/PART[1]/COST 123.00
    /PARTS/PART[2]/ITEM Video Card
    /PARTS/PART[2]/MANUFACTURER ATI
    /PARTS/PART[2]/MODEL All-in-Wonder Pro
    /PARTS/PART[2]/COST 160.00
    /PARTS/PART[3]/ITEM Sound Card
    /PARTS/PART[3]/MANUFACTURER Creative Labs
    /PARTS/PART[3]/MODEL Sound Blaster Live
    /PARTS/PART[3]/COST 80.00
    /PARTS/PART[4]/ITEM inch Monitor
    /PARTS/PART[4]/MANUFACTURER LG Electronics
    /PARTS/PART[4]/MODEL 995E
    /PARTS/PART[4]/COST 290.00