 
PerlMonks  

XML::LibXML + XML::LibXML::XPathContext: can this be simplified?

by Darkwing (Sexton)
on Oct 08, 2019 at 13:16 UTC (#11107200=perlquestion)

Darkwing has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I'm maintaining a Perl application (a command-line tool using XML::LibXML) where users can write a config file containing rules. If present, these rules are applied to the input (an XML file) and matches are reported.

Simplified example (foo.xml):

    <objects>
      <obj>
        <id>1</id>
        <version>2</version>
        <refs>
          <id>2</id>
          <id>5</id>
        </refs>
      </obj>
      <obj>
        <id>2</id>
        <version>2</version>
      </obj>
      <obj>
        <id>3</id>
        <version>2</version>
        <refs>
          <id>2</id>
          <id>4</id>
        </refs>
      </obj>
      <obj>
        <id>4</id>
        <version>2</version>
      </obj>
      <obj>
        <id>5</id>
        <version></version>
      </obj>
    </objects>

Each object in this simplified example has an id and a version, and may have a refs element that references other objects by one or more ids.

Currently, these rules are basically XPath expressions that check nodes of an object, and I use findnodes() from XML::LibXML to evaluate them. But now it is also required to check nodes of referenced objects, for example: "The versions of the referenced objects should all be the same as the version of the current object". It seems to me that this cannot be done with plain XPath and XML::LibXML's findnodes. Is that right?
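For comparison, the new rule itself can be checked directly in Perl with XML::LibXML, without any custom XPath function. This does not answer how to express the rule in a user-supplied XPath string, but it gives a reference result to test such an expression against. It is a minimal sketch under assumptions: the foo.xml data is inlined, objects without refs are not reported (the getObjAll function in the solution below also yields an empty node list in that case), and a reference to an unknown id counts as a mismatch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# The foo.xml example from above, inlined to keep the sketch self-contained.
my $xml = <<'XML';
<objects>
  <obj><id>1</id><version>2</version><refs><id>2</id><id>5</id></refs></obj>
  <obj><id>2</id><version>2</version></obj>
  <obj><id>3</id><version>2</version><refs><id>2</id><id>4</id></refs></obj>
  <obj><id>4</id><version>2</version></obj>
  <obj><id>5</id><version></version></obj>
</objects>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Index every object's version string by its id.
my %version_of;
for my $obj ($dom->findnodes('/objects/obj')) {
    $version_of{ $obj->findvalue('id') } = $obj->findvalue('version');
}

# Rule: all referenced objects have the same version as the current object.
my @matching_ids;
for my $obj ($dom->findnodes('/objects/obj')) {
    my @refs = map { $_->textContent } $obj->findnodes('refs/id');
    next unless @refs;    # objects without refs are not reported
    my $version = $obj->findvalue('version');
    # An unknown id or a differing version makes the object fail the rule.
    next if grep { !defined $version_of{$_} || $version_of{$_} ne $version } @refs;
    push @matching_ids, $obj->findvalue('id');
}
print "matching objects: @matching_ids\n";
```

With the sample data only object 3 should match: object 1 references object 5, whose version is empty.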

I finally found the following solution, which uses a custom XPath function registered with XML::LibXML::XPathContext:

    use strict;
    use warnings;
    use XML::LibXML;
    use XML::LibXML::XPathContext;

    # Custom XPath function: true if every referenced object matches $expression,
    # where :1: in $expression is replaced by $value (here: the current version).
    sub getObjAll {
        my ($topList, $actNode, $idNodeList, $expression, $value) = @_;
        my $top = $topList->[0];    # the document node, passed in as "/"
        $expression =~ s/:1:/'$value'/g;
        foreach my $idNode (@{$idNodeList}) {
            my $id = $idNode->textContent;
            my $nodes = $top->findnodes("/objects/obj[id='$id' and $expression]");
            return unless @{$nodes};
        }
        return $idNodeList;
    }

    my $dom = XML::LibXML->load_xml(location => 'foo.xml');
    my $xc = XML::LibXML::XPathContext->new($dom);
    $xc->registerFunction('getObjAll', \&getObjAll);
    my @nodes = $xc->findnodes("/objects/obj[getObjAll(/, ., ./refs/id, "
                             . "'./version=:1:', ./version)]");
    foreach my $node (@nodes) {
        print $node->toString(1);
        print "\n";
    }

(In my application, the XPath expression passed to $xc->findnodes(...) would be taken from the user's config file.)

It works, but I find it ugly that one must pass the current node and the root node. Is there any way to get around this? Are there other possible improvements?

PS: It would not be practical to switch to another XML module, since my application consists of many, many classes and XML::LibXML is heavily used.
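On the question of passing the current and root node: in the posted getObjAll, $actNode is never used, and the document can be reached from any of the id nodes via ownerDocument, so both arguments can probably be dropped. A minimal sketch of that variant follows, assuming the same foo.xml structure (inlined here) and that the substituted value contains no single quote. Returning an empty XML::LibXML::NodeList for "no match" is a choice made here so that the function always yields a node-set; the posted code uses a bare return instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# The foo.xml example from the question, inlined.
my $xml = <<'XML';
<objects>
  <obj><id>1</id><version>2</version><refs><id>2</id><id>5</id></refs></obj>
  <obj><id>2</id><version>2</version></obj>
  <obj><id>3</id><version>2</version><refs><id>2</id><id>4</id></refs></obj>
  <obj><id>4</id><version>2</version></obj>
  <obj><id>5</id><version></version></obj>
</objects>
XML

# Like getObjAll, minus the "/" and "." arguments:
#   $idNodeList  node list of referencing ids (./refs/id)
#   $expression  XPath condition with a :1: placeholder
#   $value       value substituted for :1: (./version)
sub getObjAll {
    my ($idNodeList, $expression, $value) = @_;

    # No refs: return an empty node-set, which is false in a predicate.
    return XML::LibXML::NodeList->new unless @{$idNodeList};

    # Every node knows its document, so the root need not be passed in.
    my $doc = $idNodeList->[0]->ownerDocument;

    # Node list -> text; assumes the text contains no single quote.
    my $text = ref $value ? join('', map { $_->textContent } @{$value}) : $value;
    $expression =~ s/:1:/'$text'/g;

    for my $idNode (@{$idNodeList}) {
        my $id   = $idNode->textContent;
        my @hits = $doc->findnodes("/objects/obj[id='$id' and $expression]");
        return XML::LibXML::NodeList->new unless @hits;
    }
    return $idNodeList;    # non-empty node-set: true in a predicate
}

my $dom = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($dom);
$xc->registerFunction('getObjAll', \&getObjAll);

my @nodes = $xc->findnodes(
    "/objects/obj[getObjAll(./refs/id, './version=:1:', ./version)]");
print $_->toString(1), "\n" for @nodes;
```

The user-visible expression then shrinks to getObjAll(./refs/id, './version=:1:', ./version); the lookup path /objects/obj[id=...] is still hard-coded inside the function, as in the original.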

Re: XML::LibXML + XML::LibXML::XPathContext: can this be simplified?
by choroba (Bishop) on Oct 08, 2019 at 15:14 UTC
    You might use XML::XSH2. It's not "another XML module"; it's a wrapper around XML::LibXML that makes many things possible. For example, you can use its hash command to hash all the versions by object id, and later use its xsh:lookup function in an XPath expression to retrieve an element from the hash.
        open file.xml ;
        $version := hash ../id /objects/obj/version ;
        ls /objects/obj[ count(refs/id) and count(refs/id[xsh:lookup('version', .) = ../../version]) = count(refs/id) ];

    Update: You can define the same hashing function yourself and solve your problem in the same way:

        #!/usr/bin/perl
        use warnings;
        use strict;
        use feature qw{ say };

        use XML::LibXML;
        use XML::LibXML::XPathContext;

        {   my %hash;

            sub hash {
                my ($var, $dom, $key, $value) = @_;
                for my $node ($dom->findnodes($value)) {
                    $hash{$var}{ $node->findvalue($key) } = $node;
                }
            }

            sub lookup {
                my ($var, $key) = @_;
                $hash{$var}{$key}
            }
        }

        my $dom = 'XML::LibXML'->load_xml(...);
        my $xpc = 'XML::LibXML::XPathContext'->new($dom);
        $xpc->registerFunction('lookup', \&lookup);
        hash(version => $dom, '../id', '/objects/obj/version');

        my @nodes = $xpc->findnodes(
            '/objects/obj[ count(refs/id)
                and count(refs/id[lookup("version", .) = ../../version]) = count(refs/id) ]');
        foreach my $node (@nodes) {
            say $node->toString(1);
        }
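A variant of the lookup approach above, sketched under assumptions: instead of a named sub sharing a global %hash, it registers an anonymous closure over a lexical hash that maps each object id to its version string, and the function returns that string rather than the version node. The function name version_of is hypothetical, and the sketch relies on XML::LibXML converting a plain Perl string returned from an extension function into an XPath string. The XPath keeps the count()-based structure from the reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
use XML::LibXML;
use XML::LibXML::XPathContext;

# The foo.xml example from the question, inlined.
my $xml = <<'XML';
<objects>
  <obj><id>1</id><version>2</version><refs><id>2</id><id>5</id></refs></obj>
  <obj><id>2</id><version>2</version></obj>
  <obj><id>3</id><version>2</version><refs><id>2</id><id>4</id></refs></obj>
  <obj><id>4</id><version>2</version></obj>
  <obj><id>5</id><version></version></obj>
</objects>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);

# Map each object id to its version string (built once per document).
my %version_of;
for my $obj ($dom->findnodes('/objects/obj')) {
    $version_of{ $obj->findvalue('id') } = $obj->findvalue('version');
}

# Hypothetical function name; the closure sees only this document's index.
# Called with "." the argument is a node list, so its text is used as the id.
$xpc->registerFunction('version_of', sub {
    my ($arg) = @_;
    my $id = ref $arg ? join('', map { $_->textContent } @{$arg}) : $arg;
    return $version_of{$id} // '';    # unknown id: empty string
});

my @nodes = $xpc->findnodes(
    '/objects/obj[count(refs/id) and '
  . 'count(refs/id[version_of(.) = ../../version]) = count(refs/id)]');
say $_->toString(1) for @nodes;
```

Keeping the index in a closure ties it to one document and one XPathContext, which may matter if the tool processes several input files in one run.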

Re: XML::LibXML + XML::LibXML::XPathContext: can this be simplified?
by Anonymous Monk on Oct 09, 2019 at 01:28 UTC

    It works, but I find it ugly that one must pass the current node and the root node. Is there any way to get around this? Are there other possible improvements?

    Sure, see sub XML::LibXML::Node::F

Node Type: perlquestion [id://11107200]
Front-paged by haukex