PerlMonks  

searching the XML file for certain nodes

by Anonymous Monk
on Mar 10, 2009 at 13:21 UTC (#749590, perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I'm confused about how to search when working with XML. I have an XML file like this:
<?xml version="1.0" encoding="UTF-8" ?>
<e>
  <p id="1" v="YES">
    <1>The</1>
    <2>Bye</2>
  </p>
  <p id="2" v="NO">
    <1>Border</1>
    <2>Lamp</2>
  </p>
</e>
I want to keep only the <p> elements in which v="YES". How can I search for those so that I get this output:
<?xml version="1.0" encoding="UTF-8" ?>
<e>
  <p id="1" v="YES">
    <1>The</1>
    <2>Bye</2>
  </p>
</e>

Re: searching the XML file for certain nodes
by dHarry (Abbot) on Mar 10, 2009 at 13:29 UTC

    Use an XPath expression. (Or XQuery if you want to go over the top.) Which Perl module do you use?

      I've used XPath before, but only for extracting info from an XML file. How should I state this in Perl:
      doc("file.xml")/e/p[V="YES"]
        "I've used XPath before, but only for extracting info from an XML file"

        Well, that's what you want, isn't it? See ramrod's post below for the XPath syntax. If you want to transform your XML document into another one, then this cries out for XSLT (in which case you also use XPath expressions :).

Re: searching the XML file for certain nodes
by ikegami (Pope) on Mar 10, 2009 at 14:05 UTC
    XML::Twig is particularly well suited to do this transformation. Sorry, I don't have time to come up with the code right now.

    Update: Contrary to claims made, that's not XML. 1 and 2 are not valid XML element names. If that isn't what you are actually using, the following code will do the trick:

    use strict;
    use warnings;
    use XML::Twig;

    my $t = XML::Twig->new(
        twig_handlers => {
            'p[@v!="YES"]' => sub { $_->delete },
        },
    );
    $t->parse(\*DATA);
    # $t->parsefile( "file.xml");
    $t->flush();

    __DATA__
    <?xml version="1.0" encoding="UTF-8" ?>
    <e>
      <p id="1" v="YES">
        <c>The</c>
        <c>Bye</c>
      </p>
      <p id="2" v="NO">
        <c>Border</c>
        <c>Lamp</c>
      </p>
    </e>
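
    The point about <1> and <2> can be checked quickly: an XML element name has to start with a letter, an underscore, or a colon, never a digit. Below is a minimal sketch, assuming plain ASCII names only (the real XML 1.0 Name rule also admits many non-ASCII characters). looks_like_xml_name is a hypothetical helper written for this illustration, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: ASCII-only approximation of the XML 1.0 "Name" rule.
# First character: letter, underscore or colon.
# Later characters: also digits, hyphen and period.
sub looks_like_xml_name {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z_:][A-Za-z0-9_:.\-]*\z/ ? 1 : 0;
}

# Names from the question (1, 2) next to names used in the replies.
for my $name (qw(1 2 c e1 p)) {
    printf "%-3s %s\n", $name,
        looks_like_xml_name($name) ? 'ok' : 'not a valid name';
}
```

    Renaming the elements to something like <c> or <e1>, as the replies in this thread do, is enough to get past this particular problem.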
Re: searching the XML file for certain nodes
by ramrod (Hermit) on Mar 10, 2009 at 14:13 UTC
    If you wanted to use XML::LibXML, then the following would point you toward the node you want:
    use XML::LibXML;

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_file($data);    # $data holds the file name
    my ($object) = $doc->findnodes("/e/p[\@v=\"YES\"]");
    Then you can manipulate the rest all you want with the other functions (I would start with childNodes).
    Hope this helps!
    NOTE: I chose the double quotes in case you wanted to replace YES with a variable. Maybe . . .
    my ($object) = $doc->findnodes("/e/p[\@v=\"$criteria\"]");
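
    To get the output the original question asks for, one option is to select the unwanted nodes instead and detach them before printing. This is a rough sketch, not the only way to do it: it assumes the input has been made well-formed by renaming <1>/<2> to <c>, and keep_yes is a hypothetical helper name. The XPath is single-quoted here, so nothing needs escaping.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample input from the question, with <1>/<2> renamed to <c>
# so that the document is well-formed.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<e>
  <p id="1" v="YES"><c>The</c><c>Bye</c></p>
  <p id="2" v="NO"><c>Border</c><c>Lamp</c></p>
</e>
XML

# Hypothetical helper: drop every /e/p whose v attribute is not "YES"
# (including <p> elements with no v attribute) and return the result.
sub keep_yes {
    my ($source) = @_;
    my $doc = XML::LibXML->load_xml(string => $source);
    for my $node ($doc->findnodes('/e/p[not(@v="YES")]')) {
        $node->unbindNode;    # detach the node from the document
    }
    return $doc->toString;
}

print keep_yes($xml);
```

    The whitespace that surrounded a removed <p> stays in the document, so the result may contain an extra blank line.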
Re: searching the XML file for certain nodes
by mirod (Canon) on Mar 10, 2009 at 14:40 UTC

    You can't do that with an XML processor. Your XML is not well-formed.

    If you fix your XML, then you can use something like this to filter out the p's you don't want:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    XML::Twig->new(
        twig_handlers => { 'p[@v="NO"]' => sub { $_->cut; } },
        pretty_print  => 'indented',
    )
    ->parsefile( "myfile.xml")
    ->print;
Re: searching the XML file for certain nodes
by Jenda (Abbot) on Mar 10, 2009 at 15:32 UTC
    use strict;
    use XML::Rules;

    my $filter = XML::Rules->new(
        style => 'filter',
        rules => {
            _default => 'raw',
            p => sub {
                my ($tag, $attr) = @_;
                if ($attr->{v} eq 'YES') {
                    return $tag => $attr;
                } else {
                    return;
                }
            },
        },
    );
    $filter->filter(\*DATA);

    __DATA__
    <?xml version="1.0" encoding="UTF-8" ?>
    <e>
      <p id="1" v="YES">
        <e1>The</e1>
        <e2>Bye</e2>
      </p>
      <p id="2" v="NO">
        <e1>Border</e1>
        <e2>Lamp</e2>
      </p>
    </e>

    Unlike the XML::LibXML solution and the XML::Twig solutions presented so far, this one doesn't keep the whole (original or resulting) document in memory. Instead, it only stores the contents of one <p> at any time. If the file is huge and contains a lot of small <p> elements, this may make a big difference.
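
    For comparison, the same one-element-at-a-time idea can be sketched with the pull parser that ships with XML::LibXML. This is only an illustration under a few assumptions: <p> elements are not nested, only the matching <p> elements are collected (the surrounding <e> wrapper is not reproduced), and no memory measurements were made. yes_elements is a hypothetical helper name.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Sample input with valid element names.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<e>
  <p id="1" v="YES"><e1>The</e1><e2>Bye</e2></p>
  <p id="2" v="NO"><e1>Border</e1><e2>Lamp</e2></p>
</e>
XML

# Hypothetical helper: return the serialized <p v="YES"> elements,
# moving through the input one <p> at a time.
sub yes_elements {
    my ($source) = @_;
    my $reader = XML::LibXML::Reader->new(string => $source)
        or die "cannot create reader\n";
    my @kept;
    # nextElement returns 1 while another <p> start tag is found.
    while ($reader->nextElement('p') == 1) {
        my $v = $reader->getAttribute('v');
        push @kept, $reader->readOuterXml if defined $v && $v eq 'YES';
    }
    return @kept;
}

print "$_\n" for yes_elements($xml);
```

    For a real file, the reader would be created with location => "myfile.xml" instead of string => $source.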

      Hey, I can do low-memory (and cryptic!) too. And I'll raise you with a one-liner:

      perl -MXML::Twig -e'XML::Twig->parse( twig_roots => { q{p[@v="NO"]} => 1 }, twig_print_outside_roots => 1, shift)' myfile.xml

      Let's play XML golf! (in which case I'll have to release a special golf-edition of XML::Twig, with shorter option names ;--)

        I never said XML::Twig can't do it without keeping all the stuff in memory. All I said was that the solutions presented so far do keep the whole filtered data in memory :-)

        I can't compete with that script on the number of characters, unless I make changes in the module. Whether it's cryptic depends on the beholder. It ain't for me, but who am I to tell, I'm the module author :-).
