PerlMonks  

XML data reading/output

by Anonymous Monk
on Sep 04, 2009 at 18:34 UTC ( #793543=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to go through a lot of XML files and find the <mainterm> elements whose enclosing <descriptors> element has type="CTC".

Here is one record from my XML file:


<item>

<bibrecord>

<item-info>

<copyright type="Els">Copyright 2007 AAA, All rights reserved.</copyright>

</item-info>

<head>

<citation-info>

<citation-type code="ar"/>

</citation-info>

<descriptorgroup>

<descriptors controlled="y" type="CCV">

<descriptor>

<mainterm weight="a">Corrosion protection</mainterm>

</descriptor>

</descriptors>

<descriptors controlled="y" type="CMH">

<descriptor>

<mainterm weight="a">Corrosion inhibitors</mainterm>

</descriptor>

</descriptors>

<descriptors controlled="y" type="CTC">

<descriptor>

<mainterm weight="a">G</mainterm>

</descriptor>

</descriptors>

</descriptorgroup>

</enhancement>

</head>

</bibrecord>

</item>

The output should be:

'<mainterm weight="a">G</mainterm>'

That is, print the <mainterm> element only when the enclosing <descriptors> element has type="CTC".

Can you help me?

Re: XML data reading/output
by ikegami (Pope) on Sep 04, 2009 at 19:11 UTC
    In other words, you want to dump the elements that match XPath
    //descriptors[@type="CTC"]/descriptor/mainterm

    Using XML::LibXML, the syntax is something like

    for my $mainterm ( $doc->findnodes( '//descriptors[@type="CTC"]/descriptor/mainterm' ) ) {
        print $mainterm->toString();
    }

    XML::Twig would also be awesome here.

    Update: Changed
    //descriptors[type="CTC"]/mainterm
    to
    //descriptors[@type="CTC"]/descriptor/mainterm
    as per reply.
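
For completeness, here is a minimal self-contained sketch of the XML::LibXML approach, including how $doc might be obtained. The sub name ctc_mainterms is made up for illustration, and it assumes the input is well-formed XML (the posted sample is not, because of the stray </enhancement> tag, so the parser would reject it as posted).

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns the serialized <mainterm> elements found
# under every <descriptors type="CTC"> element in an XML string.
sub ctc_mainterms {
    my ($xml) = @_;

    # load_xml dies if the input is not well-formed.
    my $doc = XML::LibXML->load_xml( string => $xml );

    return map { $_->toString() }
        $doc->findnodes('//descriptors[@type="CTC"]/descriptor/mainterm');
}
```

To read from a file instead, load_xml also accepts location => $filename in place of string => $xml. Note that this loads the whole document into memory.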

      I got better results with
      $doc->findnodes( '//descriptors[@type="CTC"]/descriptor/mainterm/text()' )
      The three changes I added are:
      • an @ for the type attribute
      • the /descriptor element (although the XML is not well formed)
      • /text() since it appears they desire a text node
        arg, yeah, dumb mistakes. But not the third one. Contrary to your claims, the OP wants the whole element ('<mainterm weight="a">G</mainterm>'), not just the text ('G').
Re: XML data reading/output
by arun_kom (Monk) on Sep 04, 2009 at 19:12 UTC
    Show us your code ... what have you tried till now and where did you get stuck?
Re: XML data reading/output
by ramrod (Chaplain) on Sep 04, 2009 at 19:14 UTC
Re: XML data reading/output
by toolic (Bishop) on Sep 04, 2009 at 19:16 UTC
    XML::Twig can help you:
    use strict;
    use warnings;
    use XML::Twig;

    my $xfile = shift;
    my $t = XML::Twig->new( twig_handlers => { descriptors => \&desc } );
    $t->parsefile($xfile);

    sub desc {
        my ($twig, $desc) = @_;
        if ($desc->att('type') eq 'CTC') {
            $desc->first_child('descriptor')->first_child('mainterm')->print();
        }
    }

    Please post valid XML.

      Thanks so much for the help.
      My XML file is too big to process this way: the script now fails with an out-of-memory error. The file is about 1 GB.
      Please help.
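
One possible way around the out-of-memory error is to have XML::Twig discard each part of the tree as soon as it has been handled, rather than keeping the whole 1 GB document in memory. The sketch below is a variation of toolic's handler that calls purge after every <descriptors> element. The sub name each_ctc_mainterm and its callback argument are invented for illustration, and it assumes the file is well-formed XML; it has not been tried on a file of that size.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: streams $file and calls $on_match with the
# serialized <mainterm> element for each <descriptors type="CTC"> block.
sub each_ctc_mainterm {
    my ($file, $on_match) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            descriptors => sub {
                my ($t, $desc) = @_;

                # Treat a missing type attribute as a non-match.
                if ( ( $desc->att('type') // '' ) eq 'CTC' ) {
                    for my $descriptor ( $desc->children('descriptor') ) {
                        for my $mainterm ( $descriptor->children('mainterm') ) {
                            $on_match->( $mainterm->sprint );
                        }
                    }
                }

                # Drop everything parsed so far to keep memory use low.
                $t->purge;
            },
        },
    );

    $twig->parsefile($file);
    return;
}
```

It could be called as each_ctc_mainterm( $ARGV[0], sub { print "$_[0]\n" } ); to print one matching element per line.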
