PerlMonks  

XML parsing and Lists

by madbee (Acolyte)
on Jul 04, 2013 at 23:31 UTC ( [id://1042546]=perlquestion )

madbee has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I'm trying to parse an XML file, a sample of which is below.

<Aritcle>
  <Main>
    <Sect>
      <H4>Include</H4>
      .....
      <P1> This is the criteria</P1>
      <L>
        <LI> <LI_Label>1.</LI_Label> <LI_Title>Critera 1</LI_Title> </LI>
        <LI> <LI_Label>2.</LI_Label> <LI_Title>Critera 2</LI_Title> </LI>
        <LI> <LI_Label>3.</LI_Label> <LI_Title>Critera 3</LI_Title> </LI>
        <LI> <LI_Label>4.</LI_Label> <LI_Title>Critera 3</LI_Title> </LI>
    </Sect>
  </Main>
</Article>

I am using XML::LibXML to parse this. I can easily get the entire section containing the lists. However, my problem is that I just need to know how many list elements (<LI>) there are. One way I was thinking of is:

1. Parse the XML for the sec4 and store its contents in an array.

2. Loop through the array until I reach the first list element, and start a counter.

3. Increment the counter for each list element found until the last list element is reached.

While this approach may work, it feels very kludgy. So I am looking for an elegant way to count the number of list elements in a section.
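For comparison, here is a minimal sketch of the counter idea from the three steps above, using only DOM methods (no XPath expressions). It assumes the list items are <LI> descendants of a <Sect> element, as in the sample. The helper name count_li_per_sect and the trimmed, well-formed sample document are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return one <LI> count per <Sect> element,
# in document order, using plain DOM calls only.
sub count_li_per_sect {
    my ($dom) = @_;
    my @counts;
    for my $sect ($dom->getElementsByTagName('Sect')) {
        # In list context getElementsByTagName returns the matching
        # descendant elements, so the array length is the count.
        my @items = $sect->getElementsByTagName('LI');
        push @counts, scalar @items;
    }
    return @counts;
}

# Trimmed sample, assumed well-formed (not the original file).
my $dom = XML::LibXML->load_xml(string => <<'__XML__');
<Article><Main><Sect><H4>Include</H4><L>
<LI><LI_Label>1.</LI_Label><LI_Title>Criteria 1</LI_Title></LI>
<LI><LI_Label>2.</LI_Label><LI_Title>Criteria 2</LI_Title></LI>
</L></Sect></Main></Article>
__XML__

print join(',', count_li_per_sect($dom)), "\n";
```

Because the lookup is by element name, it does not matter whether a <P1> paragraph precedes the list. If <Sect> elements can be nested, an item would be counted once for each enclosing section.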

The challenge, of course, is that not every XML file I am parsing has exactly the same structure. There can be variations where the <P1> This is the criteria</P1> element does not exist before the start of the list.

I'm hoping someone here has some thoughts on how best to capture the count of list elements.

Sadly, I cannot use XPath here, since I have an entire piece of code built around LibXML parsing. Also, I'm having an XPath installation nightmare which I just cannot get past.

Thanks so much in advance.

Regards, Madbee

Replies are listed 'Best First'.
Re: XML parsing and Lists
by choroba (Cardinal) on Jul 04, 2013 at 23:38 UTC
    Using XML::LibXML without XPath (that is, XML::LibXML::XPathContext) is like driving a Jaguar without tyres. Just compare your description to the succinctness of
    count(Sect[4]//LI)
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
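A minimal sketch of how a relative count() expression like the one above might be evaluated. XML::LibXML nodes offer findnodes and findvalue themselves, so no separate XPath distribution has to be installed. The two-section document and the choice of Sect[2] are assumptions made for the example, not the original data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Made-up document with two sections, for illustration only.
my $dom = XML::LibXML->load_xml(string => <<'__XML__');
<Article>
  <Main>
    <Sect><H4>Exclude</H4><L><LI>a</LI></L></Sect>
    <Sect><H4>Include</H4><L><LI>b</LI><LI>c</LI><LI>d</LI></L></Sect>
  </Main>
</Article>
__XML__

# Pick the context node that the relative path starts from.
my ($main) = $dom->findnodes('/Article/Main');

# count() is evaluated relative to $main: second <Sect> child,
# then every <LI> below it at any depth.
my $count = $main->findvalue('count(Sect[2]//LI)');

print "$count list nodes in the second section.\n";
```

The position predicate picks a section by order, so it only helps when the section of interest always sits at the same position among its siblings.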

      Thanks for responding. I was not aware of the XPathContext module in LibXML. I was trying to use XML::XPath directly and got into an infinite loop of installation issues which I could not get past.

      I will try this approach. Basically, I have to create an array of nodes for the path:

      $parser = XML::LibXML->new;
      $dom = $parser->parse_file($file);
      $root = $dom->getDocumentElement;
      $dom->setDocumentElement($root);
      my $xc = XML::LibXML::XPathContext->new($file);
      my @nodes = $xc->findnodes('//Article//Part//Sect//H5[ contains(.,"Include")]', $dom);
      if (@nodes) {
          $count = $xc->findvalue('count(//Article//Part//Sect//LI)', $dom);
          print $count;
      }

      Am I on the right track? Anything I'm missing?

      Thanks much.

        You are overcomplicating the problem. Do not use setDocumentElement; it is meant for setting a new root element, which you do not need here. The constructor of XPathContext takes a context node as a parameter, not a file. This is a Short, Self Contained, Correct Example:
        #!/usr/bin/perl
        use warnings;
        use strict;

        use XML::LibXML;

        my $dom = XML::LibXML->load_xml(string => << '__XML__');
        <Article> <!-- fixed typo -->
          <Main>
            <Sect>
              <H4>Include</H4>
              .....
              <P1> This is the criteria</P1>
              <L>
                <LI> <LI_Label>1.</LI_Label> <LI_Title>Critera 1</LI_Title> </LI>
                <LI> <LI_Label>2.</LI_Label> <LI_Title>Critera 2</LI_Title> </LI>
                <LI> <LI_Label>3.</LI_Label> <LI_Title>Critera 3</LI_Title> </LI>
                <LI> <LI_Label>4.</LI_Label> <LI_Title>Critera 3</LI_Title> </LI>
              </L> <!-- fixed missing closing tag -->
            </Sect>
          </Main>
        </Article>
        __XML__

        my $xc = XML::LibXML::XPathContext->new;
        my $count = $xc->findvalue('count(//Article//Sect//LI)', $dom);
        print "$count list nodes found.\n" if $count;
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
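The question also mentions files where the <P1> paragraph is missing, and the attempt above filters on a heading containing "Include". A hedged sketch of combining the two: count <LI> elements only in sections whose heading matches. It assumes the heading is an <H4> child of <Sect>, as in the sample (the attempt used H5). The helper name count_li_for_heading and both test documents are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: total number of <LI> elements inside every
# <Sect> whose <H4> heading contains the given text.
sub count_li_for_heading {
    my ($dom, $heading) = @_;
    my $xc    = XML::LibXML::XPathContext->new($dom);
    my $total = 0;
    for my $sect ($xc->findnodes('//Sect')) {
        # String value of the section's H4 child (empty if absent).
        my $title = $xc->findvalue('H4', $sect);
        next unless index($title, $heading) >= 0;
        # Relative path: every <LI> below this section, at any depth.
        $total += $xc->findvalue('count(.//LI)', $sect);
    }
    return $total;
}

# Two made-up variants: one with the <P1> paragraph, one without.
my $with_p1 = XML::LibXML->load_xml(string =>
    '<Article><Main><Sect><H4>Include</H4><P1>This is the criteria</P1>'
  . '<L><LI>1</LI><LI>2</LI></L></Sect></Main></Article>');
my $without_p1 = XML::LibXML->load_xml(string =>
    '<Article><Main><Sect><H4>Include</H4>'
  . '<L><LI>1</LI><LI>2</LI><LI>3</LI></L></Sect></Main></Article>');

printf "%d list nodes with P1, %d without.\n",
    count_li_for_heading($with_p1,    'Include'),
    count_li_for_heading($without_p1, 'Include');
```

Doing the heading match in Perl rather than inside the XPath string avoids quoting problems if the heading text ever contains quote characters.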
