http://www.perlmonks.org?node_id=1042546

madbee has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I'm trying to parse an XML file, a sample of which is below.

<Aritcle>
  <Main>
    <Sect>
      <H4>Include</H4>
      .....
      <P1> This is the criteria</P1>
      <L>
        <LI> <LI_Label>1.</LI_Label> <LI_Title>Critera 1</LI_Title> </LI>
        <LI> <LI_Label>2.</LI_Label> <LI_Title>Critera 2</LI_Title> </LI>
        <LI> <LI_Label>3.</LI_Label> <LI_Title>Critera 3</LI_Title> </LI>
        <LI> <LI_Label>4.</LI_Label> <LI_Title>Critera 3</LI_Title> </LI>
    </Sect>
  </Main>
</Article>

I am using XML::LibXML to parse this. I can easily get the entire section of lists. However, my problem is that I just need to know how many LIST elements there are. One way I was thinking of is:

1. Parse the XML for section 4 and store the contents in an array.

2. Loop through the array until I reach the first LIST element, and start a counter.

3. Increment the counter for each LIST element found until the last LIST element is reached.

While this approach may work, I feel it is very kludgy. So I am looking for an elegant way to count the number of list elements in a section.

The challenge, of course, is that not every XML file I am parsing has exactly the same structure. There could be variations where the <P1> This is the criteria</P1> does not exist before the start of the list.

I'm hoping someone here has some thoughts on how best to capture the count of list elements.

Sadly, I cannot use XPath here since I have an entire piece of code built around LibXML parsing. Also, I'm having an XPath installation nightmare which I just cannot get past.

Thanks so much in advance.

Regards, Madbee

Replies are listed 'Best First'.
Re: XML parsing and Lists
by choroba (Cardinal) on Jul 04, 2013 at 23:38 UTC
    Using XML::LibXML without XPath (that is, XML::LibXML::XPathContext) is like driving a Jaguar without tyres. Just compare your description to the succinctness of
    count(Sect[4]//LI)
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
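A minimal sketch of how a positional expression such as count(Sect[4]//LI) can be evaluated, added here for illustration. Because Sect[N] is a relative path, it has to be evaluated from the parent of the Sect elements. The two-section document and the helper name count_li_in_section are invented for this example and are not part of XML::LibXML; the position is assumed to be a plain integer.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;

# Hypothetical two-section document, invented for this illustration.
my $xml = <<'__XML__';
<Article>
  <Main>
    <Sect><L><LI>a</LI></L></Sect>
    <Sect><L><LI>b</LI><LI>c</LI><LI>d</LI></L></Sect>
  </Main>
</Article>
__XML__

# count_li_in_section is a hypothetical helper, not an XML::LibXML method.
# $position is assumed to be an integer (XPath positions start at 1).
sub count_li_in_section {
    my ($xml_string, $position) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml_string);

    # Sect[N] is relative, so use the parent of the Sect elements as context.
    my ($main) = $dom->findnodes('/Article/Main');
    return 0 unless $main;

    # findvalue returns the string value of the result; "+ 0" makes it numeric.
    return $main->findvalue("count(Sect[$position]//LI)") + 0;
}

print count_li_in_section($xml, 2), " list nodes in the second Sect.\n";
```

Nodes returned by XML::LibXML already offer findnodes and findvalue, so for a simple count like this no separate XPath module needs to be installed.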

      Thanks for responding. I was not aware of the XPathContext module in LibXML. I was trying to use XML::XPath directly and got into an infinite loop of installation issues which I could not get past.

      I will try using this approach. Basically, I have to create an array of nodes for the path:

      $parser = XML::LibXML->new;
      $dom = $parser->parse_file($file);
      $root = $dom->getDocumentElement;
      $dom->setDocumentElement($root);
      my $xc = XML::LibXML::XPathContext->new($file);
      my @nodes=$xc->findnodes('//Article//Part//Sect//H5[ contains(.,"Include")]',$dom);
      if (@nodes) {
          $count = $xc->findvalue('count(//Article//Part//Sect//LI)',$dom);
          print $count;
      }

      Am I on the right track? Anything I'm missing?

      Thanks much.

        You are overcomplicating the problem. Do not use setDocumentElement: it creates a new root element. The constructor of XPathContext takes a context node as a parameter, not a file. This is a Short, Self Contained, Correct Example:
        #!/usr/bin/perl
        use warnings;
        use strict;

        use XML::LibXML;

        my $dom = XML::LibXML->load_xml(string => << '__XML__');
        <Article> <!-- fixed typo -->
          <Main>
            <Sect>
              <H4>Include</H4>
              .....
              <P1> This is the criteria</P1>
              <L>
                <LI> <LI_Label>1.</LI_Label> <LI_Title>Critera 1</LI_Title> </LI>
                <LI> <LI_Label>2.</LI_Label> <LI_Title>Critera 2</LI_Title> </LI>
                <LI> <LI_Label>3.</LI_Label> <LI_Title>Critera 3</LI_Title> </LI>
                <LI> <LI_Label>4.</LI_Label> <LI_Title>Critera 3</LI_Title> </LI>
              </L> <!-- fixed missing closing tag -->
            </Sect>
          </Main>
        </Article>
        __XML__

        my $xc = XML::LibXML::XPathContext->new;
        my $count = $xc->findvalue('count(//Article//Sect//LI)', $dom);
        print "$count list nodes found.\n" if $count;
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
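A hedged follow-up sketch: the earlier attempt in this thread first looked for a heading containing "Include" and then counted every LI in the document. If the goal is to count only the list items of the section with that heading, one possible approach is a single XPath predicate on Sect. The sample document below is invented for this illustration (its "Include" section deliberately has no <P1> before the list), and count_li_under_heading is a hypothetical helper name, not an XML::LibXML method.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;

# Hypothetical sample: the "Include" section has no <P1> before its list.
my $xml = <<'__XML__';
<Article>
  <Main>
    <Sect>
      <H4>Include</H4>
      <L>
        <LI><LI_Label>1.</LI_Label><LI_Title>Criteria 1</LI_Title></LI>
        <LI><LI_Label>2.</LI_Label><LI_Title>Criteria 2</LI_Title></LI>
      </L>
    </Sect>
    <Sect>
      <H4>Exclude</H4>
      <P1>This is the criteria</P1>
      <L>
        <LI><LI_Label>1.</LI_Label><LI_Title>Criteria 3</LI_Title></LI>
      </L>
    </Sect>
  </Main>
</Article>
__XML__

# count_li_under_heading is a hypothetical helper for this sketch.
sub count_li_under_heading {
    my ($xml_string, $heading) = @_;

    # Assumption: the heading text is interpolated into a double-quoted
    # XPath string literal, so it must not contain a double quote itself.
    die "heading must not contain a double quote\n" if $heading =~ /"/;

    my $dom = XML::LibXML->load_xml(string => $xml_string);
    my $xc  = XML::LibXML::XPathContext->new($dom);

    # Select only Sect elements whose H4 child contains the heading text,
    # then count the LI elements anywhere below them.
    return $xc->findvalue(
        qq{count(//Sect[H4[contains(., "$heading")]]//LI)}
    ) + 0;
}

print count_li_under_heading($xml, 'Include'), " list nodes under Include.\n";
```

The expression does not mention <P1> at all, so the count is the same whether or not that paragraph appears before the list.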