PerlMonks  

XML parsing and Lists

by madbee (Acolyte)
on Jul 04, 2013 at 23:31 UTC (#1042546=perlquestion)
madbee has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I'm trying to parse an XML file, a sample of which is below.

<Aritcle>
  <Main>
    <Sect>
      <H4>Include</H4>
      .....
      <P1> This is the criteria</P1>
      <L>
        <LI>
          <LI_Label>1.</LI_Label>
          <LI_Title>Critera 1</LI_Title>
        </LI>
        <LI>
          <LI_Label>2.</LI_Label>
          <LI_Title>Critera 2</LI_Title>
        </LI>
        <LI>
          <LI_Label>3.</LI_Label>
          <LI_Title>Critera 3</LI_Title>
        </LI>
        <LI>
          <LI_Label>4.</LI_Label>
          <LI_Title>Critera 3</LI_Title>
        </LI>
    </Sect>
  </Main>
</Article>

I am using XML::LibXML to parse this. I can easily get the entire section of lists. However, my problem is that I just need to know how many list (LI) elements there are. One approach I was considering is:

1. Parse the XML for section 4 and store its contents in an array.

2. Loop through the array until I reach the first list element, and start a counter.

3. Increment the counter for each list element found until the last list element is reached.

While this approach may work, I feel it is very kludgy, so I am looking for an elegant way to count the number of list elements in a section.
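The counting steps above can be sketched with plain DOM methods, without any XPath. This is only an illustration under stated assumptions: the XML is a trimmed, well-formed version of the sample, and the first Sect element stands in for the section of interest. getElementsByTagName collects descendants at any depth, so no manual loop or counter is needed.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

# Trimmed, well-formed version of the sample (an assumption for illustration).
my $dom = XML::LibXML->load_xml(string => <<'__XML__');
<Article>
  <Main>
    <Sect>
      <H4>Include</H4>
      <P1>This is the criteria</P1>
      <L>
        <LI><LI_Label>1.</LI_Label><LI_Title>Criteria 1</LI_Title></LI>
        <LI><LI_Label>2.</LI_Label><LI_Title>Criteria 2</LI_Title></LI>
        <LI><LI_Label>3.</LI_Label><LI_Title>Criteria 3</LI_Title></LI>
      </L>
    </Sect>
  </Main>
</Article>
__XML__

# Take the first Sect element; in list context the method returns nodes.
my ($sect) = $dom->getElementsByTagName('Sect');

# All LI descendants of that section, however deeply nested.
my @items = $sect->getElementsByTagName('LI');
my $count = scalar @items;

print "$count list nodes found.\n";
```

Because the lookup is by element name only, it does not matter whether a P1 element precedes the list.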

The challenge, of course, is that not every XML file I am parsing has exactly the same structure. There could be variations where the <P1> This is the criteria</P1> element does not exist before the start of the list.

Hoping someone here has some thoughts on how best I can capture the count of list elements.

Sadly, I cannot use XPath here since I have an entire piece of code built around LibXML parsing. Also, I'm having an XPath installation nightmare which I just cannot get past.

Thanks so much in advance.

Regards, Madbee

Re: XML parsing and Lists
by choroba (Abbot) on Jul 04, 2013 at 23:38 UTC
    Using XML::LibXML without XPath (that is, XML::LibXML::XPathContext) is like driving a Jaguar without tyres. Just compare your description to the succinctness of
    count(Sect[4]//LI)
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Thanks for responding. I was not aware of the XPathContext module in LibXML. I was trying to use XML::XPath directly and got into an infinite loop of installation issues which I could not get past.

      I will try using this approach. Basically, I have to create an array of nodes for the path:

      $parser = XML::LibXML->new;
      $dom = $parser->parse_file($file);
      $root = $dom->getDocumentElement;
      $dom->setDocumentElement($root);
      my $xc = XML::LibXML::XPathContext->new($file);
      my @nodes = $xc->findnodes('//Article//Part//Sect//H5[ contains(.,"Include")]', $dom);
      if (@nodes) {
          $count = $xc->findvalue('count(//Article//Part//Sect//LI)', $dom);
          print $count;
      }

      Am I on the right track? Anything I'm missing?

      Thanks much.

        You are overcomplicating the problem. Do not use setDocumentElement; it creates a new root element. The constructor of XPathContext takes a context node as a parameter, not a file. This is a Short, Self Contained, Correct Example:
        #!/usr/bin/perl
        use warnings;
        use strict;

        use XML::LibXML;

        my $dom = XML::LibXML->load_xml(string => << '__XML__');
        <Article> <!-- fixed typo -->
          <Main>
            <Sect>
              <H4>Include</H4>
              .....
              <P1> This is the criteria</P1>
              <L>
                <LI>
                  <LI_Label>1.</LI_Label>
                  <LI_Title>Critera 1</LI_Title>
                </LI>
                <LI>
                  <LI_Label>2.</LI_Label>
                  <LI_Title>Critera 2</LI_Title>
                </LI>
                <LI>
                  <LI_Label>3.</LI_Label>
                  <LI_Title>Critera 3</LI_Title>
                </LI>
                <LI>
                  <LI_Label>4.</LI_Label>
                  <LI_Title>Critera 3</LI_Title>
                </LI>
              </L> <!-- fixed missing closing tag -->
            </Sect>
          </Main>
        </Article>
        __XML__

        my $xc = XML::LibXML::XPathContext->new;
        my $count = $xc->findvalue('count(//Article//Sect//LI)', $dom);
        print "$count list nodes found.\n" if $count;
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
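Building on the example above, here is a minimal sketch of restricting the count to the section whose heading contains "Include", which appears to be what the earlier attempt was aiming for. Some assumptions: the heading is an H4 child of Sect as in the sample (the H5 and Part names from the attempt do not occur there), and the second Sect is hypothetical, added only to show that its items are excluded. The first section deliberately has no P1 element, since the question mentions that it may be absent.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

# Trimmed sample; the "Exclude" section is a hypothetical addition.
my $dom = XML::LibXML->load_xml(string => <<'__XML__');
<Article>
  <Main>
    <Sect>
      <H4>Include</H4>
      <L>
        <LI><LI_Label>1.</LI_Label><LI_Title>Criteria 1</LI_Title></LI>
        <LI><LI_Label>2.</LI_Label><LI_Title>Criteria 2</LI_Title></LI>
      </L>
    </Sect>
    <Sect>
      <H4>Exclude</H4>
      <L>
        <LI><LI_Label>1.</LI_Label><LI_Title>Criteria 3</LI_Title></LI>
      </L>
    </Sect>
  </Main>
</Article>
__XML__

# Select only Sect elements that have an H4 child containing "Include",
# then count every LI descendant of those sections.
my $xpath = 'count(//Sect[H4[contains(., "Include")]]//LI)';

# Nodes provide findvalue() themselves; an explicit XPathContext is mainly
# needed when namespaces have to be registered.
my $count = $dom->findvalue($xpath);

print "$count list nodes found in the Include section.\n";
```

The descendant axis (//LI) ignores whatever siblings come before the list, so the same expression works whether or not a P1 element is present.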
