PerlMonks  

Re: facing problem in parsing xml

by choroba (Abbot)
on Jul 29, 2013 at 07:26 UTC (#1046786)


in reply to facing problem in parsing xml

There are several problems. The first one, using " to quote the string which itself contains double quotes, has already been pointed out.
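As a minimal sketch of the quoting fix (the XML fragment here is a made-up stand-in, not the original data), a non-interpolating quote operator such as q{} or a single-quoted heredoc lets the string contain double quotes freely:

```perl
use strict;
use warnings;

# q{} does not interpolate and is not terminated by an inner ",
# so attribute values can be written as they appear in the XML.
my $with_q = q{<?xml version="1.0"?><root attr="value"/>};

# A single-quoted heredoc is another non-interpolating option.
my $with_heredoc = <<'END_XML';
<?xml version="1.0"?><root attr="value"/>
END_XML
chomp $with_heredoc;    # drop the trailing newline added by the heredoc

print $with_q eq $with_heredoc ? "same\n" : "different\n";
```

Any delimiter that does not occur in the text works with q, which is why the full script further down can use q%...%.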

There are more problems, though:

  1. Your XML uses namespaces. When working with XML::LibXML (or any other XML library that supports XPath, even in languages other than Perl), you have to register the namespaces in order to reference namespaced elements in XPath expressions.
  2. Even after you register the namespaces, you have to make sure your XPath expressions really describe the structure of the document. In this case, record is not the root node, so you cannot start the expression with /record. Similarly, recordData does not contain a title child; there is an srw_dc:dc element in between.
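Both points can be illustrated with a cut-down document. This is only a sketch: the miniature XML is a hypothetical reduction of the original response, the prefixes passed to registerNs are local names chosen for the example, and the srw_dc prefix is registered here solely so that the full path can be spelled out.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical miniature document; only the namespace layout mirrors the original.
my $xml = q{<?xml version="1.0"?>
<searchRetrieveResponse xmlns="http://www.abc/srw/">
  <records>
    <record>
      <recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>TEST</dc:title>
        </srw_dc:dc>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>};

my $doc = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($doc);

# Problem 1: an unprefixed name matches only elements in no namespace.
my @plain = $xpc->findnodes('//record');

# Register a prefix for every namespace the expressions mention.
$xpc->registerNs(srw    => 'http://www.abc/srw/');
$xpc->registerNs(dc     => 'http://purl.org/dc/elements/1.1/');
$xpc->registerNs(srw_dc => 'info:srw/schema/1/dc-v1.1');

# Problem 2: the path must follow the real structure.
my @as_root = $xpc->findnodes('/srw:record');                # record is not the root
my @direct  = $xpc->findnodes('//srw:recordData/dc:title');  # skips srw_dc:dc
my @full    = $xpc->findnodes(
    '/srw:searchRetrieveResponse/srw:records/srw:record'
    . '/srw:recordData/srw_dc:dc/dc:title'
);

print "unprefixed: ",  scalar(@plain),   "\n";   # 0
print "as root: ",     scalar(@as_root), "\n";   # 0
print "direct child: ", scalar(@direct), "\n";   # 0
print "full path: ",   scalar(@full),    "\n";   # 1
```

The prefix used in the XPath expression does not have to match the one in the document; only the namespace URI has to agree.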

After fixing these problems, here is code that works for me:

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

# q%...% avoids clashing with the double quotes inside the XML.
my $xml = q%<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="/templates/xsl/abc/search/searchRetrieveResponse.xsl"?><searchRetrieveResponse xmlns="http://www.abc/srw/">
  <version>1.1</version>
  <numberOfRecords>14135</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordPacking>xml</recordPacking>
      <recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>ISSN: 00322032</dc:identifier>
          <dc:identifier>URL: http://www.jstor.org/stable/23068315</dc:identifier>
          <dc:title>TEST</dc:title>
          <dc:creator>KAY RYAN</dc:creator>
          <dc:relation>Poetry, Vol. 176, No. 3</dc:relation>
          <dc:coverage>p. 126</dc:coverage>
          <dc:rights>Copyright 2000 Poetry Foundation</dc:rights>
          <dc:publisher>Poetry Foundation</dc:publisher>
          <dc:date>2000-06-01</dc:date>
          <dc:type>FLA</dc:type>
          <dc:language>eng</dc:language>
        </srw_dc:dc>
      </recordData>
      <recordPosition>1</recordPosition>
    </record>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordPacking>xml</recordPacking>
      <recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>ISSN: 0010096X</dc:identifier>
          <dc:identifier>URL: http://www.jstor.org/stable/357303</dc:identifier>
          <dc:title>Test</dc:title>
          <dc:creator>Wm. Leonard</dc:creator>
          <dc:relation>College Composition and Communication, Vol. 29, No. 2</dc:relation>
          <dc:coverage>p. 161</dc:coverage>
          <dc:rights>Copyright 1978 National Council of Teachers of English</dc:rights>
          <dc:publisher>National Council of Teachers of English</dc:publisher>
          <dc:date>1978-05-01</dc:date>
          <dc:type>FLA</dc:type>
          <dc:language>eng</dc:language>
        </srw_dc:dc>
      </recordData>
      <recordPosition>2</recordPosition>
    </record>
  </records>
</searchRetrieveResponse>%;

my $data = 'XML::LibXML'->load_xml(string => $xml);

# Register a prefix for each namespace used in the XPath expressions.
my $xpc = 'XML::LibXML::XPathContext'->new($data);
$xpc->registerNs('srw', 'http://www.abc/srw/');
$xpc->registerNs('dc',  'http://purl.org/dc/elements/1.1/');

my $recordData = $xpc->findnodes('//srw:records/srw:record/srw:recordData', $data);
foreach my $rec (@$recordData) {
    # .// searches below the current record, so the srw_dc:dc level is skipped over.
    my $title = $xpc->findnodes('.//dc:title', $rec);
    print $title, "\n";
}

Update: Also note that $title->string_value is not needed if all you want to do with the title is print it. The node list that findnodes returns in scalar context stringifies to the string value of its nodes.
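A small sketch of that stringification, using a made-up one-element document rather than the original data. It relies on findnodes returning an XML::LibXML::NodeList in scalar context, and the variable name $titles is only illustrative:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical one-element document, just enough to have a title to print.
my $doc = XML::LibXML->load_xml(string => '<r><title>TEST</title></r>');

# Scalar context: findnodes returns a node list object, not a plain string.
my $titles = $doc->findnodes('/r/title');

# Interpolating or printing the list uses its overloaded stringification.
my $implicit = "$titles";

# The explicit call gives the same text.
my $explicit = $titles->to_literal;

print "$implicit\n";
print "$explicit\n";
```

Both print statements output the text content of the title, so the explicit method call only matters when you need the value somewhere that does not stringify for you.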

لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

