Re: facing problem in parsing xml

by choroba (Chancellor)
on Jul 29, 2013 at 07:26 UTC (#1046786)

in reply to facing problem in parsing xml

There are several problems. The first one, using " to quote the string which itself contains double quotes, has already been pointed out.
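As a minimal sketch of the quoting point (the XML fragment here is made up for illustration), either a q operator with a delimiter that does not occur in the text or a single-quoted heredoc leaves embedded double quotes alone:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# q with a custom delimiter: the double quotes inside need no escaping.
my $with_q = q{<item name="first">text</item>};

# A single-quoted heredoc does the same and suits longer documents.
my $with_heredoc = <<'XML';
<item name="first">text</item>
XML
chomp $with_heredoc;    # drop the trailing newline the heredoc adds

print $with_q eq $with_heredoc ? "identical\n" : "different\n";
```

Both forms also skip variable interpolation, which is usually what you want for literal XML.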

There are more problems, though:

  1. Your XML uses namespaces. When working with XML::LibXML (or any other XML library that supports XPath, even in languages other than Perl), you have to register the namespaces in order to reference namespaced elements in XPath expressions.
  2. Even if you add namespaces, you also have to make sure your XPath expressions really describe the structure of the document. In this case, record is not the root node, so you cannot start the expression with /record. Similarly, recordData does not contain a title child; there is a srw_dc:dc element in between.
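Both points can be seen on a small document. The following is only a sketch: the namespace URIs, the element names other than record, recordData and title, and the prefixes out and in are hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

# Hypothetical namespace URIs, for illustration only.
my $outer_ns = 'http://example.com/ns/outer';
my $inner_ns = 'http://example.com/ns/inner';

my $xml = <<"XML";
<response xmlns="$outer_ns">
  <records>
    <record>
      <recordData>
        <x:box xmlns:x="$inner_ns">
          <x:title>First</x:title>
        </x:box>
      </recordData>
    </record>
  </records>
</response>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($doc);

# The prefixes belong to the XPath context and need not match the
# prefixes used in the document; only the URIs have to be identical.
$xpc->registerNs('out', $outer_ns);
$xpc->registerNs('in',  $inner_ns);

# No prefix: XPath looks for elements in no namespace, so nothing matches.
my @unprefixed = $xpc->findnodes('//record');

# Prefixed, but skipping the intermediate element: still nothing.
my @wrong_path = $xpc->findnodes('//out:record/out:recordData/in:title');

# Prefixed and following the real structure of the document.
my @titles = $xpc->findnodes('//out:record/out:recordData/in:box/in:title');

printf "unprefixed: %d, wrong path: %d, correct: %d\n",
    scalar @unprefixed, scalar @wrong_path, scalar @titles;
print $_->textContent, "\n" for @titles;
```

Only the last expression finds the title, because it both names the namespaces and walks through every element on the way down.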

After fixing the mentioned problems, here is code that works for me:

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

my $xml = q%<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="/templates/xsl/abc/search/searchRetrieveResponse.xsl"?><searchRetrieveResponse xmlns="/srw/">
  <version>1.1</version>
  <numberOfRecords>14135</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordPacking>xml</recordPacking>
      <recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1" xmlns:dc="">
          <dc:identifier>ISSN: 00322032</dc:identifier>
          <dc:identifier>URL: 5</dc:identifier>
          <dc:title>TEST</dc:title>
          <dc:creator>KAY RYAN</dc:creator>
          <dc:relation>Poetry, Vol. 176, No. 3</dc:relation>
          <dc:coverage>p. 126</dc:coverage>
          <dc:rights>Copyright 2000 Poetry Foundation</dc:rights>
          <dc:publisher>Poetry Foundation</dc:publisher>
          <dc:date>2000-06-01</dc:date>
          <dc:type>FLA</dc:type>
          <dc:language>eng</dc:language>
        </srw_dc:dc>
      </recordData>
      <recordPosition>1</recordPosition>
    </record>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordPacking>xml</recordPacking>
      <recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-v1.1" xmlns:dc="">
          <dc:identifier>ISSN: 0010096X</dc:identifier>
          <dc:identifier>URL:</dc:identifier>
          <dc:title>Test</dc:title>
          <dc:creator>Wm. Leonard</dc:creator>
          <dc:relation>College Composition and Communication, Vol. 29, No. 2</dc:relation>
          <dc:coverage>p. 161</dc:coverage>
          <dc:rights>Copyright 1978 National Council of Teachers of English</dc:rights>
          <dc:publisher>National Council of Teachers of English</dc:publisher>
          <dc:date>1978-05-01</dc:date>
          <dc:type>FLA</dc:type>
          <dc:language>eng</dc:language>
        </srw_dc:dc>
      </recordData>
      <recordPosition>2</recordPosition>
    </record>
  </records>
</searchRetrieveResponse>%;

my $data = 'XML::LibXML'->load_xml(string => $xml);
my $xpc  = 'XML::LibXML::XPathContext'->new($data);

# Register a prefix for each namespace. The URI passed here must be
# identical to the one declared by the matching xmlns attribute above.
$xpc->registerNs('srw', '');
$xpc->registerNs('dc', '');

# Select every recordData element, then look up the title relative to it.
my $recordData = $xpc->findnodes('//srw:records/srw:record/srw:recordData', $data);
foreach my $rec (@$recordData) {
    my $title = $xpc->findnodes('.//dc:title', $rec);
    print $title, "\n";
}

Update: Also note that $title->string_value is not needed if all you want to do with the title is to print it. Elements stringify to their string value.
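A small sketch of that point, on a made-up one-element document without namespaces: in scalar context findnodes returns an XML::LibXML::NodeList object, and interpolating it gives the same text as asking the node for its string value explicitly.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

# Made-up document, kept namespace-free so the XPath stays short.
my $doc = XML::LibXML->load_xml(string => '<rec><title>TEST</title></rec>');

# Scalar context: the result is an XML::LibXML::NodeList object.
my $title = $doc->findnodes('/rec/title');

my $implicit = "$title";                           # overloaded stringification
my $explicit = $title->get_node(1)->string_value;  # explicit call on the first node

print "$implicit\n";
print $implicit eq $explicit ? "same\n" : "different\n";
```

The explicit call is still useful when you need a plain string for something other than printing or interpolation.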

لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
