PerlMonks  

perl xpath extraction

by lionofmasses (Initiate)
on Apr 11, 2011 at 02:27 UTC

lionofmasses has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am just learning the basics of Perl and XPath. Can anyone show me how to extract the editor's name, or whatever other information I want, by specifying XPath statements against the following TEI-encoded XML document?

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI.2 SYSTEM "tei-epidoc.dtd" >
<TEI.2 lang="eng" id="eAla001">
  <teiHeader status="new" type="text">
    <fileDesc>
      <titleStmt>
        <title type="main" level="m">1. Letter of Valerian and Gallienus</title>
        <editor role="editor">Joyce M. Reynolds</editor>
      </titleStmt>
      <publicationStmt>
        <date>2004</date>
      </publicationStmt>
      <sourceDesc default="NO">
        <p><!-- to be added --></p>
      </sourceDesc>
    </fileDesc>
    <profileDesc>
      <langUsage default="NO">
        <language id="eng">English</language>
        <language id="fre">French</language>
        <language id="ger">German</language>
        <language id="grc">Ancient Greek</language>
        <language id="gre">Modern Greek</language>
        <language id="ita">Italian</language>
        <language id="lat">Latin</language>
        <language id="spa">Spanish</language>
        <language id="tur">Turkish</language>
      </langUsage>
    </profileDesc>
  </body>
  </text>
</TEI.2>

Replies are listed 'Best First'.
Re: perl xpath extraction
by wind (Priest) on Apr 11, 2011 at 04:54 UTC

    Check out XML::Twig:

    use XML::Twig;

    my $t = XML::Twig->new(
        twig_handlers => {
            'editor[@role="editor"]' => sub { print $_->text() },
        },
    );
    $t->parsefile('yourfile.xml');

      I would agree with the suggestion to use XML::Twig. I use Perl to process XML data all the time and have found XML::Twig indispensable.
Re: perl xpath extraction
by Anonymous Monk on Apr 11, 2011 at 04:14 UTC
Re: perl xpath extraction
by Gulliver (Monk) on Apr 11, 2011 at 15:36 UTC
Re: perl xpath extraction
by choroba (Cardinal) on Apr 12, 2011 at 15:00 UTC
Re: perl xpath extraction
by lionofmasses (Initiate) on Apr 16, 2011 at 04:02 UTC
    Hi,

    I tried extracting the "editor" from the specified XML file using the XPath tutorials, as suggested. I'm supposed to do it with XPath only; I cannot use XML::Twig. The XML file is:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE TEI.2 SYSTEM "tei-epidoc.dtd" >
    <TEI.2 lang="eng" id="eAla001">
      <teiHeader status="new" type="text">
        <fileDesc>
          <titleStmt>
            <title type="main" level="m">1. Letter of Valerian and Gallienus</title>
            <editor role="editor">Joyce M. Reynolds</editor>
          </titleStmt>
          <publicationStmt>
            <date>2004</date>
          </publicationStmt>
          <sourceDesc default="NO">
            <p><!-- to be added --></p>
          </sourceDesc>
        </fileDesc>
        <profileDesc>
          <langUsage default="NO">
            <language id="eng">English</language>
            <language id="fre">French</language>
            <language id="ger">German</language>
            <language id="grc">Ancient Greek</language>
            <language id="gre">Modern Greek</language>
            <language id="ita">Italian</language>
            <language id="lat">Latin</language>
            <language id="spa">Spanish</language>
            <language id="tur">Turkish</language>
          </langUsage>
        </profileDesc>
      </body>
      </text>
    </TEI.2>

    The Perl XPath script I used is:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::XPath;
    use XML::XPath::XMLParser;

    my $xp = XML::XPath->new(filename => 'E:\xmlfiles\eAla001.xml');
    my $nodeset = $xp->find('//@editor role'); # find all editors
    foreach my $node ($nodeset->get_nodelist) {
        print XML::XPath::XMLParser::as_string($node), "\n\n";
    }

    I'm getting this error:

    501 Protocol scheme 'e' is not supported e:/tei-epidoc.dtd
    Handler couldn't resolve external entity at line 2, column 40, byte 80
    error in processing external entity reference at line 2, column 40, byte 80:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE TEI.2 SYSTEM "tei-epidoc.dtd" >
    =======================================^
    <TEI.2 lang="eng" id="eAla224">
    <teiHeader status="new" type="text">
     at C:/Perl/lib/XML/Parser.pm line 187

    Please help me out :(

      You didn't provide the file "tei-epidoc.dtd". I tried your program with the DOCTYPE line removed from the XML, and it complained about mismatched tags at "</body>". The XML isn't well-formed.

      Have a look at the PerlMonks tutorial I linked to above. The "moving up from simple to libXML" one is pretty good and relevant to your problem. libXML has a faster and more up-to-date XPath implementation.
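
      To make that concrete, here is a minimal XPath-only sketch using XML::LibXML. It rests on several assumptions that are mine, not the original poster's: the XML is a trimmed inline copy of the posted document with the stray </body> and </text> tags dropped and a closing </teiHeader> added so that it parses, the external DTD is deliberately not loaded (the load_ext_dtd parser option), and the query //editor[@role="editor"] is my guess at what '//@editor role' was meant to select.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the posted document (an assumption for illustration):
# the stray </body> and </text> closing tags are removed and </teiHeader>
# is added so that the document is well-formed.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI.2 SYSTEM "tei-epidoc.dtd">
<TEI.2 lang="eng" id="eAla001">
  <teiHeader status="new" type="text">
    <fileDesc>
      <titleStmt>
        <title type="main" level="m">1. Letter of Valerian and Gallienus</title>
        <editor role="editor">Joyce M. Reynolds</editor>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI.2>
XML

# load_ext_dtd => 0 asks the parser not to fetch tei-epidoc.dtd,
# which is not available here.
my $doc = XML::LibXML->load_xml(string => $xml, load_ext_dtd => 0);

# XPath: every <editor> element whose role attribute equals "editor".
my @editors = map { $_->textContent }
              $doc->findnodes('//editor[@role="editor"]');

print "$_\n" for @editors;
```

      With the inline document above this should print "Joyce M. Reynolds". For a file on disk, load_xml also accepts location => $path in place of string => $xml, but the file would still have to be well-formed first.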
