PerlMonks  

Get data from an XML file heading

by nachtmsk (Novice)
on Feb 20, 2020 at 14:43 UTC (#11113248=perlquestion)

nachtmsk has asked for the wisdom of the Perl Monks concerning the following question:

Hi.

So I have an XML file that I am parsing using XML::LibXML and it's working very nicely.

The provider of the data is only returning 250 records initially. In the heading of the XML data is a section that tells how many records are in total, pages available, etc... See code below.

It doesn't look like XML to me. Does anyone know if I can get XML::LibXML to get this data out for me? In particular, I need "total_pages", "total_records" and "download_key".

If not, I guess a RegEx would be the way to go. Working on that now, but if anyone has an elegant suggestion, I'm listening.

<result_summary total_records="594" total_pages="3" current_page="1" records_this_page="250" download_key="xmxnxnxnxnxnxnxnxnx" time_start="2020-02-19 15:50:55" feed_version="1.44" />
Thanks, Mike

Replies are listed 'Best First'.
Re: Get data from an XML file heading
by haukex (Chancellor) on Feb 20, 2020 at 15:33 UTC
    It doesn't look like XML to me.

    It does to me, unless you're not showing everything.

    If not, I guess a RegEx would be the way to go.

    No, please don't!! See "Parsing HTML/XML with Regular Expressions".

    use warnings;
    use strict;
    use XML::LibXML;

    my $xml = <<'END_XML';
    <root>
    <result_summary total_records="594" total_pages="3" current_page="1" records_this_page="250" download_key="xmxnxnxnxnxnxnxnxnx" time_start="2020-02-19 15:50:55" feed_version="1.44" />
    </root>
    END_XML

    my $dom = XML::LibXML->load_xml(string => $xml);
    my @nodes = $dom->findnodes('//result_summary');
    die "Failed to find exactly one result_summary node" unless @nodes==1;
    my %attrs = %{ $nodes[0] };
    use Data::Dumper;
    print Dumper(\%attrs);
    __END__
    $VAR1 = {
              'total_records' => '594',
              'feed_version' => '1.44',
              'time_start' => '2020-02-19 15:50:55',
              'download_key' => 'xmxnxnxnxnxnxnxnxnx',
              'total_pages' => '3',
              'records_this_page' => '250',
              'current_page' => '1'
            };
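The reply above copies every attribute into a hash. Since the question only asks for "total_pages", "total_records" and "download_key", a minimal sketch that fetches just those three by name might look like the following. It uses the explicit getAttribute and hasAttribute methods of XML::LibXML elements; the <root> wrapper and the %wanted hash are assumptions made for illustration, not part of the original data.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample document; the <root> wrapper is assumed, as in the reply above.
my $xml = q{<root><result_summary total_records="594" total_pages="3" current_page="1" records_this_page="250" download_key="xmxnxnxnxnxnxnxnxnx" time_start="2020-02-19 15:50:55" feed_version="1.44" /></root>};

my $dom = XML::LibXML->load_xml(string => $xml);

# Take the first matching element; die if there is none.
my ($summary) = $dom->findnodes('//result_summary');
die "No result_summary element found" unless defined $summary;

# Collect only the attributes the question asks for (hypothetical hash name).
my %wanted;
for my $name (qw(total_pages total_records download_key)) {
    die "Missing attribute '$name'" unless $summary->hasAttribute($name);
    $wanted{$name} = $summary->getAttribute($name);
}

print "$_ = $wanted{$_}\n" for sort keys %wanted;
```

Checking hasAttribute first makes a missing field fail loudly instead of silently yielding an undefined value.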
      Thanks. I didn't think it looked like XML because it didn't have an ending tag and it has multiple values for one node, I guess you would call it.

      I thought XML looks more like below...

      <myId>757</myId>
      <address> 445 smith</address>

      The data I get at the top of the file has one node named "result_summary", no ending tag, and multiple values within the node.

      Thanks for the solution though.
        I didn't think it looked like XML because it didn't have an ending tag and it has multiple values for one node, I guess you would call it.

        It's known as an "empty-element" tag: <element /> is equivalent to <element></element>, and the values within the tag itself are attributes. See e.g. https://www.w3schools.com/xml/.
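
To make the equivalence of the two spellings concrete, here is a small sketch that parses both forms with XML::LibXML and compares what the parser sees. The element name and the "id" attribute are placeholders chosen for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# The same element written as an empty-element tag and with an explicit end tag.
my $short_el = XML::LibXML->load_xml(string => '<element id="1" />')->documentElement;
my $long_el  = XML::LibXML->load_xml(string => '<element id="1"></element>')->documentElement;

for my $el ($short_el, $long_el) {
    # Neither form has child nodes, and both carry the same attribute.
    printf "%s: has children? %s, id=%s\n",
        $el->nodeName,
        ($el->hasChildNodes ? 'yes' : 'no'),
        $el->getAttribute('id');
}
```

Both elements report the same name, no children, and the same attribute value, so downstream code does not need to care which form the provider sends.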

Re: Get data from an XML file heading
by Lotus1 (Vicar) on Feb 20, 2020 at 16:20 UTC

    Check out some of the XML tutorials online for explanations of attributes. Here are some examples of accessing attributes.

    use warnings;
    use strict;
    use XML::LibXML;

    ### https://metacpan.org/pod/distribution/XML-LibXML/LibXML.pod
    ### https://grantm.github.io/perl-libxml-by-example/basics.html
    ### https://metacpan.org/pod/distribution/XML-LibXML/lib/XML/LibXML/Element.pod
    ### https://metacpan.org/pod/distribution/XML-LibXML/lib/XML/LibXML/Node.pod

    my $dom = XML::LibXML->load_xml(string => <<'EOT');
    <test>
    <result_summary total_records="594" total_pages="3" current_page="1" records_this_page="250" download_key="xmxnxnxnxnxnxnxnxnx" time_start="2020-02-19 15:50:55" feed_version="1.44" />
    </test>
    EOT

    ### get individual attribute by name:
    print "\n******* get individual attribute by name: ***********\n";
    foreach my $attribute( $dom->findnodes('/test/result_summary/@total_pages') ) {
        print "total_pages = ", $attribute->to_literal, "\n";
    }

    ### get individual as hash refs:
    print "\n******* get individual attribute as hash refs: ***********\n";
    foreach my $result_sum ( $dom->findnodes('/test/result_summary') ) {
        print "total_pages = ", $result_sum->{total_pages}, "\n";
    }

    ### get all attributes:
    print "\n******* get all attributes: ***********\n";
    foreach my $attribute( $dom->findnodes('/test/result_summary/@*') ) {
        print $attribute->nodeName, " = ", $attribute->to_literal, "\n";
    }

    Output:

    ******* get individual attribute by name: ***********
    total_pages = 3

    ******* get individual attribute as hash refs: ***********
    total_pages = 3

    ******* get all attributes: ***********
    total_records = 594
    total_pages = 3
    current_page = 1
    records_this_page = 250
    download_key = xmxnxnxnxnxnxnxnxnx
    time_start = 2020-02-19 15:50:55
    feed_version = 1.44
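
Since the original question mentions that only the first page of records is returned, one possible follow-up is to work out which pages are still outstanding. The sketch below uses findvalue, which returns the string value of an XPath expression directly, and then lists the page numbers after the current one. How those pages are actually requested from the provider is not stated in the thread, so that step is deliberately left out; the trimmed sample document and the @remaining array are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed sample with only the two attributes needed here (assumed layout).
my $dom = XML::LibXML->load_xml(string =>
    '<test><result_summary total_pages="3" current_page="1" /></test>');

# findvalue() evaluates the XPath and returns its string value.
my $total   = $dom->findvalue('/test/result_summary/@total_pages');
my $current = $dom->findvalue('/test/result_summary/@current_page');

# Page numbers that still need to be requested after the current one.
my @remaining = ($current + 1 .. $total);
print "Remaining pages: @remaining\n";
```

With the sample values, the list contains pages 2 and 3.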
