Here is a simplified version of your script which uses XML::FeedPP to get the XML, XML::Rules to extract the data and HTML::TreeBuilder::XPath to extract the times from the summary. I have also included a simple regex to extract the times should you not be able to install the XPath module. Adapt as you require.
#!/usr/bin/perl -w
use strict;
use warnings;
use XML::FeedPP;
use XML::Rules;
use HTML::TreeBuilder::XPath;
# input
my $source = 'http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.atom';
my $atom_xml = XML::FeedPP::Atom->new( $source )->to_string();
# output
my $outfile = "quake.txt";
open my $fh,'>',$outfile or die "Cannot open $outfile: $!";
# parser
my @rules = (
    _default => 'content',
    title    => \&title,
    entry    => \&report_item,
);
my $parser = XML::Rules->new(rules => \@rules);
# process
report_header();
$parser->parse( $atom_xml );
close $fh;
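sub title {
    my $title = $_[1]->{'_content'};
    # assumes a title of the form "M 4.5 - place":
    # magnitude at offset 2, place from offset 8 onwards
    return
        'magnitude' => substr($title,2,3),
        'place'     => substr($title,8);
}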
sub report_item {
    my $summary = $_[1]->{summary};

    # extract time from summary using XPath
    my $tree = HTML::TreeBuilder::XPath->new_from_content($summary);
    my @dd = $tree->findvalues('//dd');
    $tree->delete;    # release the parse tree once the values are extracted

    # extract time using regex (fallback if the XPath module is unavailable)
    my ($t1, $t2) = ('', '');    # stay empty when the pattern does not match
    if ($summary =~ m!<dt>Time</dt>
                      <dd>(.*)\ UTC</dd>
                      <dd>(.*)\ at\ epicenter</dd>!x){
        $t1 = $1;
        $t2 = $2;
    }

    print $fh <<EOF;
Place     : $_[1]->{place}
Magnitude : $_[1]->{magnitude}
Updated   : $_[1]->{updated}
Location  : $_[1]->{'georss:point'}
Time Xpath: $dd[0]
            $dd[1]
Time regex: $t1
          : $t2
Summary   : $summary
EOF
}
sub report_header {
    my $cur_time = localtime;
    print $fh <<EOF;
# This Quake file created by quake_parsing_9
# Matt Coblentz; Perl version unknown
# For more information, see the USGS website
# Last Updated: $cur_time
EOF
}
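If the fixed substr offsets or the strict time regex turn out to be too brittle for your feed, something along these lines may be more forgiving. This is only a sketch using core Perl: parse_title and parse_times are hypothetical helper names, and the sample title and summary strings are made up to match the shapes the script above expects, so check them against the real feed before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: split a title such as "M 4.5 - 10km N of Town"
# into magnitude and place without relying on fixed character offsets.
sub parse_title {
    my ($title) = @_;
    return unless defined $title;
    if ( $title =~ /^M\s+(-?\d+(?:\.\d+)?)\s+-\s+(.+)$/ ) {
        return ( magnitude => $1, place => $2 );
    }
    return;    # empty list when the title has an unexpected shape
}

# Hypothetical helper: pull the two time strings out of the summary HTML,
# tolerating whitespace between the tags. Returns an empty list on no match.
sub parse_times {
    my ($summary) = @_;
    return unless defined $summary;
    if ( $summary =~ m{<dt>Time</dt>\s*
                       <dd>(.*?)\s+UTC</dd>\s*
                       <dd>(.*?)\s+at\s+epicenter</dd>}xs ) {
        return ( $1, $2 );
    }
    return;
}

# Made-up sample data in the shape assumed above
my $sample_title   = 'M 4.5 - 10km N of Town';
my $sample_summary = '<dl><dt>Time</dt>'
    . '<dd>2013-05-01 12:34:56 UTC</dd>'
    . '<dd>2013-05-01 20:34:56 +08:00 at epicenter</dd></dl>';

my %quake = parse_title($sample_title);
my ( $utc, $local ) = parse_times($sample_summary);

print "Magnitude : $quake{magnitude}\n";
print "Place     : $quake{place}\n";
print "Time UTC  : $utc\n";
print "Time local: $local\n";
```

Both helpers return an empty list when the input does not match, so the caller can test for that instead of printing undefined values.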
HTH
poj