PerlMonks  

Australian weather with Web::Scraper

by missingthepoint (Friar)
on Oct 07, 2008 at 13:52 UTC ( #715788=sourcecode )
Category: Web Stuff
Author/Contact Info missingthepoint/ben petering
Description: Grab tomorrow's forecast from the Australian Bureau of Meteorology website using Web::Scraper
#!/usr/bin/perl -w
use strict;

use URI;
use Web::Scraper;

my $cities = scraper {
    process "td > a", city => 'TEXT';
    process "td.alignright", temperature => 'TEXT';
    process "td.alignright + td", comments => 'TEXT';
};

my $bom = scraper {
    process "#pad tr", 'cities[]' => $cities;
    result 'cities';
};

my $res = $bom->scrape( URI->new("http://www.bom.gov.au") );

die "Scrape did not return a list of cities\n" unless ref $res eq "ARRAY";

print "Tomorrow's forecast:\n\n";

# The page lists tomorrow's forecast rows before today's, and we only
# want tomorrow's. Walking the rows in reverse means the earlier
# (tomorrow's) row for each city overwrites the later one in %tmp.
# hack hack
my %tmp;
for (reverse @$res) {
    next unless $_->{city};
    # keep digits only (this also drops any minus sign or decimal point)
    $_->{temperature} =~ s/\D//g if defined $_->{temperature};
    $tmp{$_->{city}} = {
        temperature => $_->{temperature},
        comments => $_->{comments}
    };
}

# print city and temperature left-justified in 18-character columns
for my $city (sort keys %tmp) {
    printf "%-18s%-18s%s\n",
        $city,
        $tmp{$city}->{temperature},
        $tmp{$city}->{comments};
}
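
The "reverse and uniq" step above can be tried without touching the network. What follows is only a sketch using made-up rows (the cities and temperatures are hypothetical sample data, not real BOM output), and first_per_city is a name invented here for illustration. It assumes, as the script does, that tomorrow's rows appear before today's, and it keeps the first row seen for each city. That gives the same result as overwriting while walking the list in reverse.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows in page order: tomorrow's forecast first, then today's.
my @rows = (
    { city => 'Sydney',    temperature => '22' },    # tomorrow
    { city => 'Melbourne', temperature => '17' },    # tomorrow
    { city => undef,       temperature => undef },   # header or spacer row
    { city => 'Sydney',    temperature => '25' },    # today
    { city => 'Melbourne', temperature => '19' },    # today
);

# Return a hash ref mapping each city to the temperature of its first row.
sub first_per_city {
    my @list = @_;
    my %first;
    for my $row (@list) {
        next unless defined $row->{city};        # skip rows without a city
        next if exists $first{ $row->{city} };   # a later row never wins
        $first{ $row->{city} } = $row->{temperature};
    }
    return \%first;
}

my $tomorrow = first_per_city(@rows);
printf "%-18s%s\n", $_, $tomorrow->{$_} for sort keys %$tomorrow;
```

Skipping duplicates on a forward pass makes the intent explicit, where the reversed loop relies on hash overwrites. Both still depend on the row order of the page, so either one breaks if that layout changes.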
Replies are listed 'Best First'.
Re: Australian weather with Web::Scraper
by smiffy (Pilgrim) on Oct 08, 2008 at 01:22 UTC

    Most of the BOM forecasts are available as plain text files via the FTP service - see here.

    I find these text files much easier to parse and they are less subject to layout changes from the web version. (In fact, I have never seen the text format change since I started using them.)

    As always, TIMTOWTDI ;-)

      Ah, cheers. I wondered if something like that was available. I'll probably hack up a version that uses those... The goal was really to make something useful with Web::Scraper, and I think I achieved that. :) But for reliability I'd use the text files too.


      email: perl -e 'print reverse map { chr( ord($_)-1 ) } split //, "\x0bufo/hojsfufqAofc";'