Re^2: Weather warnings from

by walto (Pilgrim)
on May 27, 2010 at 20:23 UTC

in reply to Re: Weather warnings from
in thread Weather warnings from

I never got a handle on the HTML::TokeParser module, so I tried to get the data with regexps. But you are right: this is not the proper way to do it.
To get the warnings of all countries the script evaluates
I changed your
sub find_img {
    my @images;
    my $content = shift;
    my $p = HTML::TokeParser::Simple->new(string => $content);
    while (my $t = $p->get_token) {
        push @images, $t if $t->is_start_tag(q{img});
    }
    return \@images;
}
because there are many images on the page. So again I would need some routine (regexp) to filter out unwanted images.
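One way to filter out the unwanted images is to keep only the `img` tokens whose `src` attribute matches a pattern. The following is a minimal sketch, not the script's actual code: the two-argument form of `find_img`, the sample markup, and the `wf/wf_NN.jpg` pattern are assumptions (the pattern is guessed from the image paths quoted in the reply below), so adjust them to the real page.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Return the src of every <img> start tag whose src matches $pattern.
sub find_img {
    my ($content, $pattern) = @_;
    my @sources;
    my $p = HTML::TokeParser::Simple->new(string => $content);
    while (my $t = $p->get_token) {
        next unless $t->is_start_tag(q{img});
        my $src = $t->get_attr(q{src});
        next unless defined $src && $src =~ $pattern;
        push @sources, $src;
    }
    return \@sources;
}

# Hypothetical sample markup, not taken from the real page.
my $html = q{<img src="logo.png"><img src="Bilder/wf/wf_23.jpg">};

# Assumed pattern for warning icons; everything else is skipped.
my $warnings = find_img($html, qr{/wf/wf_\d+\.jpg$});
print "$_\n" for @$warnings;
```

Passing the pattern in as an argument keeps the sub reusable if the country and region pages use different image names.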

Re^3: Weather warnings from
by wfsp (Abbot) on May 28, 2010 at 08:50 UTC
    Ok, from looking at the link we can simplify things enormously.

    The data we are after are in cells with class col1 or col2. We can loop over those and extract what we need. You will need to tweak as appropriate but hopefully it will give you the idea.

    #! /usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    # meteoalarm.html is the source from the website
    open my $fh, q{<}, q{meteoalarm.html}
      or die qq{cant open file to read: $!\n};
    my $content = do{local $/; <$fh>};

    my $mp = Meteoalarm::Parser->new($content);
    my $data = $mp->parse;
    print Dumper $data;

    package Meteoalarm::Parser;
    use HTML::TreeBuilder;
    use Data::Dumper;

    sub new {
        my $class = shift;
        my $content = shift;
        my $p = HTML::TreeBuilder->new_from_content($content);
        my $self = {
            parser => $p,
        };
        bless($self, $class);
        return $self;
    }

    sub parse {
        my $self = shift;
        my $p = $self->{parser};
        my (%data);
        my @cells = $p->look_down(_tag => q{td}, class => qr/^col[12]$/);
        for my $cell (@cells){
            my $div = $cell->look_down(_tag => q{div});
            my $id  = $div->id;
            my $alt = $div->attr(q{alt});
            my $img = $div->look_down(_tag => q{img});
            my $src = $img?$img->attr(q{src}):q{};
            $data{$id}{fullname} = $alt;
            $data{$id}{warning}  = $src;
        }
        return \%data;
    }
    output (extract)
    $VAR1 = {
      'UK' => {
        'warning' => '',
        'fullname' => 'United Kingdom'
      },
      'CY' => {
        'warning' => '',
        'fullname' => 'Cyprus'
      },
      'IE' => {
        'warning' => 'Bilder/wf/wf_23.jpg',
        'fullname' => 'Ireland'
      },
      'IS' => {
        'warning' => '',
        'fullname' => 'Iceland'
      },
      'NL' => {
        'warning' => '',
        'fullname' => 'Netherlands'
      },
      'BE' => {
        'warning' => '',
        'fullname' => 'Belgium'
      },
      'AT' => {
        'warning' => 'Bilder/wf/wf_23.jpg',
        'fullname' => 'Austria'
      },
    };
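Once the data is in this shape, picking out the countries that currently carry a warning is a plain hash walk. A minimal sketch, assuming an empty `warning` string means "no warning"; the `$data` literal is a subset of the extract above, and `@warned` is just an illustrative name.

```perl
use strict;
use warnings;

# Subset of the Dumper extract above.
my $data = {
    UK => { warning => q{},                    fullname => q{United Kingdom} },
    IE => { warning => q{Bilder/wf/wf_23.jpg}, fullname => q{Ireland} },
    AT => { warning => q{Bilder/wf/wf_23.jpg}, fullname => q{Austria} },
};

# Country codes with a non-empty warning image, in sorted order.
my @warned = grep { $data->{$_}{warning} ne q{} } sort keys %$data;

printf "%s (%s): %s\n", $_, $data->{$_}{fullname}, $data->{$_}{warning}
    for @warned;
```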
      Thanks wfsp for your very helpful posts. With your advice I was able to change the script. I updated the original code with the new one.
      The HTML for country and region warnings differs slightly, so I kept the original structure with different methods in subs.
