It might be worth separating out the code that extracts the data from the HTML; that could make the flow of the logic a bit easier to follow. I would also recommend using a parser rather than regexes, which can get a bit tricky on HTML.

I was unable to find a page on the website that corresponded to your regexes, so I have taken a guess at what it might look like. If you could post a link to an actual page you're dealing with, we might have more to go on. For instance, the code below uses HTML::TokeParser::Simple to make a single pass over the tokens, extracting data as appropriate (it covers similar ground to the regex in your _extract_details method).

If the page is well structured it may be more appropriate to consider something like HTML::TreeBuilder which is more powerful and could simplify proceedings greatly.
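To give a rough idea of what the HTML::TreeBuilder route might look like, here is a minimal sketch. It assumes the same guessed markup as my sample data (two div class="info" blocks per warning), which may not match the real pages; extract_warnings and the inline $html are made up for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical markup: the same guessed structure as the sample data,
# not a real Meteoalarm page.
my $html = <<'HTML';
<img src="my.jpeg">
<div class="info"><b>valid from</b> from date 1 <b>Until</b> until date 1</div>
<div class="info"><b>type 1</b> Awareness Level: <b>awareness level 1</b></div>
<div class="text">text</div>
<img src="my_other.jpeg">
<div class="info"><b>valid from</b> from date 2 <b>Until</b> until date 2</div>
<div class="info"><b>type 2</b> Awareness Level: <b>awareness level 2</b></div>
<div class="text">text</div>
HTML

# extract_warnings is a hypothetical helper: it builds a tree, collects
# every <div class="info">, and treats each consecutive pair as one warning.
sub extract_warnings {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);
    my @info = $tree->look_down(_tag => 'div', class => 'info');

    my @warnings;
    while (my ($validity, $details) = splice @info, 0, 2) {
        last unless $details;    # stop on an unpaired trailing div
        my %w;
        # \s* around the captures keeps stray whitespace out of the values
        @w{qw(from until)} =
            $validity->as_trimmed_text =~ /^valid from\s*(.*?)\s*Until\s*(.*)$/;
        @w{qw(type level)} =
            $details->as_trimmed_text =~ /^(.*?)\s*Awareness Level:\s*(.*)$/;
        push @warnings, \%w;
    }

    $tree->delete;               # release the tree explicitly
    return @warnings;
}

for my $w (extract_warnings($html)) {
    print join(', ', map { "$_=$w->{$_}" } sort keys %$w), "\n";
}
```

The pairing of info divs is the fragile part: if a real page has a warning with only one info block, the pairs would drift, so anchoring on a containing element per warning would be safer once the actual markup is known.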

#! /usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

{
    package Meteoalarm::Parser;
    use HTML::TokeParser::Simple;

    sub new {
        my $class = shift;
        my $content = shift;
        my $p = HTML::TokeParser::Simple->new(string => $content);
        my $self = {
            parser => $p,
        };
        bless($self, $class);
        return $self;
    }

    sub parse {
        my $self = shift;
        my (%data, $txt);

        my $t = $self->find_img() or return;

        $txt = $self->get_div_txt(q{info});
        ($data{from}, $data{until}) = $txt =~ /^valid from (.*)Until(.*)$/;

        $txt = $self->get_div_txt(q{info});
        ($data{type}, $data{level}) = $txt =~ /^(.*)Awareness Level: (.*)$/;

        $self->{data} = \%data;
        return 1;
    }

    sub find_img{
        my $self = shift;
        my $p = $self->{parser};
        while (my $t = $p->get_token){
            return $t if $t->is_start_tag(q{img});
        }
        return;
    }

    sub get_div_txt{
        my $self = shift;
        my $div_class = shift;
        my $p = $self->{parser};
        my $txt;
        while (my $t = $p->get_token){
            if (
                $t->is_start_tag(q{div}) and
                $t->get_attr(q{class}) and
                $t->get_attr(q{class}) eq $div_class
            ){
                $p->get_token;
                $txt = $p->get_phrase;
                return $txt;
            }
        }
        return;
    }

    sub get_data{
        my $self = shift;
        return $self->{data};
    }
}

# script
my $content = do{local $/; <DATA>};
my $mp = Meteoalarm::Parser->new($content);
while ($mp->parse){
    my $data = $mp->get_data;
    print Dumper $data;
}

__DATA__
<img src="my.jpeg">
<!-- possible stuff -->
<div class="info">
  <b>valid from</b> from date 1 <b>Until</b> until date 1
</div>
<div class="info">
  <b>type 1</b> Awareness Level: <b>awareness level 1</b>
</div>
<div class="text">
  text
</div>
<!-- possible stuff -->
<img src="my_other.jpeg">
<!-- possible stuff -->
<div class="info">
  <b>valid from</b> from date 2 <b>Until</b> until date 2
</div>
<div class="info">
  <b>type 2</b> Awareness Level: <b>awareness level 2</b>
</div>
<div class="text">
  text
</div>
<!-- and so on -->
$VAR1 = {
          'level' => 'awareness level 1',
          'until' => ' until date 1',
          'from' => 'from date 1 ',
          'type' => 'type 1 '
        };
$VAR1 = {
          'level' => 'awareness level 2',
          'until' => ' until date 2',
          'from' => 'from date 2 ',
          'type' => 'type 2 '
        };
update: added output
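Note that some of the values in the dump above still carry leading or trailing spaces ('from date 1 ', ' until date 1'). If that matters, one option is to tidy the hash before using it. A minimal sketch, where trim_values is a hypothetical helper and not part of the class above:

```perl
use strict;
use warnings;

# trim_values is a hypothetical helper: it returns a new hash ref with
# leading and trailing whitespace stripped from every defined value.
sub trim_values {
    my ($data) = @_;
    my %clean;
    for my $key (keys %$data) {
        my $value = $data->{$key};
        $value =~ s/^\s+|\s+$//g if defined $value;
        $clean{$key} = $value;
    }
    return \%clean;
}
```

It could be called on the result of get_data, e.g. my $data = trim_values($mp->get_data); alternatively, the regexes in parse could absorb the whitespace themselves.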

In reply to Re: Weather warnings from by wfsp
in thread Weather warnings from by walto
