get() from website hangs

by jeffw_00 (Novice)
on Sep 13, 2019 at 12:42 UTC

jeffw_00 has asked for the wisdom of the Perl Monks concerning the following question:

Hi - the relevant lines of my script are

-------
use LWP::Simple;

$pass=1;
my $doc = get('http://forecast.weather.gov/MapClick.php?CityName=Sudbury&state=MA&site=BOX&lat=42.3667&lon=-71.4')
    || die {$pass=0};
-------

This is part of a script that runs every 15 minutes that gets this webpage, parses some specific info from it (like temperature), and writes the info to a file so my home control program can later open the file and act on the data. My problem is that a few times a day, the get never returns, and the script hangs. I have another script running that detects the hang, and kills and restarts the script. But that's clunky. I'd like to either eliminate the hang, or detect the hang -within- the script. But that is a level of Perl that I haven't dealt with before so I'm wondering if anyone can point me to a recipe. Thanks!

Further background - using ActivePerl on a W7 machine, this code never hung. After switching to W10, ActivePerl stopped supporting the get properly, so I switched to Strawberry Perl (which I like better overall), but that is where I am having this intermittent issue. (This is probably the "fanciest" thing I do with Perl on my PC, so for everything else either Perl works fine.)

Many Thanks!
/j

Replies are listed 'Best First'.
Re: get() from website hangs
by Corion (Pope) on Sep 13, 2019 at 12:52 UTC

    The weather.gov website has an API where you should be able to get all that information without scraping HTML. Also see https://forecast-v3.weather.gov/documentation. I use this in Weather::WeatherGov (unreleased) to fetch weather data as JSON for a given latitude/longitude:

    use strict;
    use HTTP::Tiny;
    use URI;
    use JSON 'decode_json';
    use Moo 2;
    use feature 'signatures';
    no warnings 'experimental::signatures';

    our $VERSION = '0.01';   # placeholder; the module is unreleased

    our $base_uri = URI->new(
        $ENV{PERL_WEATHER_WEATHERGOV_URI}
        #|| 'https://forecast-v3.weather.gov/'
        || 'https://api.weather.gov/points/'
    );

    my $ua = HTTP::Tiny->new(
        agent => "Weather::WeatherGov/$VERSION",
    );

    sub forecast( %options ) {
        my $entry = $base_uri . sprintf '%s,%s',
            $options{latitude}, $options{longitude};

        # We should cache this request for office and grid position
        # maybe store these in the same SQLite database (schema) as Weather::MOSMIX?!
        # also, we need a cache and rate-limiter here so we can rate-limit
        # at least by IP address and maybe also globally how much we hit weather.gov
        my $loc = json_request( $entry );

        json_request( $loc->{properties}->{forecastHourly} );
    }

    sub json_request( $uri ) {
        my $response = $ua->request(GET => $uri, {
            accept => 'JSON-LD',
        });
        decode_json( $response->{content} );
    }
Re: get() from website hangs
by Fletch (Chancellor) on Sep 13, 2019 at 12:50 UTC

    Not your original question, but have you considered using their API rather than scraping?

    (As for the original question, look at the docs for alarm or Sys::SigAction's timeout_call and wrap your hanging call in something like that.)
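    A minimal sketch of the alarm route (the 20-second limit and the question's URL are just stand-ins, not values from any of the modules above):

    use strict;
    use warnings;
    use LWP::Simple;

    my $doc;
    eval {
        local $SIG{ALRM} = sub { die "get timed out\n" };   # turn the timeout into an exception
        alarm 20;                                           # give the fetch at most 20 seconds
        $doc = get('http://forecast.weather.gov/MapClick.php?CityName=Sudbury&state=MA&site=BOX&lat=42.3667&lon=-71.4');
        alarm 0;                                            # cancel the timer once get() returns
    };
    if ( !defined $doc ) {
        warn $@ ? $@ : "fetch failed\n";                    # timed out, or get() itself failed
    }

    Whether the ALRM signal actually interrupts a blocking socket read on Windows is another matter, so test it; the replies below touch on the same caveat.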

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: get() from website hangs
by bliako (Vicar) on Sep 17, 2019 at 21:41 UTC

    If the problem is a slow server then just tell LWP::Simple to tell LWP::UserAgent to time out (no need to alarm() yourself):

    use LWP::Simple qw($ua get);   # export the underlying LWP::UserAgent into $ua
    ...
    $ua->timeout(10);              # ... and set a timeout on it (seconds)
    my $doc = get('...') || die {$pass=0};

    bw, bliako

Re: get() from website hangs
by jeffw_00 (Novice) on Sep 13, 2019 at 15:48 UTC
    Thanks guys - some of the answers are over my head - but I'm giving alarm a try
      I offer you this bit of code you can play with (try it with parameter 2 and then with parameter 4):

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $delay   = shift @ARGV || 1;   # parameter
      my $timeout = 3;                  # seconds

      eval {
          ## Turn on timer.
          local $SIG{"__DIE__"};
          local $SIG{CHLD};
          local $SIG{ALRM} = sub { die "timed-out\n" };
          alarm $timeout;
          my $verylongprocess = `sleep $delay`;
          alarm 0;   # disable alarm
      };
      die $@ if $@ eq "timed-out\n";
      print "No timeout\n";

      It should cover external processes (hence the three local SIGs). Not sure it works on Windows, but give it a try.

      edit 2: Yes, it works, but instead of the unix "sleep" use the Windows "timeout" command (available in W7 and W10), like so:

      my $verylongprocess = `timeout $delay`;
Re: get() from website hangs
by harangzsolt33 (Pilgrim) on Sep 14, 2019 at 01:26 UTC
    I have a similar problem, except my perl script has NEVER been able to get anything at all. I have tried LWP::Simple, HTTP::Tiny, wget, and curl. None of them worked, not even once! I created a sample text file on another site and tried to fetch it using wget. And all it would do is say that it is downloading it, but then nothing got downloaded and all it gave me was an empty file. Every. Single. Time. HTTP::Tiny gave me an error message saying that it failed. ???

    Since I am using a free Perl host called Zettahost, I am wondering whether they simply do not let free users fetch web pages, or whether it's a security restriction and they don't want anyone to do that. Maybe fetching web pages is a privilege that has not been granted to me. Is that possible? They use a Debian Linux server.

    I tried to fetch website data using PHP, and it failed as well. So, there's got to be something with the server or the way it is configured. It's not letting me download anything!

     <?php
     $fileHTML = file_get_contents('http://www.wzsn.net/');
     $title = substr($fileHTML,
         strpos($fileHTML, '<TITLE>') + 7,
         strpos($fileHTML, '</TITLE>') - (strpos($fileHTML, '<TITLE>') + 7));
     echo $title;
     ?>

    It says, "Warning: file_get_contents(http://www.wzsn.net/): failed to open stream: Connection refused in /srv/disk13/2468845/www/ildispaintings.com/xget.php on line 7"

      Yes, it looks like Zettahost does not allow outbound network connections by default. I just typed 'Zettahost outbound network connections' into Google and got:

      http://zettahost.runhosting.com/faq.html

      15. I cannot connect to remote scripts, RSS feeds or use cURL on my website?

      The outgoing connections are disabled by default on all accounts for security reasons, however they can be enabled for paid accounts from Hosting Settings section. So all you need to do is go to your panel Hosting Settings section look for "Firewall Options" and click on the "Enable" button.

      Free Web hosts generally restrict functions like that to avoid becoming havens for spammers and other Internet ne'er-do-wells. The reason that it is an option for paid accounts is probably because a paid account has a paper trail that (theoretically) gives the host someone to hold accountable for abuse.

      Crooks, of course, simply pay with stolen credit cards and move on to the next service when the payments bounce and the account is closed, but usually go straight for the "big" providers that are less likely to be able to keep close eyes on new accounts instead of abusing a paid tier on a free hosting service. At $WORK, I once had to watch the logs extra-closely around @DAYS of the month. That was when our worst abuse bots changed hosting providers like clockwork for several months in a row. Sending complaints was useless, because by the time the other provider acted on a complaint of a password-guessing bot, the account had usually been closed for using stolen payment details, or at least that was what the other abuse departments told us. I guess the miscreants eventually found easier targets or ran out of providers to abuse, but that wave lasted almost a year.
