PerlMonks  

Detecting for HTTP pages code changes

by dotowwxo (Acolyte)
on Jan 15, 2018 at 02:30 UTC (#1207253=perlquestion)
dotowwxo has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

Currently, I have a script that uses curl to fetch the HTTP status code (200, 404, etc.) for each site in a list of websites and checks for a 404. If a page returns 404, the script calls another script to send out an email. The flaw is that the script runs 24/7, with each cycle taking about 15 minutes, so if a website is still down (HTTP code 404), it sends another email every cycle. This means I receive an email every 15 minutes for as long as the page is down. That is not what I want: I only want 1 email when the page switches from a 200 to a 404. Is there a way I can enhance this and reduce the number of 404 emails?

For confidentiality reasons, I cannot disclose the script. However, here is a short example of how my script checks the HTTP code:

    my $HTTPCode = `curl -s -w "%{http_code}" -o /dev/null https://$THIS_URL 2>&1`; # this is the line I used to retrieve the http_code
    if ($HTTPCode == 404) {
        # Send email
    }

What I want to achieve is this: instead of checking for HTTP code 404 on every run, is there a way to detect when a page changes from 200 to 404 and send the email only then? If the page is already in a 404 state, no email should be sent. I know this is a very vague question because I cannot provide my script to all of you, but any suggestion in theory is good too. Thank you in advance.

Replies are listed 'Best First'.
Re: Detecting for HTTP pages code changes
by Athanasius (Bishop) on Jan 15, 2018 at 03:15 UTC

    Hello dotowwxo,

    This sounds like a job for the flip-flop operator:

    use strict;
    use warnings;
    use feature qw( state );

    while (my $code = get_code())
    {
        print "Send email re: http code $code\n" if $code != 404 .. $code == 404;
    }

    sub get_code
    {
        state $codes = [ 200, 200, 404, 404, 404, 404, 200, 200, 404, 404, 200, ];
        return shift @$codes;
    }

    Output:

    13:10 >perl 1861_SoPW.pl
    Send email re: http code 200
    Send email re: http code 200
    Send email re: http code 404
    Send email re: http code 200
    Send email re: http code 200
    Send email re: http code 404
    Send email re: http code 200

    13:10 >

    The above is just a proof of concept, but adapting it to your actual code should be a straightforward exercise. References:
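The proof of concept above also prints for the 200 responses that precede each outage. If only the change into 404 should trigger an email, one possible adaptation is to compare each code with the previous one instead of using the flip-flop operator. This is a minimal sketch under assumptions: `went_down` and the sample list of codes are hypothetical names and data for illustration, and treating a 404 on the very first check as a change is a choice, not something the thread specifies.

```perl
use strict;
use warnings;

# Return 1 only when the status moves from "not 404" to 404.
# $prev is undef on the very first check (assumed to count as a change).
sub went_down {
    my ($prev, $curr) = @_;
    return 0 unless $curr == 404;
    return ( !defined $prev || $prev != 404 ) ? 1 : 0;
}

# Hypothetical sequence of codes, one per 15-minute cycle.
my @codes = (200, 200, 404, 404, 404, 404, 200, 200, 404, 404, 200);

my $prev;      # code seen in the previous cycle
my @alerts;    # cycle numbers at which an email would be sent
for my $i (0 .. $#codes) {
    my $code = $codes[$i];
    if (went_down($prev, $code)) {
        push @alerts, $i;
        print "Send email: cycle $i changed to $code\n";
    }
    $prev = $code;
}
```

In a long-running loop, `$prev` would typically become a hash keyed by URL so that each website keeps its own history.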

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Detecting for HTTP pages code changes
by Marshall (Abbot) on Jan 15, 2018 at 09:38 UTC
    So, I am assuming that this is a cron job that runs every 15 minutes?
    In order to suppress multiple 404 messages, you need to maintain a state of what happened before. The file system is fine for this.

    Here is some pseudo code:

    if ($HTTPCode == SUCCESS) {
        delete 404ErrFile if (-e 404ErrFile);
    }
    elsif ($HTTPCode == 404) {
        print 404_error unless exists 404ErrFile;
        create 404ErrFile;
    }
    else {
        print $HTTPCode;
    }
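The pseudo code above could be turned into runnable Perl roughly as follows. This is only a sketch: the function name `should_alert`, the flag-file name, and the choice to treat exactly 200 as success are assumptions made for illustration, and the real script would call its own email routine where this one returns 1.

```perl
use strict;
use warnings;
use File::Spec;

# Decide whether to send an email, using a flag file as the remembered state.
# Returns 1 when an email should be sent, 0 otherwise.
sub should_alert {
    my ($code, $flag_file) = @_;

    if ($code == 200) {
        # Site is up again: forget the earlier failure.
        if (-e $flag_file) {
            unlink $flag_file or warn "Cannot remove $flag_file: $!";
        }
        return 0;
    }
    if ($code == 404) {
        return 0 if -e $flag_file;    # this outage was already reported
        open my $fh, '>', $flag_file or die "Cannot create $flag_file: $!";
        close $fh;
        return 1;
    }
    print "HTTP code: $code\n";       # any other code is only logged
    return 0;
}

# Hypothetical flag-file location and sample codes for a short demonstration.
my $flag = File::Spec->catfile(File::Spec->tmpdir, "example_site_$$.404");
for my $code (200, 404, 404, 200, 404) {
    print "code $code: ", (should_alert($code, $flag) ? "send email" : "no email"), "\n";
}
unlink $flag;    # clean up after the demonstration
```

Because the state lives in the file system, this works whether the check runs as a cron job or inside a long-running loop, and it survives a restart of the script. With several websites, one flag file per site would be needed.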
Re: Detecting for HTTP pages code changes
by nysus (Vicar) on Jan 15, 2018 at 04:27 UTC

    Another option, which has the advantage that it will still work properly if the script is interrupted and restarted, is the Storable module. This will allow you to easily save variables to a file and retrieve them later, like a very crude database. It might look something like this:

    use storable;

    my $results = retrieve('results_store');

    sub check_site {
        my $domain = shift;
        my $current_result = check_domain($domain);
        my $last_result = $results->{$domain};
        if (($last_result == 200 || !$last_result) && $current_result == 404) {
            send_email($domain);
        }
        if ($last_result != $current_result) {
            $results->{$domain} = $current_result;
            store $results, 'results_store';
        }
    }

    $PM = "Perl Monk's";
    $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest";
    $nysus = $PM . ' ' . $MCF;

      use storable;

      That should be use Storable;. The other line works only accidentally.

      I would not use Storable, for a simple reason: It depends on the currently used perl version. Upgrade or downgrade perl and you might get into trouble. There are many other options that don't have this problem. Most of them also allow switching to a different language, or combining tools built in different languages.
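One of the other options could be a plain JSON file, which is readable from most languages. The sketch below uses the core JSON::PP module; the function names `load_state`, `save_state` and `record_and_check`, and the layout of the file (a map from URL to last seen code), are assumptions for illustration and not something proposed in this thread.

```perl
use strict;
use warnings;
use JSON::PP;

# Load the saved { url => last_code } map, or start with an empty one.
sub load_state {
    my ($file) = @_;
    return {} unless -e $file;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    local $/;                      # slurp the whole file
    my $json = <$fh>;
    close $fh;
    return decode_json($json);
}

# Write the map back as UTF-8 encoded JSON with sorted keys.
sub save_state {
    my ($file, $state) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} JSON::PP->new->utf8->canonical->pretty->encode($state);
    close $fh;
}

# Record the new code for $url and return 1 if this is a fresh 404.
sub record_and_check {
    my ($state, $url, $code) = @_;
    my $prev = $state->{$url};
    $state->{$url} = $code;
    return ( $code == 404 && ( !defined $prev || $prev != 404 ) ) ? 1 : 0;
}
```

A cycle would then load the state, call `record_and_check` for each website with the code obtained from curl, send an email whenever it returns 1, and save the state again before finishing.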

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Node Type: perlquestion [id://1207253]
Approved by Athanasius
Front-paged by Corion