PerlMonks  

Dylan - For the sites they are a-changin'

by jplindstrom (Monsignor)
on Apr 23, 2003 at 23:16 UTC ( #252709=CUFP )

Having started to follow a few blogs, I needed a reminder of when they change. RSS? Maybe, but I don't know how it works. And web services that do this, like http://www.changedetection.com/, don't check often enough.

So I wrote a script, obviously: Dylan - For the sites they are a-changin'

Note 1: It's Windows-biased in the way it starts the web browser when a page changes, but the action can easily be replaced with, say, sending a mail.
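As a rough illustration of swapping out the Windows-specific action, the command template could be chosen per platform. This is only a sketch: show_command_for is a hypothetical helper, not part of Dylan, and the `open` (macOS) and `xdg-open` (Linux desktops) launchers are assumed to be installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): pick a sprintf
# template for "show this URL" based on the platform name in $^O.
sub show_command_for {
    my ($os) = @_;
    return q{START %s}    if $os eq 'MSWin32';   # the script's original action
    return q{open "%s"}   if $os eq 'darwin';    # assumes the macOS `open` tool
    return q{xdg-open "%s"};                     # assumes xdg-utils is installed
}

my $commandShow = show_command_for($^O);
printf "Would run: $commandShow\n", "http://www.sidhe.org/~dan/blog/";
```

The returned template drops into the existing `sprintf($commandShow, $url)` call unchanged.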

Note 2: Why doesn't HEAD work? Why don't the headers contain the document's change time? I was under the impression that they should.

#!/usr/local/bin/perl -w
use strict;
use LWP::Simple qw(head get);
use MD5;

print q{Dylan v0.0.1, 2003-03-24, johanl@bossmedia.se
For the sites they are a-changin'
};

main();

sub main {
    my $ext = ".snapshot";
    my $daysLifetime = 14;
    my $secondsDelay = 60 * 10;
    my $commandShow = q{START %s};
    my @aUrl = (
        "http://www.sidhe.org/~dan/blog/",
    );

    while(1) {
        print "\nChecking urls:\n" . join("\n", map { " $_" } @aUrl) . "\n";
        for my $url ( aChangedUrls(\@aUrl, $ext, $daysLifetime) ) {
            my $command = sprintf($commandShow, $url);
            system($command) and warn("Could not show url ($url)\n");
        }
        print "\nZZzzzzz for ($secondsDelay) seconds\n";
        sleep($secondsDelay);
    }
}

sub aChangedUrls {
    my ($raUrl, $ext, $daysLifetime) = @_;

    #No files == first time. Glob in list context: in scalar context
    #glob is an iterator and returns undef once per exhausted pass,
    #which would look like a "first time" on later polls.
    my @aSnapshot = glob("*$ext");
    my $isFirstTime = ! @aSnapshot;

    my @aChanged;
    for my $url (@$raUrl) {
        my $text = get($url) or warn("Could not fetch ($url)\n"), next;
        my $fileMd5 = MD5->hexhash($text) . $ext;
        if(! -r $fileMd5) {
            #An empty file named after the digest marks "seen this content"
            open(my $fh, "> $fileMd5") or warn("Could not create ($fileMd5)\n"), next;
            close($fh);
            push(@aChanged, $url);
        }
    }

    @aChanged = () if($isFirstTime);    #It didn't "change" if it was the first time

    #Clean up old stuff
    unlink($_) for ( grep { -M $_ > $daysLifetime } glob("*$ext") );

    return(@aChanged);
}

__END__

http://www.bahnhof.se/~johanl/perl/Dylan/dylan.pl.txt

Re: Dylan - For the sites they are a-changin'
by crenz (Priest) on Apr 23, 2003 at 23:53 UTC

    Why doesn't HEAD work? Why don't the headers contain the document's change time? I was under the impression that they should.

    The date is in an HTTP header (Last-Modified) that the web server creates automatically for static pages. However, when a script outputs a dynamic web page, the script has to set that header explicitly. I guess most blog scripts forget to do that.
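    One way to live with servers that omit the header is to use the modification time when it is present and fall back to a content digest otherwise. The sketch below assumes a hypothetical helper named fingerprint; the body is passed as a callback so that it is only fetched when needed.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: build a change fingerprint for a page.
# $modified_time is the Last-Modified value (epoch seconds) or undef;
# $fetch_body is a code ref that returns the page body, or undef on failure.
sub fingerprint {
    my ($modified_time, $fetch_body) = @_;
    return "mtime:$modified_time" if defined $modified_time;

    # No Last-Modified header: fall back to hashing the content.
    my $body = $fetch_body->();
    return undef unless defined $body;
    return "md5:" . md5_hex(encode_utf8($body));
}

# Possible use with LWP::Simple (does network requests, so left as a comment):
#   use LWP::Simple qw(head get);
#   my (undef, undef, $mtime) = head($url);   # third value is the modified time
#   my $print = fingerprint($mtime, sub { get($url) });

print fingerprint(undef, sub { "abc" }), "\n";
```

    Comparing the fingerprint with the one stored from the previous poll then tells you whether the page changed.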

    Note that in this case you are actually polling a blog that in a recent entry proposes a protocol that abolishes polling :).

      Note that in this case you are actually polling a blog that in a recent entry proposes a protocol that abolishes polling :).

      Yes, I know :)

      But I'm not religious on the topic of polling and only did it to get the work done, not to save the world and invent a new protocol. I leave that to others :)

Re: Dylan - For the sites they are a-changin'
by Aristotle (Chancellor) on Apr 25, 2003 at 13:20 UTC
    The MD5 module is way outdated. You should use Digest::MD5 instead. That changes the code to:
        use Digest::MD5 qw(md5_hex);
        ...
        my $fileMd5 = md5_hex($text) . $ext;
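    A small self-contained sketch of that change follows. The snapshot_name helper is hypothetical, and the encode_utf8 step is an added assumption: md5_hex croaks on wide characters, which can occur if the fetched text has already been decoded to Perl characters.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: derive the snapshot file name for a page body.
sub snapshot_name {
    my ($text, $ext) = @_;
    # md5_hex works on bytes, so encode character data first.
    return md5_hex(encode_utf8($text)) . $ext;
}

print snapshot_name("abc", ".snapshot"), "\n";
```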
    As a minor nit, I advise using the three-argument form of open whenever possible (i.e. when you're not working with pipes; that is planned for a later version of Perl5 too, though) for a variety of reasons. No, they don't really apply here, but it's a good habit.
        open(my $fh, ">", $fileMd5) or warn("Could not create ($fileMd5)\n"), next;
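    To illustrate one of those reasons: two-argument open strips leading and trailing whitespace from the file name and interprets mode characters in it, while the three-argument form takes the name literally. The sketch below uses a hypothetical touch_snapshot helper and a made-up file name with a trailing space, in a temporary directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: create an empty marker file.
# Three-argument open uses $path exactly as given.
sub touch_snapshot {
    my ($path) = @_;
    open(my $fh, ">", $path) or do {
        warn("Could not create ($path): $!\n");
        return 0;
    };
    close($fh);
    return 1;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/notes.snapshot ";    # note the trailing space

touch_snapshot($path);
print -e $path ? "created exactly as named\n" : "not created\n";
```

    The hex digests used as file names in Dylan never contain such characters, which is why the issue does not bite in this script.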
    And finally, for the Unixers out there using Mozilla, it can be adapted using the following code:
        if(system(mozilla => -remote => 'ping()')) {
            warn "mozilla not found!";
            my $pid = fork;
            die "Couldn't fork: $!" if not defined $pid;
            exec { 'mozilla' } 'mozilla' if $pid == 0;
        }

        # and changing the command like so:
        # (obviously, use "new-window" if you prefer that)
        my $commandShow = q{mozilla -remote "openurl(%s, new-tab)"};
    For the Windows Mozillaites, that will be something like this (but don't quote me on it, since I have neither used Win32::Process nor any desire to expand my experience with that OS):
        Win32::Process::Create(
            my $process,
            "mozilla.exe",    # maybe needs absolute path, dunno
            "",
            0,
            NORMAL_PRIORITY_CLASS,
            "."
        ) || die ErrorReport() if system(mozilla => -remote => 'ping()');
    I also attached a patch for your downloadable script to make these changes (posted in a separate node to avoid confusion with the code blocks in this one).

    Makeshifts last the longest.
