PerlMonks  

Dylan - For the sites they are a-changin'

by jplindstrom (Monsignor)
on Apr 23, 2003 at 23:16 UTC (#252709, CUFP)

Having started to follow a few blogs, I needed a reminder of when they change. RSS? I don't know, and I don't know how it works. And web services that do this, like http://www.changedetection.com/, don't check often enough.

So I wrote a script, obviously: Dylan - For the sites they are a-changin'

Note 1: It's Windows-biased in the way it starts the web browser when a page changes, but the action can easily be replaced with, say, sending a mail.

Note 2: Why doesn't HEAD work? Why don't the headers contain the document's change time? I was under the impression that they should.

#!/usr/local/bin/perl -w
use strict;
use LWP::Simple qw(head get);
use MD5;

print q{Dylan v0.0.1, 2003-03-24, johanl@bossmedia.se
For the sites they are a-changin'
};

main();

sub main {
    my $ext = ".snapshot";
    my $daysLifetime = 14;
    my $secondsDelay = 60 * 10;
    my $commandShow = q{START %s};
    my @aUrl = (
        "http://www.sidhe.org/~dan/blog/",
    );

    while(1) {
        print "\nChecking urls:\n" . join("\n", map { " $_" } @aUrl) . "\n";
        for my $url( aChangedUrls(\@aUrl, $ext, $daysLifetime) ) {
            my $command = sprintf($commandShow, $url);
            system($command) and warn("Could not show url ($url)\n");
        }
        print "\nZZzzzzz for ($secondsDelay) seconds\n";
        sleep($secondsDelay);
    }
}

sub aChangedUrls {
    my ($raUrl, $ext, $daysLifetime) = @_;
    #No files == first time. Counted in list context: a scalar-context
    #glob keeps iterator state between calls and would misreport here.
    my $isFirstTime = ! (() = glob("*$ext"));
    my @aChanged;
    for my $url (@$raUrl) {
        my $text = get($url) or warn("Could not fetch ($url)\n"), next;
        my $fileMd5 = MD5->hexhash($text) . $ext;
        if(! -r $fileMd5) {
            open(my $fh, "> $fileMd5") or warn("Could not create ($fileMd5)\n"), next;
            close($fh);
            push(@aChanged, $url);
        }
    }
    @aChanged = () if($isFirstTime);    #It didn't "change" if it was the first time

    #Clean up old stuff
    unlink($_) for ( grep { -M > $daysLifetime} glob("*$ext"));

    return(@aChanged);
}

__END__

http://www.bahnhof.se/~johanl/perl/Dylan/dylan.pl.txt

Re: Dylan - For the sites they are a-changin'
by crenz (Priest) on Apr 23, 2003 at 23:53 UTC

    Why doesn't HEAD work? Why don't the headers contain the document's change time? I was under the impression that they should.

    The date is in an HTTP header (Last-Modified) that the webserver creates automatically for static pages. However, when a script outputs a dynamic web page, it has to set that header explicitly. I guess most blog scripts forget to do that.
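A minimal sketch of how a checker could use the modification time when the server sends one, and signal that hashing is needed when it doesn't. It relies on LWP::Simple's head(), which in list context returns the modification time as its third value. The names changed_since and url_changed are hypothetical and not part of Dylan.

```perl
use strict;
use warnings;

# Compare a Last-Modified time (epoch seconds) with the time we last looked.
# Returns 1 if the page is newer, 0 if not, and undef when the server sent
# no usable time, in which case the caller has to fall back to hashing.
sub changed_since {
    my ($modified_time, $last_seen) = @_;
    return unless defined $modified_time && $modified_time > 0;
    return $modified_time > $last_seen ? 1 : 0;
}

# Hypothetical wrapper that asks the server with a HEAD request.
sub url_changed {
    my ($url, $last_seen) = @_;
    require LWP::Simple;    # loaded lazily, only needed for real requests
    # In list context head() returns
    # (content_type, document_length, modified_time, expires, server),
    # or an empty list if the request failed.
    my (undef, undef, $modified_time) = LWP::Simple::head($url);
    return changed_since($modified_time, $last_seen);
}
```

An undefined result from url_changed($url, $last_seen) means "no answer from the headers", which is what the script sees for most dynamically generated pages, so the MD5 comparison stays as the fallback.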

    Note that in this case you are actually polling a blog that in a recent entry proposes a protocol that abolishes polling :).

      Note that in this case you are actually polling a blog that in a recent entry proposes a protocol that abolishes polling :).

      Yes, I know :)

      But I'm not religious on the topic of polling and only did it to get the work done, not to save the world and invent a new protocol. I leave that to others :)

Re: Dylan - For the sites they are a-changin'
by Aristotle (Chancellor) on Apr 25, 2003 at 13:20 UTC
    The MD5 module is way outdated. You should use Digest::MD5 instead. That makes it:

        use Digest::MD5 qw(md5_hex);
        ...
        my $fileMd5 = md5_hex($text) . $ext;

    As a minor nit, I advise using the three-argument form of open whenever possible (i.e. when you're not working with pipes; that is planned for a later version of Perl5 too though) for a variety of reasons. No, they don't really apply here, but it's a good habit.

        open(my $fh, ">", $fileMd5) or warn("Could not create ($fileMd5)\n"), next;

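Put together, the two suggestions above could look roughly like this. It is a sketch, not the attached patch: the sub name record_snapshot and the $dir argument are assumptions made for illustration. The text is encoded to UTF-8 before hashing because md5_hex croaks on wide characters, which can show up when the fetched page has already been decoded.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);
use File::Spec;

# Record a snapshot of $text as an empty file named after its MD5 digest.
# Returns 1 if the snapshot is new (the content changed), 0 if it already
# existed. $dir and the sub name are assumptions made for this sketch.
sub record_snapshot {
    my ($dir, $text, $ext) = @_;
    # Encode first: md5_hex works on bytes and dies on wide characters.
    my $file = File::Spec->catfile($dir, md5_hex(encode_utf8($text)) . $ext);
    return 0 if -e $file;
    # Three-argument open: the file name can never be read as a mode.
    open(my $fh, ">", $file) or die "Could not create ($file): $!\n";
    close($fh);
    return 1;
}
```

Calling record_snapshot(".", $text, ".snapshot") inside the loop over the urls would then stand in for the MD5->hexhash and two-argument open lines of the original script.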
    And finally, for the Unixers out there using Mozilla, it can be adapted using the following code:

        if(system(mozilla => -remote => 'ping()')) {
            warn "mozilla not found!";
            my $pid = fork;
            die "Couldn't fork: $!" if not defined $pid;
            exec { 'mozilla' } 'mozilla' if $pid == 0;
        }

        # and changing the command like so:
        # (obviously, use "new-window" if you prefer that)
        my $commandShow = q{mozilla -remote "openurl(%s, new-tab)"};

    For the Windows Mozillaites, that will be something like this (but don't quote me on it, since I have neither used Win32::Process nor any desire to expand my experience with that OS):

        Win32::Process::Create(
            my $process,
            "mozilla.exe", # maybe needs absolute path, dunno
            "",
            0,
            NORMAL_PRIORITY_CLASS,
            "."
        ) || die ErrorReport() if system(mozilla => -remote => 'ping()');

    I also attached a patch for your downloadable script to make these changes (posted in a separate node to avoid confusion with the code blocks in this one).

    Makeshifts last the longest.
