The first is assuming the new page you fetch will not be served up from cache some place.
That's not a problem with my solution. :) It should have access to two copies of the site from different times to compare them. :) Validating that the content was not served from a cache is either the user's headache or yet another add-on to the script. :)
It would also be a problem if the overall content of the page was the same, but something like the <date> was different every day. Of course, this can be argued both ways, but one must assume that "changed" is subjective and not objective.
Well, that was one of the reasons I suggested the use of Text::Diff from the very beginning, since it will minimize the headache. You'll be able to quickly grep away things like dates. :)
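Roughly what I have in mind, as an untested sketch rather than a finished tool: drop the lines that carry a date, then let Text::Diff show whatever is left. The sub names and the YYYY-MM-DD pattern are my own placeholders; adjust the pattern to whatever the page really prints.

```perl
use strict;
use warnings;

# Drop every line that contains something looking like a YYYY-MM-DD date.
# The pattern is only an assumption about how the page formats its dates.
sub strip_dates {
    my ($text) = @_;
    return join '', grep { !/\b\d{4}-\d{2}-\d{2}\b/ } split /^/m, $text;
}

# Unified diff of two copies of a page, with the date lines filtered out first.
sub diff_ignoring_dates {
    my ($old, $new) = @_;
    require Text::Diff;    # CPAN module, loaded only when a diff is requested
    my $old_clean = strip_dates($old);
    my $new_clean = strip_dates($new);
    return Text::Diff::diff(\$old_clean, \$new_clean, { STYLE => 'Unified' });
}
```

If the only difference between the two copies is the date line, the diff should come back empty, so an empty result can be treated as "nothing interesting changed".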
I would probably roll my own very much like you have suggested. Since the number of pages to track could get large, I would probably store the MD5 sum and the URL in a database and that's it.
You could always start off with a hash, like:
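Something along these lines, as a minimal sketch: a hash keyed by URL that holds the MD5 of the content seen last time. %seen and has_changed are just names I picked for illustration, fetching the page is left out, and the content is assumed to be raw bytes (md5_hex dies on wide characters).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# URL => MD5 hex digest of the content seen on the previous check.
my %seen;

# Record the digest of $content for $url and report whether it differs
# from the previous one. A URL seen for the first time counts as changed.
sub has_changed {
    my ($url, $content) = @_;
    my $digest = md5_hex($content);    # $content assumed to be bytes
    my $old    = $seen{$url};
    $seen{$url} = $digest;
    return !defined($old) || $old ne $digest;
}
```

Once the list of pages grows, the same URL => digest pairs could be moved into a database table or a tied hash without changing the logic.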
Thanks for the feedback anyway. :)