 
PerlMonks  

Re^2: Check website has update file using www::mechanize

by perlmad (Sexton)
on May 25, 2016 at 06:53 UTC ( [id://1164040]=note )


in reply to Re: Check website has update file using www::mechanize
in thread Check website has update file using www::mechanize

Is this possible with HTTP::Response?

If yes, how can I get the Last-Modified information from the HTTP response?

Replies are listed 'Best First'.
Re^3: Check website has update file using www::mechanize
by Corion (Patriarch) on May 25, 2016 at 06:57 UTC

    HTTP has provisions for not sending data if it is younger than a given timestamp. See the ->mirror method of LWP::UserAgent and/or the If-Modified-Since header of HTTP.
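A minimal sketch of the ->mirror approach, assuming plain LWP::UserAgent. The helper name mirror_status and the command-line arguments are made up for illustration. mirror() sends an If-Modified-Since header based on the local file's modification time, so a 304 response code means the server reports no change.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: mirror $url into $file and report what happened.
sub mirror_status {
    my ($ua, $url, $file) = @_;

    # If $file already exists, mirror() adds If-Modified-Since from its
    # mtime and only rewrites the file when the server sends new content.
    my $res = $ua->mirror($url, $file);

    return "unchanged" if $res->code == 304;    # 304 Not Modified
    return "updated"   if $res->is_success;     # file was (re)downloaded
    return "error: " . $res->status_line;
}

# Example call; runs only when a URL and a target file name are given.
if (@ARGV == 2) {
    my $ua = LWP::UserAgent->new(timeout => 30);
    print mirror_status($ua, @ARGV), "\n";
}
```

Note that mirror() needs a target file name as its second argument, and that the result is only as reliable as the server's handling of If-Modified-Since.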

      Which is by no means a guarantee that the data did or did not change. I deal with government data all the time, and their sites just list the ZIP/Excel/CSV/PDF files. You actually have to fetch the files in order to check if they changed (or their content changed).

      My approach is

      • Read persistent file with ZIP/CSV file checksums
      • Read site and parse links
      • For each link with a file I want/know
        • Fetch file into memory
        • Calculate SHA256
        • Compare to previous SHA256
        • same: next (skip the remaining steps)
        • save file
        • store SHA256
        • log/mail/other action(s)
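The comparison step in the list above could be sketched roughly as follows, using the core Digest::SHA module. The helper name content_changed, the %known hash and the sample data are assumptions for illustration; fetching the links and persisting the checksums are left out.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Returns 1 (and records the new digest) when $content differs from what was
# last seen under $name, 0 otherwise. %$known maps name => SHA256 hex digest.
# $content should be raw bytes, e.g. $res->content rather than decoded text.
sub content_changed {
    my ($known, $name, $content) = @_;
    my $digest = sha256_hex($content);
    return 0 if defined $known->{$name} && $known->{$name} eq $digest;
    $known->{$name} = $digest;
    return 1;
}

my %known;    # in real use, loaded from the persistent checksum file
for my $round (1, 2) {
    my $content = "a,b\n1,2\n";    # stand-in for a fetched file body
    print "round $round: ",
        content_changed(\%known, "data.csv", $content) ? "changed" : "same",
        "\n";
}
```

Saving the file and the log/mail action would go in the branch where content_changed returns true.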

      Enjoy, Have FUN! H.Merijn

      Yeah, it's working, but I still have the same problem:

      my $res=$mech>mirror('download_link');
      print " response is :",$res,"\n\n"; # no content

      The file gets downloaded when I run it, but I need to know whether the file has been updated: if it has, download it; otherwise just print a message.

        I hope that this is just a cut-n-paste error, as that is not what you mean. Missing a dash:

        my $res = $mech->mirror ("download_link");
        #              ^ there
        print " response is :", $res, "\n\n"; # no content

        If you are using use strict; and use warnings; running the code would show you.

        As I showed in my action list in this thread, the fact that a page that contains the links is not updated does not mean that the files it links to are not updated.

        You should post more of the real code for us to check if you are checking the right headers.
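For checking which headers a server actually sends, a small sketch along these lines may help. The helper name change_headers and the hand-built response are assumptions for illustration. last_modified is the HTTP::Headers accessor, which HTTP::Response delegates to, and it returns epoch seconds.

```perl
use strict;
use warnings;
use HTTP::Response;
use HTTP::Date qw(time2str);

# Hypothetical helper: collect the change-related headers of a response.
sub change_headers {
    my ($res) = @_;
    my $lm = $res->last_modified;    # epoch seconds, undef if header absent
    return {
        last_modified => defined $lm ? time2str($lm) : undef,
        etag          => scalar $res->header('ETag'),
        length        => scalar $res->header('Content-Length'),
    };
}

# Offline demonstration with a hand-built response (made-up header values).
my $res = HTTP::Response->new(200, 'OK', [
    'Last-Modified' => 'Wed, 25 May 2016 06:53:00 GMT',
    'ETag'          => '"abc123"',
]);
my $info = change_headers($res);
print "$_: ", $info->{$_} // '(not sent)', "\n" for sort keys %$info;
```

With a live user agent, the response returned by $ua->head($url) can be passed to the same helper. Servers are free to omit any of these headers.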


        Enjoy, Have FUN! H.Merijn
