PerlMonks  

Re: Help Fetch HTML

by talexb (Canon)
on May 22, 2012 at 15:06 UTC (#971810)


in reply to Help Fetch HTML

My suggestion is to skip fetching the page with the excellent Mech module and just do a HEAD on the URL instead, using some combination of the -i (If-Modified-Since) and -o text (text output) options.
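
For anyone who would rather do the same check from inside a Perl script than from the command line, here is a minimal sketch of a conditional HEAD request. It assumes LWP::UserAgent and HTTP::Date are installed. The helper names check_modified and classify, the ten-second timeout, and the "since yesterday" cutoff are illustrative choices, not anything taken from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Date qw(time2str);

# Map an HTTP status code to a simple verdict.
# 304 means the server honored If-Modified-Since and nothing changed.
sub classify {
    my ($code) = @_;
    return 'unchanged' if $code == 304;
    return 'changed'   if $code >= 200 && $code < 300;
    return 'error';
}

# Send a HEAD request carrying an If-Modified-Since header.
# $since_epoch is a Unix timestamp; time2str() renders it as an HTTP date.
sub check_modified {
    my ($ua, $url, $since_epoch) = @_;
    my $res = $ua->head($url, 'If-Modified-Since' => time2str($since_epoch));
    return classify($res->code);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $url   = shift @ARGV;
    my $since = time - 24 * 60 * 60;    # hypothetical cutoff: one day ago
    my $ua    = LWP::UserAgent->new(timeout => 10);
    print "$url: ", check_modified($ua, $url, $since), "\n";
}
```

Saved under a name of your choosing and run with a URL as its only argument, it prints one of unchanged, changed, or error. A result of changed does not prove the page really changed; it may only mean the server ignored the header.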

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds


Re^2: Help Fetch HTML
by Anonymous Monk on May 22, 2012 at 16:43 UTC

    This approach might not work because the vast majority of dynamically generated web pages (if the page in question is one) don't bother with the If-Modified-Since header and just return the whole page for a GET request.
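
If the server does ignore If-Modified-Since, one possible fallback (not something proposed in this thread) is to fetch the page anyway and compare a digest of the body with the digest saved from the previous fetch. The sketch below assumes the body is available as a byte string, for example from `$res->content`; the name body_changed is made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Compare the current body against the digest saved from the last fetch.
# $body is expected to be raw bytes; $last_digest may be undef on first run.
# Returns (changed_flag, new_digest) so the caller can store the new digest.
sub body_changed {
    my ($body, $last_digest) = @_;
    my $digest  = md5_hex($body);
    my $changed = 1;
    if (defined $last_digest && $digest eq $last_digest) {
        $changed = 0;
    }
    return ($changed, $digest);
}

# Example with in-memory strings standing in for two fetches of a page.
my ($first_changed, $saved)  = body_changed("<html>v1</html>", undef);
my ($second_changed, $saved2) = body_changed("<html>v1</html>", $saved);
print "first fetch changed:  $first_changed\n";
print "second fetch changed: $second_changed\n";
```

Keep in mind that a dynamically generated page may embed timestamps, session tokens, or rotating ads, so the digest can differ on every fetch even when the interesting content is the same. In that case, extract the part of the page you care about and hash only that.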
