Re: Help Fetch HTML

by talexb (Canon)
on May 22, 2012 at 15:06 UTC (#971810)

in reply to Help Fetch HTML

My suggestion is to skip fetching the page with the excellent Mech module and just do a HEAD on the URL instead, using some combination of the -i (If-Modified-Since) and -o text (text output) options.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
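
The -i and -o switches mentioned above look like those of the HEAD command installed with LWP (lwp-request); that reading is an assumption, since the post does not name the tool. The same check can also be made from inside a Perl script. What follows is only a minimal sketch: it uses the core HTTP::Tiny module rather than anything the post prescribes, and the helper names http_date, verdict_for and check_url are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # core since Perl 5.14

# Format an epoch time as an HTTP date (always GMT, English names).
sub http_date {
    my ($epoch) = @_;
    my @days   = qw(Sun Mon Tue Wed Thu Fri Sat);
    my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $days[$wday], $mday, $months[$mon], $year + 1900, $hour, $min, $sec;
}

# Turn a status code into a verdict (hypothetical helper).
# 304 Not Modified means the server honoured If-Modified-Since.
sub verdict_for {
    my ($status) = @_;
    return 'unchanged'     if $status == 304;
    return 'maybe changed' if $status >= 200 && $status < 300;
    return 'error';
}

# Send a HEAD request carrying If-Modified-Since.
# $url and $last_seen (epoch seconds) are supplied by the caller.
sub check_url {
    my ($url, $last_seen) = @_;
    my $res = HTTP::Tiny->new->head($url, {
        headers => { 'If-Modified-Since' => http_date($last_seen) },
    });
    return verdict_for($res->{status});
}

# Example call (needs network access; the URL is a placeholder):
# print check_url('http://example.com/', time - 86400), "\n";
```

A 2xx answer is reported as 'maybe changed' rather than 'changed' because a server that ignores the header will answer 200 whether or not the page is different.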

Replies are listed 'Best First'.
Re^2: Help Fetch HTML
by Anonymous Monk on May 22, 2012 at 16:43 UTC

    This approach might not work: the vast majority of dynamically generated web pages (if the page in question is one) ignore the If-Modified-Since header and simply return the whole page for a GET request.
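
When a server ignores If-Modified-Since as described above, one possible fallback is to fetch the page and compare a digest of the body with the one saved on the previous run. This is a rough sketch of that idea, not something proposed in the thread: it assumes the core modules HTTP::Tiny and Digest::SHA, and the names body_changed and fetch_and_compare are hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);    # core module
use HTTP::Tiny;                    # core since Perl 5.14

# Return 1 when $body hashes differently from the digest saved last time,
# or when no digest has been saved yet; return 0 otherwise.
sub body_changed {
    my ($old_digest, $body) = @_;
    return 1 unless defined $old_digest;
    return sha256_hex($body) ne $old_digest ? 1 : 0;
}

# GET the page, then return (changed flag, new digest) so the caller
# can store the digest for the next run. $url is caller-supplied.
sub fetch_and_compare {
    my ($url, $old_digest) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    my $body = $res->{content};
    return (body_changed($old_digest, $body), sha256_hex($body));
}

# Example call (needs network access; the URL is a placeholder):
# my ($changed, $digest) = fetch_and_compare('http://example.com/', undef);
```

Dynamic pages often embed timestamps or session tokens, so the digest may differ on every fetch even when the visible content has not changed; hashing only the relevant part of the page may be needed in that case.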
