Re: Help Fetch HTML

by talexb (Canon)
on May 22, 2012 at 15:06 UTC (#971810)

in reply to Help Fetch HTML

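My suggestion is to skip fetching the whole page with the excellent Mech module and just do a HEAD on the URL, using some combination of the -i (If-Modified-Since) and -o text (text output) options. If the server honors If-Modified-Since, an unchanged page comes back as 304 Not Modified, so you only need a full fetch when something has actually changed.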

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^2: Help Fetch HTML
by Anonymous Monk on May 22, 2012 at 16:43 UTC

    This approach might not work: the vast majority of dynamically generated web pages (if the page in question is one) ignore the If-Modified-Since header and simply return the whole page, with a normal 200 status, for a GET request.
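
The two posts above describe a conditional request and the way it can fail. The following is a minimal sketch of that exchange, not either poster's code; it is written in Python purely for illustration, and the function names (build_conditional_head, interpret_status, check) and the example.com URL are hypothetical. It assumes only standard HTTP semantics: 304 means "not modified", while a 200 may mean either that the page changed or that the server ignored the header, which is the case the reply warns about.

```python
from email.utils import formatdate
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_conditional_head(url, last_seen_epoch):
    """Build (but do not send) a HEAD request carrying If-Modified-Since."""
    # HTTP dates are always expressed in GMT, e.g. "Tue, 22 May 2012 15:06:00 GMT".
    stamp = formatdate(last_seen_epoch, usegmt=True)
    return Request(url, method="HEAD", headers={"If-Modified-Since": stamp})


def interpret_status(status):
    """Map an HTTP status code to what it tells us about the page."""
    if status == 304:
        return "unchanged"
    if status == 200:
        # Either the page really changed, or the server ignored the header.
        return "changed-or-header-ignored"
    return "unknown"


def check(url, last_seen_epoch):
    """Send the request; urllib reports 304 by raising HTTPError."""
    try:
        with urlopen(build_conditional_head(url, last_seen_epoch)) as resp:
            return interpret_status(resp.status)
    except HTTPError as err:
        return interpret_status(err.code)


if __name__ == "__main__":
    # Hypothetical URL and timestamp (22 May 2012 15:06:00 UTC).
    print(check("http://example.com/", 1337699160))
```

Because a 200 is ambiguous, a script that relies on this check for a dynamic page would still need some other way to detect changes, such as comparing the fetched content against a saved copy.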
