PerlMonks  

Re^3: (almost) preserving a web page

by Anonymous Monk
on Oct 14, 2011 at 08:28 UTC (#931456)


in reply to Re^2: (almost) preserving a web page
in thread (almost) preserving a web page

httrack does that by mining the JavaScript for links. It gets the more common ones but doesn't get them all, and some JavaScript will redirect you from your local copy back to the internet.
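To illustrate what "mining the JavaScript for links" can look like, here is a minimal Python sketch. It is not httrack's actual heuristic: the pattern, the `mine_links` name, and the sample script are assumptions made up for illustration. It only picks up quoted string literals that look like absolute URLs or paths, which is also why URLs assembled at run time slip through.

```python
import re

# Quoted string literals that start like a URL or a path (http://, /, ./, ../).
# This pattern is an illustrative assumption, not httrack's real heuristic.
URL_LITERAL = re.compile(r"""(["'])((?:https?://|\.{0,2}/)[^"'\s]+)\1""")

def mine_links(js_source):
    """Return URL-like string literals found in JavaScript source, in order, without duplicates."""
    found = []
    for match in URL_LITERAL.finditer(js_source):
        url = match.group(2)
        if url not in found:
            found.append(url)
    return found

# Hypothetical script: two literal URLs and one built by concatenation.
sample = '''
var img = new Image(); img.src = "/images/logo.png";
window.location = 'http://example.com/next.html';
var built = "/img/" + name + ".png";  // assembled at run time
'''

for link in mine_links(sample):
    print(link)
```

The two literal URLs are found whole, but the concatenated one only yields its "/img/" fragment, so a crawler working this way never learns the real image path.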

http://crawler.archive.org/ does that by inserting its own JavaScript, which rewrites URLs so the images show up (even the dynamic ones), but like httrack, actual links are rewritten ...
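The URL rewriting idea can be sketched roughly as follows. This is Python run over static HTML, not the JavaScript the archive inserts into the page, and the `ARCHIVE_PREFIX` value, the `rewrite_urls` name, and the example page are all hypothetical.

```python
import re
from urllib.parse import urljoin

# Hypothetical replay prefix; a real archive would use its own scheme.
ARCHIVE_PREFIX = "http://localhost:8080/archive/"

# Quoted src/href attribute values.
ATTR = re.compile(r'\b(src|href)=(["\'])(.*?)\2', re.IGNORECASE)

def rewrite_urls(html, page_url, prefix=ARCHIVE_PREFIX):
    """Point every quoted src/href at the archived copy instead of the live site."""
    def repl(match):
        attr, quote, value = match.groups()
        absolute = urljoin(page_url, value)  # resolve relative references first
        return f"{attr}={quote}{prefix}{absolute}{quote}"
    return ATTR.sub(repl, html)

page = '<a href="/about.html">About</a> <img src="img/logo.png">'
print(rewrite_urls(page, "http://example.com/blog/post.html"))
```

Note that the same pass that makes the image load from the archive also rewrites the ordinary link, so the saved page no longer points at the original addresses.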

Then there is Mozilla Archive Format (with Faithful Save), which does a much better version of save-as; it's close to perfect :)

Another common tactic is to print to PDF from a browser like Firefox via automation.
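One way to automate the print-to-PDF route is sketched below in Python, assuming Selenium 4 and geckodriver are installed (neither is mentioned in the post; they are just one possible toolchain). WebDriver's print command returns the PDF as a base64 string, so the only real work is decoding it. The function names and the output file name are made up for illustration.

```python
import base64

def write_pdf(b64_payload, out_path):
    """Decode the base64 PDF returned by WebDriver's print command and save it."""
    with open(out_path, "wb") as fh:
        fh.write(base64.b64decode(b64_payload))

def print_to_pdf(url, out_path):
    """Load url in headless Firefox and save the rendered page as a PDF."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # no visible browser window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        write_pdf(driver.print_page(), out_path)
    finally:
        driver.quit()

# Example call (needs Firefox and geckodriver on the machine):
# print_to_pdf("http://example.com/", "example.pdf")
```

The result is a snapshot of how the page rendered, so links may survive but scripts and anything interactive do not.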

