PerlMonks  

Re^2: Download web page including css files, images, etc.

by skx (Parson)
on Jan 25, 2007 at 14:23 UTC (#596489)


in reply to Re: Download web page including css files, images, etc.
in thread Download web page including css files, images, etc.

I thought about this, but couldn't see an obvious way of determining the "main" file.

For example if you run:

wget --page-requisites http://en.wikipedia.org/

The output produced is:

en.wikipedia.org/
|-- robots.txt
`-- wiki
    `-- Main_Page

Determining that wiki/Main_Page should be transformed to index.html is hard.

Steve
--

Re^3: Download web page including css files, images, etc.
by PreferredUserName (Pilgrim) on Jan 25, 2007 at 18:21 UTC
    Just do:
    wget --server-response http://en.wikipedia.org/
    and you can parse out the redirects:
    --13:13:55-- http://en.wikipedia.org/ => `index.html'
    Resolving en.wikipedia.org... 66.230.200.100
    Connecting to en.wikipedia.org|66.230.200.100|:80... connected.
    HTTP request sent, awaiting response...
      HTTP/1.0 301 Moved Permanently
      Date: Thu, 25 Jan 2007 18:13:41 GMT
      Server: Apache
      X-Powered-By: PHP/5.1.4
      Vary: Accept-Encoding,Cookie
      Cache-Control: s-maxage=1200, must-revalidate, max-age=0
      Last-Modified: Thu, 25 Jan 2007 18:13:41 GMT
      Location: http://en.wikipedia.org/wiki/Main_Page
      Content-Type: text/html
      X-Cache: HIT from sq28.wikimedia.org
      X-Cache-Lookup: HIT from sq28.wikimedia.org:80
      Age: 14
      X-Cache: HIT from sq26.wikimedia.org
      X-Cache-Lookup: HIT from sq26.wikimedia.org:80
      Via: 1.0 sq28.wikimedia.org:80 (squid/2.6.STABLE9), 1.0 sq26.wikimedia.org:80 (squid/2.6.STABLE9)
      Connection: close
    ---> Location: http://en.wikipedia.org/wiki/Main_Page [following]
    --13:13:55-- http://en.wikipedia.org/wiki/Main_Page => `Main_Page'
    Connecting to en.wikipedia.org|66.230.200.100|:80... connected.
    HTTP request sent, awaiting response...
      HTTP/1.0 200 OK
      Date: Thu, 25 Jan 2007 18:13:44 GMT
      Server: Apache
      X-Powered-By: PHP/5.1.4
      Content-Language: en
      Vary: Accept-Encoding,Cookie
      Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
      Last-Modified: Thu, 25 Jan 2007 17:28:15 GMT
      Content-Type: text/html; charset=utf-8
      Age: 11
      X-Cache: HIT from sq30.wikimedia.org
      X-Cache-Lookup: HIT from sq30.wikimedia.org:80
      Via: 1.0 sq30.wikimedia.org:80 (squid/2.6.STABLE9)
      Connection: close
    Length: unspecified [text/html]
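
For what it's worth, the "parse out the redirects" step could look roughly like the sketch below. It is not from the reply above: the function name `final_location` is made up for illustration, and it assumes each response header sits on its own line with `Location:` as the first word, as in wget's usual log layout. It prints the target of the last redirect seen, which is the page that would have to be treated as the "main" file.

```shell
#!/bin/sh
# final_location (hypothetical helper name): read a `wget --server-response`
# log on standard input and print the URL from the last "Location:" header.
# Prints nothing when the log contains no redirect.
final_location() {
    # awk strips leading whitespace when splitting fields, so an indented
    # header line "  Location: URL" yields $1 == "Location:" and $2 == URL.
    awk '$1 == "Location:" { last = $2 }
         END { if (last != "") print last }'
}
```

wget writes its log to standard error, so the stream has to be redirected before piping it in, e.g. `wget --server-response http://en.wikipedia.org/ 2>&1 | final_location`. For a chain of several redirects only the final target is printed; keeping every hop would mean printing inside the match rule instead of in `END`.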
