PerlMonks  

Re: LWP capabilities

by jorg (Friar)
on May 27, 2001 at 17:54 UTC ( #83600 )


in reply to LWP capabilities

Most Linux distributions come with a tool called Wget, which lets you download an entire site starting from a given URL. The restrictions that Beatnik mentioned apply here as well, though.

Jorg

"Do or do not, there is no try" -- Yoda

Replies are listed 'Best First'.
Re: Re: LWP capabilities
by sierrathedog04 (Hermit) on May 28, 2001 at 03:58 UTC
Re: Re: LWP capabilities
by rmckillen (Novice) on May 28, 2001 at 01:08 UTC

    I'd never heard of Wget before; it's a neat little tool. I couldn't get it to do exactly what I wanted, though. I'm hoping that's because I'm passing the wrong parameters, but it probably has to do with limitations the remote web server places on Wget. Let me set up the scenario:

    http://www.url.com/baseball/
    The "baseball" folder contains files:
    - index.html
    - picture.gif
    - page2.html
    - (folder also contains other files)

    Contained in the index.html file are references to picture.gif and page2.html. The index.html does not reference the other files... I don't know the names of these files, but I know they are there. When I run:

    wget -r -l1 --no-parent http://www.url.com/baseball/

    It will retrieve index.html, picture.gif, and page2.html, but not the other files that I know are present in the directory.

    How do I get Wget to retrieve the other files not referenced in index.html? Is it possible?

      If there are no links to the documents, there is no way of checking whether they exist (short of guessing names, which can take forever...). On the wget note, I quote:

      Basically it comes down to: if the webserver has dirlisting enabled and no index file, you can see the files in the directory. If those are accessible depends on several factors... (me on LWP)

      and:

      The restrictions that Beatnik mentioned apply here as well though. (jorg on wget)

      Greetz
      Beatnik
      ... Quidquid perl dictum sit, altum viditur.
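Beatnik's point about directory listings can be sketched concretely: when the server has autoindexing enabled and no index file, requesting the directory URL returns an HTML page whose links are the directory contents, which can then be parsed like any other page. Below is a minimal, hedged Python sketch of that link extraction; the sample HTML is made up, modeled loosely on an Apache-style autoindex page, and Wget/LWP do the equivalent internally:

```python
# Sketch: extract file links from a directory-listing page.
# Assumes an Apache-style autoindex layout; the sample HTML is invented.
from html.parser import HTMLParser

class DirListingParser(HTMLParser):
    """Collect href targets that look like directory entries."""
    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Skip column-sort links ("?C=N;O=D") and absolute links
                # such as the "Parent Directory" entry.
                if name == "href" and value and not value.startswith(("?", "/")):
                    self.files.append(value)

sample = """
<html><body><h1>Index of /baseball</h1>
<a href="?C=N;O=D">Name</a>
<a href="/">Parent Directory</a>
<a href="index.html">index.html</a>
<a href="picture.gif">picture.gif</a>
<a href="page2.html">page2.html</a>
<a href="stats.txt">stats.txt</a>
</body></html>
"""

parser = DirListingParser()
parser.feed(sample)
print(parser.files)  # → ['index.html', 'picture.gif', 'page2.html', 'stats.txt']
```

If the server serves an index.html instead of an autoindex page, this approach sees only what that page links to, which is exactly the limitation described above.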
