
Re: Getting data out of a remote web page

by nindza (Novice)
on Feb 10, 2002 at 01:09 UTC

in reply to Getting data out of a remote web page


You can use this script as an example... I used it to download pics from that site...

Cheers, nindza.


#!/usr/bin/perl

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::SimpleLinkExtor;

$ua = new LWP::UserAgent;

# keep retrying until the page fetch succeeds
while (1) {
    $request  = new HTTP::Request('GET', '');   # URL left blank in the original post
    $response = $ua->request($request);
    if ($response->is_success) {
        print "succ\n";
        last;
    }
    else {
        print "fail\n";
    }
}

# pull every href out of the fetched page
$e = HTML::SimpleLinkExtor->new();
$e->parse($response->content);
@links = $e->href;

chdir("/mnt/depot/babes");

# download each link whose name contains a YYYY-MM-DD date
foreach $link (@links) {
    if ($link =~ /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
        system("wget -c \"$link\"");
    }
}

Replies are listed 'Best First'.
Re: Answer: Getting data out of a remote web page
by Juerd (Abbot) on Feb 10, 2002 at 12:11 UTC

    foreach $link (@links) {
        if ($link =~ /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
            system("wget -c \"$link\"");
        }
    }

    Ouch. If $link is '$(rm -rf /)1111-11-11', you're not going to like it. You should anchor the regex (/^[0-9] ... [0-9]$/) so nothing can come before or after your pattern. Another big improvement is not letting a shell parse the command line at all: system('wget', '-c', "$link");.
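    For instance, a minimal sketch of both fixes together (the anchored pattern here is an assumption about the site's URL scheme; adjust it to match the real links):

        foreach my $link (@links) {
            # anchored match: nothing can hide before or after the expected shape
            next unless $link =~ m{^https?://[\w./-]+\d{4}-\d{2}-\d{2}[\w.-]*$};

            # list form of system(): the shell never sees $link, so
            # '$(rm -rf /)'-style metacharacters are harmless
            system('wget', '-c', $link) == 0
                or warn "wget failed for $link\n";
        }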

    2;0 juerd@ouranos:~$ perl -e'undef christmas'
    Segmentation fault
    2;139 juerd@ouranos:~$

Re: Answer: Getting data out of a remote web page
by Amoe (Friar) on Feb 10, 2002 at 14:16 UTC

    Putting that code in an infinite loop is a terrible strategy. It could put a horrendous strain on the server if the server is already under heavy load (okay, that probably isn't one of your concerns), it makes adding new code harder, and it reads horribly. A bounded retry, as sketched below, avoids all of that.
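    A minimal sketch of a bounded retry with a pause between attempts (the attempt count, the sleep interval, and the example URL are all assumptions):

        use LWP::UserAgent;
        use HTTP::Request;

        my $ua  = LWP::UserAgent->new;
        my $url = 'http://example.com/page.html';   # hypothetical URL
        my $response;

        # try a handful of times with a pause, rather than spinning forever
        for my $attempt (1 .. 5) {
            $response = $ua->request(HTTP::Request->new(GET => $url));
            last if $response->is_success;
            warn "attempt $attempt failed: ", $response->status_line, "\n";
            sleep 10;
        }
        die "giving up on $url\n" unless $response->is_success;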

    Using system to fetch the files you want is also pointless. You've already shown that you can use LWP::UserAgent to get pages; why not just fetch the files with it as well?

    $ua->request(HTTP::Request->new(GET => $link), $link);

    That'll store the file in a local file of the same name. (Might have some security implications; I don't know the internals of simple_request.) Anyway, you don't really need to do this when you have <plug>pronbot</plug> to do it for you, and having had a look at the site, it will work with it. (Note: all disclaimers apply; pronbot is still a work in progress.)
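    A minimal sketch of that save-to-file approach; deriving the local filename with File::Basename is my own assumption, since passing the full URL as the filename would embed slashes:

        use LWP::UserAgent;
        use HTTP::Request;
        use File::Basename;

        my $ua = LWP::UserAgent->new;

        foreach my $link (@links) {
            next unless $link =~ /[0-9]{4}-[0-9]{2}-[0-9]{2}/;

            # save under the last path component of the URL, not the whole URL
            my $file = basename($link);

            # a second argument to request() writes the response body to that file
            my $response = $ua->request(HTTP::Request->new(GET => $link), $file);
            warn "failed to fetch $link: ", $response->status_line, "\n"
                unless $response->is_success;
        }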

    my one true love
