PerlMonks  

Re: Getting data out of a remote web page

by nindza (Novice)
on Feb 10, 2002 at 01:09 UTC


in reply to Getting data out of a remote web page

Hi!

You can use this script as an example... I used it to download pics from that site...

Cheers, nindza.

---

#!/usr/bin/perl
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::SimpleLinkExtor;

$ua = new LWP::UserAgent;
while (1) {
    $request  = new HTTP::Request('GET', 'http://www.celebdaily.com/');
    $response = $ua->request($request);
    if ($response->is_success) {
        print "succ\n";
        last;
    } else {
        print "fail\n";
    }
}
$e = HTML::SimpleLinkExtor->new();
$e->parse($response->content);
@links = $e->href;
chdir("/mnt/depot/babes");
foreach $link (@links) {
    if ($link =~ /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
        system("wget -c \"http://www.celebdaily.com/$link\"");
    }
}


Re: Answer: Getting data out of a remote web page
by Juerd (Abbot) on Feb 10, 2002 at 12:11 UTC

    foreach $link (@links) {
        if ($link =~ /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
            system("wget -c \"http://www.celebdaily.com/$link\"");
        }
    }

    Ouch. If $link is '$(rm -rf /)1111-11-11', you're not going to like it. You should anchor the regex (/^[0-9] ... [0-9]$/) so nothing can be in front of or after your pattern. Another great thing is not having a shell parse the command line: system('wget', '-c', "http://www.celebdaily.com/$link");.
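    A minimal sketch of both fixes together. The sample links are hypothetical (the second one carries the injection from above), and the anchored pattern assumes the interesting links are bare date strings; the actual wget call is left commented out:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical links -- the second one smuggles a shell command.
    my @links = ('1111-11-11', '$(rm -rf /)1111-11-11');

    foreach my $link (@links) {
        # Anchored: the *entire* link must be the date pattern, so nothing
        # can ride along in front of or after the match.
        next unless $link =~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}$/;
        print "fetching $link\n";
        # List form of system(): no shell ever parses the arguments, so
        # metacharacters in $link are never interpreted.
        # system('wget', '-c', "http://www.celebdaily.com/$link");
    }
    ```

    Only the first link survives the anchored match; the injection attempt is skipped entirely.
    
    
    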

    2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$

Re: Answer: Getting data out of a remote web page
by Amoe (Friar) on Feb 10, 2002 at 14:16 UTC

    Putting that code in an infinite loop is a terrible strategy. It could be a horrendous strain on the server if it's got high load currently (okay, so this probably isn't one of your concerns), it doesn't make it easy to add new code, and it reads horribly.
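    One way to avoid the bare while(1) is to pull the retrying into a small helper with a fixed attempt limit; the retry count here is arbitrary, and the LWP usage in the comment is a sketch of how the script above would plug into it:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Call $code up to $max times; return its first true result, else undef.
    sub retry {
        my ($max, $code) = @_;
        for my $attempt (1 .. $max) {
            my $result = $code->();
            return $result if $result;
            # A real fetcher would sleep/back off here instead of hammering.
        }
        return undef;
    }

    # With LWP it would look something like (network assumed, untested):
    #   my $response = retry(5, sub {
    #       my $r = $ua->request($request);
    #       $r->is_success ? $r : undef;
    #   });

    # Demonstration with a coderef that succeeds on the third call:
    my $n = 0;
    my $got = retry(5, sub { ++$n >= 3 ? "ok" : 0 });
    print "$got after $n tries\n";    # "ok after 3 tries"
    ```
    
    
    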

    Using system to get the files you want is also pointless. You've already shown that you can use LWP::UserAgent to get pages; why not just get the files with that?

    $ua->request(HTTP::Request->new(GET => "http://www.celebdaily.com/$link"), $link);

    That'll store the file in a local file of the same name. (Might have some security implications; I don't know the internals of simple_request.) Anyway, you don't really need to do this when you have <plug>pronbot</plug> to do it for you, and having had a look at the site, it should work with it. (Note: all disclaimers apply, pronbot's still a work in progress.)
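    A sketch of that call: the second argument to request() is a file to save the response body into, so no external wget process is needed. The link value is hypothetical, and the basename() step is my addition so a path-shaped link can't write outside the current directory:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;
    use File::Basename qw(basename);

    my $ua   = LWP::UserAgent->new;
    my $link = '2002-02-09/pic.jpg';    # hypothetical link from the page

    # Strip any directory part so the saved filename stays local.
    my $file = basename($link);

    # Passing a filename as the second argument makes LWP stream the
    # response body straight into that file.
    my $response = $ua->request(
        HTTP::Request->new(GET => "http://www.celebdaily.com/$link"),
        $file,
    );
    print $response->is_success
        ? "saved $file\n"
        : "failed: " . $response->status_line . "\n";
    ```
    
    
    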



    --
    my one true love
