http://www.perlmonks.org?node_id=1874

vroom has asked for the wisdom of the Perl Monks concerning the following question: (http and ftp clients)

How can my script retrieve the contents of an existing webpage?

Originally posted as a Categorized Question.

Replies are listed 'Best First'.
Re: How can my script retrieve the contents of an existing webpage?
by vroom (His Eminence) on Jan 11, 2000 at 02:07 UTC
    Use LWP::Simple, which you may have to install from CPAN.

    Then all you have to do is something like:

    use LWP::Simple;
    my $webpage = get("http://www.perlmonks.org");
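One thing worth knowing about the snippet above: LWP::Simple's get returns undef when the fetch fails, so it helps to check the result before using it. A minimal sketch, assuming a wrapper named fetch_page (a hypothetical helper, not part of LWP::Simple) and a URL passed on the command line:

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# fetch_page is a hypothetical helper name used for illustration only.
sub fetch_page {
    my ($url) = @_;
    my $content = get($url);    # get() returns undef on any failure
    die "Could not fetch $url\n" unless defined $content;
    return $content;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $webpage = fetch_page($ARGV[0]);
    print length($webpage), " characters retrieved\n";
}
```

LWP::Simple does not say why a fetch failed; when the reason matters, LWP::UserAgent exposes the full response.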
Re: How can my script retrieve the contents of an existing webpage?
by vroom (His Eminence) on Mar 27, 2000 at 06:16 UTC
    Another option if you have lynx on your system would be
    $webpage=`lynx -source http://blah.com`; # gets html source of document
    $webpage=`lynx -dump http://blah.com`;   # returns output as formatted text
Re: How can my script retrieve the contents of an existing webpage?
by snapdragon (Monk) on Apr 03, 2001 at 18:51 UTC
    I would use LWP::UserAgent for something like this. I've used it quite a few times to cache content from parts of a website. The syntax to get the Slashdot page (for example) would be something like:

    # Create a user agent object
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    $ua->agent("AgentName/0.1 " . $ua->agent);

    my $url = "http://slashdot.org";

    # Create a GET request (a plain GET carries no form body)
    my $req = HTTP::Request->new(GET => $url);

    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);

    # Check the outcome of the response
    if ($res->is_success) {
        print $res->content;
    } else {
        print "Was the URL correct? (", $res->status_line, ")\n";
    }

    That's my two cents anyway.
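When a request made through LWP::UserAgent fails, the response object's status_line usually says why (for example a 404, or an unsupported URL scheme). The following is a minimal sketch, assuming a helper named describe_response (hypothetical, not part of LWP), a 10-second timeout chosen arbitrarily, and a URL passed on the command line:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# The timeout value is an arbitrary choice for this sketch.
my $ua = LWP::UserAgent->new(timeout => 10);
$ua->agent("AgentName/0.1 " . $ua->agent);

# describe_response is a hypothetical helper: it summarises an
# HTTP::Response as a one-line string.
sub describe_response {
    my ($res) = @_;
    return $res->is_success
        ? sprintf("OK, %d bytes", length $res->content)
        : "Failed: " . $res->status_line;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    print describe_response($ua->get($ARGV[0])), "\n";
}
```

Here $ua->get($url) is used as a shortcut for building an HTTP::Request by hand; both return the same kind of response object.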

Re: How can my script retrieve the contents of an existing webpage?
by extremely (Priest) on Sep 02, 2001 at 21:37 UTC
    furtive is wrong there with the "system" answer. system doesn't return the command's output; it returns its exit status. vroom was advocating the use of backticks with lynx. Placing a shell command in backticks returns the command's standard output.
    # try these and see under un*x
    my $s = '+%Y';
    print "date $s", $/;          # double quotes: prints the string "date +%Y"
    print 'date $s', $/;          # single quotes: prints the literal text 'date $s'
    print `date $s`, $/;          # backticks: prints the output of the command
    print system("date $s"), $/;  # system: date prints itself, then the exit status is printed
    ##
    ## Thus [vroom]'s code should be
    ##
    $webpage=`lynx -source http://blah.com`; # gets html source of document
    $webpage=`lynx -dump http://blah.com`;   # returns output as formatted text
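If the URL comes from user input, backticks hand it to the shell, which may interpret special characters in it. One alternative is a list-form pipe open, which runs the program directly and still captures its standard output. A minimal sketch, assuming a helper named capture_command (hypothetical) and that lynx is on the PATH:

```perl
use strict;
use warnings;

# capture_command is a hypothetical helper: it runs a program without
# a shell and returns everything the program wrote to standard output.
sub capture_command {
    my @cmd = @_;
    # List-form pipe open: arguments are passed as-is, no shell parsing.
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    local $/;                # slurp mode: read all output at once
    my $output = <$fh>;
    close($fh);              # waits for the child and sets $?
    die "$cmd[0] failed (wait status $?)\n" if $?;
    return $output;
}

# Only run lynx when a URL is given on the command line.
if (@ARGV) {
    print capture_command('lynx', '-source', $ARGV[0]);
}
```

Unlike system, this returns the output itself, and unlike backticks it reports a non-zero exit instead of silently returning partial output.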
Re: How can my script retrieve the contents of an existing webpage?
by Anonymous Monk on Jul 18, 2001 at 18:16 UTC
    I have tested your last comment (with the user agent) and all I get is "Was the URL correct?". I have tried several URLs. Does anyone know a solution? Thank you.

    Originally posted as a Categorized Answer.

Re: How can my script retrieve the contents of an existing webpage?
by furtive (Initiate) on Sep 02, 2001 at 20:42 UTC

    In reference to vroom's code above, the correct syntax should be:

    $webpage=system("lynx -source http://blah.com");

    This is necessary since it is the shell that runs Lynx. Otherwise, $webpage would literally be "lynx -source http://blah.com"

Re: How can my script retrieve the contents of an existing webpage?
by Anonymous Monk on Jan 02, 2002 at 18:41 UTC
    Anonymous Monk: try using a `correct' URL, that is, one that includes the scheme. Instead of
        www.perlmonks.org

    use
        http://www.perlmonks.org
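A script that accepts URLs from users can add the missing scheme itself. This is a minimal sketch, assuming a helper named with_scheme (hypothetical) and assuming http:// is an acceptable default:

```perl
use strict;
use warnings;

# with_scheme is a hypothetical helper: it prepends "http://" when the
# given string does not already start with a scheme such as "http://".
sub with_scheme {
    my ($url) = @_;
    # A scheme is a letter followed by letters, digits, "+", "-" or ".",
    # and here it must be followed by "://".
    return $url =~ m{^[A-Za-z][A-Za-z0-9+.-]*://} ? $url : "http://$url";
}

print with_scheme("www.perlmonks.org"), "\n";          # http://www.perlmonks.org
print with_scheme("http://www.perlmonks.org"), "\n";   # http://www.perlmonks.org
```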

    Originally posted as a Categorized Answer.