How can my script retrieve the contents of an existing webpage?

http://www.perlmonks.org?node_id=1874

vroom has asked for the wisdom of the Perl Monks concerning the following question: ⭐ (http and ftp clients)

How can my script retrieve the contents of an existing webpage?

Originally posted as a Categorized Question.

Replies are listed 'Best First'.
Re: How can my script retrieve the contents of an existing webpage?⭐ by vroom (His Eminence) on Jan 11, 2000 at 02:07 UTC
Use LWP::Simple, which you may have to get from CPAN. Then all you have to do is something like: `use LWP::Simple; $webpage = get "http://www.perlmonks.org";`
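A minimal sketch of the LWP::Simple approach with a failure check added, since `get` returns undef when the page cannot be fetched. The helper name `fetch_page` and the command-line usage are assumptions made for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# fetch_page is a hypothetical helper name, not part of LWP::Simple.
# It returns the document body, or undef if the request failed.
sub fetch_page {
    my ($url) = @_;
    return scalar get($url);
}

# Assumed usage: perl fetch.pl http://www.perlmonks.org
if (my $url = shift @ARGV) {
    my $webpage = fetch_page($url);
    if (defined $webpage) {
        print $webpage;
    }
    else {
        # LWP::Simple does not say why the request failed
        warn "Could not retrieve $url\n";
    }
}
```

LWP::Simple only signals success or failure here. When the reason for a failure matters, LWP::UserAgent exposes the response status.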
Re: How can my script retrieve the contents of an existing webpage?⭐ by vroom (His Eminence) on Mar 27, 2000 at 06:16 UTC
Another option, if you have lynx on your system, would be:
    $webpage=`lynx -source http://blah.com`;  # gets HTML source of document
    $webpage=`lynx -dump http://blah.com`;    # returns output as formatted text
Re: How can my script retrieve the contents of an existing webpage?⭐ by snapdragon (Monk) on Apr 03, 2001 at 18:51 UTC
I would use LWP::UserAgent for something like this. I've used this quite a few times to cache content from parts of a website. The syntax to get the Slashdot page (for example) would be something like:
    # create a user agent object
    use LWP::UserAgent;
    my $ua = new LWP::UserAgent;
    $ua->agent("AgentName/0.1 " . $ua->agent);
    my $url = "http://slashdot.org";
    # Create a request
    my $req = new HTTP::Request GET => $url;
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('match=www&errors=0');
    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);
    # Check the outcome of the response
    if ($res->is_success) {
        print $res->content;
    } else {
        print "Was the URL correct?";
    }
That's my two cents anyway.
Re: Answer: How can my script retrieve the contents of an existing webpage? by merlyn (Sage) on Apr 03, 2001 at 18:55 UTC
You can't have "content" if it's a GET! And LWP::Simple is probably simpler here, for just fetching a URL.
-- Randal L. Schwartz, Perl hacker
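Following that correction, a plain GET with LWP::UserAgent might look like the sketch below: no content type or body is attached, and the response's status line is reported on failure instead of a fixed message. The helper name `fetch_with_status`, the 10-second timeout, and the command-line usage are assumptions for illustration only.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# fetch_with_status is a hypothetical helper name used only in this sketch.
# It returns (content, undef) on success or (undef, status line) on failure.
sub fetch_with_status {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);   # timeout value is an assumption
    # A trailing space makes LWP append its default libwww-perl identifier.
    $ua->agent("AgentName/0.1 ");
    my $res = $ua->get($url);                      # plain GET: no body attached
    return ($res->decoded_content, undef) if $res->is_success;
    return (undef, $res->status_line);
}

# Assumed usage: perl fetch.pl http://slashdot.org
if (my $url = shift @ARGV) {
    my ($content, $error) = fetch_with_status($url);
    if (defined $content) {
        print $content;
    }
    else {
        warn "Request for $url failed: $error\n";
    }
}
```

Printing the status line makes it easier to tell a malformed URL apart from a server-side error.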
Re: How can my script retrieve the contents of an existing webpage?⭐ by extremely (Priest) on Sep 02, 2001 at 21:37 UTC
furtive is wrong there with the "system" answer. system returns the command's exit status, not its output. vroom was advocating the use of backticks with lynx. Placing a shell command in backticks gets you the standard output of that command.
    # try these and see under un*x
    my $s='+%Y';
    print "date $s", $/;          # prints the string: date +%Y
    print 'date $s', $/;          # single quotes: prints date $s literally
    print `date $s`, $/;          # prints the output of the date command
    print system("date $s"), $/;  # date writes its own output, then the exit status is printed
    ##
    ## Thus vroom's code should be
    ##
    $webpage=`lynx -source http://blah.com`;  # gets HTML source of document
    $webpage=`lynx -dump http://blah.com`;    # returns output as formatted text
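As a variation on the backtick approach, the sketch below captures a command's standard output through a list-form pipe open, which bypasses the shell so that characters such as `&` in a URL need no quoting. The helper name `capture_output` and the command-line usage are assumptions for illustration; `lynx -source` is the command from vroom's answer.

```perl
use strict;
use warnings;

# capture_output is a hypothetical helper name for this sketch.
# It runs a command without a shell and returns its standard output,
# or undef if the command could not be started or exited non-zero.
sub capture_output {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or return;
    local $/;                    # slurp mode: read all output at once
    my $output = <$fh>;
    close($fh) or return;        # close() is false on a non-zero exit status
    return $output;
}

# Assumed usage: perl fetch.pl 'http://blah.com/?a=1&b=2'
if (my $url = shift @ARGV) {
    my $webpage = capture_output('lynx', '-source', $url);
    if (defined $webpage) {
        print $webpage;
    }
    else {
        warn "lynx could not retrieve $url\n";
    }
}
```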
Re: How can my script retrieve the contents of an existing webpage? by Anonymous Monk on Jul 18, 2001 at 18:16 UTC
I have tested your last comment (with the user agent) and all I get is "Was the URL correct?". I have tried several URLs. Does anyone know a solution? Thank you.
Originally posted as a Categorized Answer.
Re: How can my script retrieve the contents of an existing webpage? by furtive (Initiate) on Sep 02, 2001 at 20:42 UTC
In reference to vroom's code above, the correct syntax should be: `$webpage=system("lynx -source http://blah.com");` This is necessary since it is the shell that runs Lynx. Otherwise, `$webpage` would literally be `lynx -source http://blah.com`
Re: How can my script retrieve the contents of an existing webpage? by Anonymous Monk on Jan 02, 2002 at 18:41 UTC
Anonymous Monk, try using a "correct" URL: instead of www.perlmonks.org, use http://www.perlmonks.org.
Originally posted as a Categorized Answer.