 
PerlMonks  

Get Web Page

by kanish (Sexton)
on Oct 25, 2005 at 04:20 UTC (#502624=perlquestion)
kanish has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am new to Perl and also new to this forum.

I want to extract a web page. For example, I need to extract the perlmonks.com page.

Thanks a lot!

Kanishk

Re: Get Web Page
by GrandFather (Cardinal) on Oct 25, 2005 at 04:23 UTC

    Take a look at LWP


    Perl is Huffman encoded by design.
Re: Get Web Page
by Tanktalus (Canon) on Oct 25, 2005 at 04:25 UTC

    So ... what have you tried?

    (Hint: LWP::Simple.)
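    A minimal sketch of the LWP::Simple route, assuming the LWP distribution is installed. The `fetch_page` helper is a hypothetical name, not part of LWP; the only LWP::Simple call used is `get()`, which returns undef on any failure, so the sketch checks for that instead of silently printing nothing:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: fetch $url and return its body.
# LWP::Simple's get() returns undef on any failure (bad URL, DNS error,
# refused connection, non-2xx status), so die with a message in that case.
sub fetch_page {
    my ($url) = @_;
    my $html = get($url);
    die "Couldn't get $url\n" unless defined $html;
    return $html;
}

# get() returns decoded text, so encode it again on the way out.
binmode STDOUT, ':encoding(UTF-8)';

my $html = eval { fetch_page('http://www.perlmonks.org/') };
if (defined $html) {
    print $html;
}
else {
    warn "Fetch failed: $@";
}
```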

    Update: Please also get the gods' permission before hammering PM. A few occasional requests will probably be OK, but don't send dozens of requests in a short time frame without their permission: you'd be using their bandwidth and CPU time to the detriment of, well, actual users. This is the same reason Google provides an API for searches rather than letting people fetch its result pages programmatically.
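    One way to follow that advice in code is LWP::RobotUA, a subclass of LWP::UserAgent that obeys robots.txt and waits between requests to the same host. This is a minimal sketch: the agent name, contact address and the `make_polite_ua` helper are placeholders, and the delay value is an assumption you should raise (or agree with the site admins) for anything more than a handful of requests:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;

# Hypothetical helper: build a user agent that honours robots.txt and
# throttles itself. Agent name and contact address are placeholders.
sub make_polite_ua {
    my $ua = LWP::RobotUA->new('kanish-fetcher/0.1', 'you@example.com');
    $ua->delay(0.1);    # delay() is in MINUTES: 0.1 = at least 6 s between hits on one host
    $ua->timeout(30);   # give up on a single request after 30 s
    return $ua;
}

my $ua = make_polite_ua();
for my $url ('http://www.perlmonks.org/', 'http://www.perlmonks.org/?node_id=502624') {
    # The first request to a host also fetches its robots.txt; later requests
    # sleep (use_sleep is on by default) until the delay has passed.
    my $res = $ua->get($url);
    if ($res->is_success) {
        printf "%s: %d bytes\n", $url, length $res->decoded_content;
    }
    else {
        warn "$url: ", $res->status_line, "\n";
    }
}
```

    Because RobotUA is an LWP::UserAgent, the rest of your code (get(), is_success(), status_line()) stays the same; only the constructor changes.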

Re: Get Web Page
by monkfan (Curate) on Oct 25, 2005 at 04:25 UTC
    Can you be more specific about what you want to extract?
    Perhaps you should have a look at the LWP or WWW::Mechanize modules.

    Regards,
    Edward
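    As one hypothetical answer to "what do you want to extract?", here is a sketch using WWW::Mechanize that pulls out just the page title and its links. The `page_summary` helper is an illustrative name, not part of WWW::Mechanize; `title()`, `links()`, `url_abs()` and `text()` are Mechanize methods:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url and return the page title followed by one
# [absolute URL, link text] pair per link. With autocheck on, get() dies
# if the request fails.
sub page_summary {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($url);
    my @links = map { [ $_->url_abs->as_string, $_->text // '' ] } $mech->links;
    return ($mech->title // '', @links);
}

binmode STDOUT, ':encoding(UTF-8)';

my @summary = eval { page_summary('http://www.perlmonks.org/') };
if (@summary) {
    my ($title, @links) = @summary;
    print "Title: $title\n";
    printf "%s\t%s\n", @$_ for @links;
}
else {
    warn "Fetch failed: $@";
}
```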
Re: Get Web Page
by pg (Canon) on Oct 25, 2005 at 04:27 UTC

    You said that you were new to Perl, but I am not sure whether you are also new to programming. If so, one piece of advice: always check the return code from calls like this, as I did below:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new();
    my $res = $ua->get('http://www.perlmonks.org/');

    # Check whether the request succeeded before using the response.
    if ($res->is_success()) {
        print $res->content();
    }
    else {
        print "Failed, " . $res->status_line() . "\n";
    }
Re: Get Web Page
by gopalr (Priest) on Oct 25, 2005 at 04:27 UTC

    Hi Kanishk

    WWW::Mechanize can do this:

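    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://www.perlmonks.com');

    # Method calls are not interpolated inside a string, so pass status() to die() separately.
    die "Failed: ", $mech->status(), "\n" unless $mech->success();
    print $mech->content();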

    Also take a look at this thread: Extract Web Page

    Thanks,
    Gopal.R

Node Type: perlquestion [id://502624]
Approved by monkfan