
Extract Web Page

by Anonymous Monk
on May 09, 2005 at 04:51 UTC ( #455094=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:


Is it possible to extract (download) a web page? Please guide me: is there a module or script for this?

Replies are listed 'Best First'.
Re: Extract Web Page
by gopalr (Priest) on May 09, 2005 at 04:53 UTC


    Yes, you can fetch a web page in any of the following ways:

    1st Step:

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    my $url = '';
    my $ua = LWP::UserAgent->new;
    my $request = HTTP::Request->new(GET => $url);
    my $response = $ua->request($request);
    if ($response->is_success) {
        print $response->content;
    }
    else {
        print $response->status_line, " <URL:$url>\n";
    }
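A slightly more defensive variant of the first step, as a sketch only: it assumes a 10-second timeout is acceptable and takes the URL from the command line, since none is given above. The helper name fetch_page is made up for illustration and is not part of LWP.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# fetch_page is a hypothetical helper name, not part of LWP.
# Returns the decoded body on success; dies with the status line otherwise.
sub fetch_page {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);   # assumed timeout, in seconds
    my $response = $ua->get($url);                 # shortcut for building a GET request
    die $response->status_line, " <URL:$url>\n"
        unless $response->is_success;
    return $response->decoded_content;             # body with content encoding and charset decoded
}

# Only runs when a URL is passed on the command line.
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';            # assumes the page is text
    print fetch_page($ARGV[0]);
}
```

Dying on failure rather than printing the status line keeps error text out of whatever consumes the page on standard output.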

    2nd Step: It's very simple:

    use strict;
    use warnings;
    use LWP::Simple;

    getprint ('');
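If you want the page in a variable rather than printed straight away, LWP::Simple also exports get(), which returns undef on failure. A minimal sketch, assuming the URL is passed on the command line; page_or_die is a hypothetical helper name.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# page_or_die is a hypothetical helper name, not part of LWP::Simple.
sub page_or_die {
    my ($url) = @_;
    my $content = get($url);                  # undef on any failure
    die "Could not fetch <URL:$url>\n" unless defined $content;
    return $content;
}

# Only runs when a URL is passed on the command line.
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';       # assumes the page is text
    print page_or_die($ARGV[0]);
}
```

Note that get() gives no reason for a failure; when the status line matters, LWP::UserAgent is the better fit.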

    3rd Step: We can also fetch the web page from the command prompt:

    C:\>lwp-download ""

    4th Step:

    open MYHANDLE, "GET|";
    while(<MYHANDLE>) {
        print $_;
    }



    Update: Added the 4th Step.

      open MYHANDLE, "GET|";
      Incidentally, even when posting minimal code it is sensible to use the open() or die idiom, especially when showing stuff like this to a newbie, who has probably already been exposed to (bad) examples of unchecked open()s.

      Whatever... is this really supposed to work? AFAICT this is just a piped open(), so it depends on the availability of the "GET" command (which BTW I've never heard of) and thus is at best system-dependent. Did you mean "wget", maybe?
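For reference, a checked piped open might look like the sketch below. It uses the list form of open, which bypasses the shell, and also checks close(), since that is where a command's non-zero exit status shows up. run_and_capture is a hypothetical helper name, and which command you pass (GET, lwp-request, wget, ...) is entirely up to what exists on your system.

```perl
use strict;
use warnings;

# run_and_capture is a hypothetical helper name.
# Runs a command without a shell and returns its output lines.
sub run_and_capture {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd)
        or die "Cannot run '$cmd[0]': $!\n";
    my @lines = <$fh>;
    # close() on a pipe returns false if the command exited non-zero.
    close($fh)
        or die "'$cmd[0]' failed: ", ($! ? $! : "exit status $?"), "\n";
    return @lines;
}

# Only runs when a command is passed, e.g.: perl fetch.pl lwp-request <url>
print run_and_capture(@ARGV) if @ARGV;
```

The list form of a piped open needs a reasonably recent perl on Windows; on Unix-like systems it has been available since 5.8.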

      Speaking of which, ideally it would be nice to have an open mode (for the three args form of open(), for reasons of backward compatibility) doing exactly this by calling the appropriate modules behind the scenes, a la

      open my $fh, 'web', '' or die $!;
      or a {layer,discipline}, maybe:
      open my $fh, '<:web', '' or die $!;
      (Something vaguely along these lines has been discussed in p6l, but on a different level - of course!)
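No such open mode or layer exists, but something with a similar feel can be approximated today by fetching the page and opening an in-memory filehandle on the result. This is only a sketch: open_web is a hypothetical helper name, the whole page is read into memory first, and the handle yields the raw bytes of the response.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# open_web is a hypothetical helper name, not an existing open() mode.
# Returns a read-only filehandle over the raw bytes of the fetched page.
sub open_web {
    my ($url) = @_;
    my $response = LWP::UserAgent->new->get($url);
    die $response->status_line, " <URL:$url>\n"
        unless $response->is_success;
    my $content = $response->content;          # raw bytes, not decoded
    open(my $fh, '<', \$content)
        or die "Cannot open in-memory handle: $!\n";
    return $fh;
}

# Only runs when a URL is passed on the command line.
if (@ARGV) {
    my $fh = open_web($ARGV[0]);
    while (my $line = <$fh>) {
        print $line;
    }
}
```

Raw bytes are used on purpose: in-memory handles cannot be opened on strings containing characters above 0xFF.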

      Just a few random thoughts...

      The fourth option is close to the third: GET is usually a symlink to lwp-request, which prints the page to standard output, whereas lwp-download saves it to a file. BUT note that during installation of LWP you're asked if you want the GET/HEAD/POST symlinks, and you (or the OP) could choose to answer no to avoid program name clashes, especially for the HEAD command on filesystems that are not case-sensitive.

      A typical case is the cygwin environment when it lives on a FAT32 filesystem: in this case, HEAD and head (which gives you the first lines of a file) would clash and you wouldn't get what you want. To be "portable" in the examples, I'd always use the commands beginning with lwp-.

      Flavio (perl -e 'print(scalar(reverse("\nti.xittelop\@oivalf")))')

      Don't fool yourself.
Re: Extract Web Page
by davido (Archbishop) on May 09, 2005 at 06:10 UTC

    There is a module: LWP::Simple. Guidance is found in its POD. If you get snagged on something specific, let us know.
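As a taste of what the POD covers, here is a minimal sketch that saves a page to a file with getstore() and checks the returned status code with is_success(), both exported by LWP::Simple. save_page is a hypothetical helper name; the URL and file name are taken from the command line.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# save_page is a hypothetical helper name, not part of LWP::Simple.
sub save_page {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);       # returns the HTTP status code
    die "Fetch failed with status $status <URL:$url>\n"
        unless is_success($status);
    return $status;
}

# Only runs when called as: perl save.pl <url> <file>
if (@ARGV == 2) {
    save_page(@ARGV);
    print "Saved $ARGV[0] to $ARGV[1]\n";
}
```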

