PerlMonks  

Re: web page source?

by Falkkin (Chaplain)
on Feb 22, 2001 at 06:48 UTC (#60119)


in reply to web page source?

To get the source, I'd get LWP::Simple from CPAN. The code to get your source would then be a simple 2-liner:

    use LWP::Simple;
    my $source = get("http://whatever.url.you/want/to/view.html");

You only need the "use" line once in your program; call get() each time you need the source of a page. Note that get() returns undef if the fetch fails, so check the result before using it.
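
Since get() gives you undef on failure, it's worth wrapping it. Here's a minimal sketch of how that might look; fetch_source is just a name I made up for illustration, and the script expects the URL as its first command-line argument (also my assumption, not anything from the original question):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# fetch_source is a hypothetical helper name, not part of LWP::Simple.
# It returns the page source, or dies if get() returned undef.
sub fetch_source {
    my ($url) = @_;
    my $source = get($url);
    die "Could not fetch $url\n" unless defined $source;
    return $source;
}

# Only fetch when a URL was given on the command line.
if (@ARGV) {
    my $source = fetch_source($ARGV[0]);
    print length($source), " bytes fetched from $ARGV[0]\n";
}
```

You'd run it as something like "perl fetch.pl http://www.perlmonks.org/". LWP::Simple doesn't tell you why a fetch failed; if you need the status code or error message, LWP::UserAgent is probably the next thing to look at.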

Writing an HTML parser by hand is far from trivial... I'd look at HTML::Parser (again, on CPAN) and see if that'll make your life easier. I've not really used HTML::Parser before, but, from looking at the documentation and playing around for the last 15 minutes, it appears you'd want to do something like the following, which prints the href of every link on the page:

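    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;
    use HTML::Parser;

    my $source = get("http://www.perlmonks.org");
    die "Couldn't fetch the page\n" unless defined $source;

    # api_version => 3 selects the handler-based interface used below.
    my $parser = HTML::Parser->new(api_version => 3);
    # 'tagname' is the lower-cased tag name, so <A HREF=...> matches too.
    $parser->handler(start => \&function, 'tagname, attr');
    $parser->parse($source);
    $parser->eof;    # tell the parser there is no more input

    sub function {
        my ($tag_name, $attr_ref) = @_;
        # Skip anchors that have no href (e.g. <a name="top">).
        if ($tag_name eq 'a' && defined $attr_ref->{href}) {
            print $attr_ref->{href}, "\n";
        }
    }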
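
If you'd rather collect the links than print them as they go by, something like the following might work. It's only a sketch: extract_links is a name I invented, and the sample HTML is made up so the example runs without touching the network. The handler is passed to new() via start_h, which as far as I can tell from the docs is equivalent to calling handler() afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# extract_links is a hypothetical helper: it returns the href of
# every <a> tag found in the HTML string it is given.
sub extract_links {
    my ($html) = @_;
    my @links;

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'tagname' is the lower-cased element name; 'attr' is a hash
        # ref of the tag's attributes (keys lower-cased by default).
        start_h => [
            sub {
                my ($tag_name, $attr) = @_;
                push @links, $attr->{href}
                    if $tag_name eq 'a' && defined $attr->{href};
            },
            'tagname, attr',
        ],
    );

    $parser->parse($html);
    $parser->eof;    # signal end of input
    return @links;
}

# Sample input, invented for this sketch.
my $html = <<'HTML';
<html><body>
<A HREF="http://www.perlmonks.org/">PerlMonks</A>
<a name="top">an anchor without an href</a>
<a href="/index.pl?node=Tutorials">Tutorials</a>
</body></html>
HTML

print "$_\n" for extract_links($html);
```

This prints the two href values and skips the named anchor. Relative links come back exactly as written in the page; turning them into absolute URLs is a separate step (the URI module on CPAN can do that).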

