Re: web page source?

by Falkkin (Chaplain)
on Feb 22, 2001

in reply to web page source?

To get the source, I'd get LWP::Simple from CPAN. The code to get your source would then be a simple 2-liner:
    use LWP::Simple;
    my $source = get("");
You only need the "use" line once in your program; call get() each time you need the source of a page. Note that get() returns undef if the fetch fails, so check the result before using it.
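Since get() hands back undef on failure, it may be worth wrapping it so a bad fetch doesn't slip through silently. Here is a minimal sketch; fetch_source is a name I made up for illustration, and the URL is whatever you pass on the command line:

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper (not part of LWP::Simple): fetch a page or die trying.
sub fetch_source {
    my ($url) = @_;
    my $source = get($url);    # get() returns undef if the fetch fails
    die "Could not fetch $url\n" unless defined $source;
    return $source;
}

# Only fetch when a URL was given on the command line.
if (@ARGV) {
    my $source = fetch_source($ARGV[0]);
    print length($source), " characters fetched\n";
}
```

LWP::Simple's get() won't tell you why a fetch failed; if you need the status code, LWP::UserAgent is the place to look.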

Writing an HTML parser by hand is far from trivial... I'd look at HTML::Parser (again, on CPAN) and see if that'll make your life easier. I've not really used HTML::Parser before, but from reading the documentation and playing around for the last 15 minutes, it appears you'd want to do something like the following:

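    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;
    use HTML::Parser;

    my $source = get("");
    die "Couldn't fetch the page\n" unless defined $source;

    # api_version 3 selects the handler-based interface used below
    my $parser = HTML::Parser->new(api_version => 3);
    # 'tagname' is lowercased by the parser, so <A HREF=...> matches 'a' too
    $parser->handler(start => \&function, 'tagname, attr');
    $parser->parse($source);
    $parser->eof;    # tell the parser the document is complete

    # Called for every start tag; prints the href of each <a> tag
    sub function {
        my ($tag_name, $attr_ref) = @_;
        if ($tag_name eq 'a' && defined $attr_ref->{href}) {
            print $attr_ref->{href}, "\n";
        }
    }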
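If you'd rather collect the links than print them as they go by, something along these lines might work. It's only a sketch: extract_links is a hypothetical helper name, and the HTML string is made-up sample input so you can try it without touching the network.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the href of every <a> tag found in $html.
sub extract_links {
    my ($html) = @_;
    my @links;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # The start handler receives the lowercased tag name and a hashref
        # of attributes (attribute names are lowercased by default).
        start_h => [
            sub {
                my ($tag_name, $attr_ref) = @_;
                push @links, $attr_ref->{href}
                    if $tag_name eq 'a' && defined $attr_ref->{href};
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;    # signal end of document
    return @links;
}

# Made-up sample input: one upper-case link, one anchor without href, one plain link.
my $html = '<p><A HREF="/one">1</A> <a name="x">anchor</a> <a href="two.html">2</a></p>';
print "$_\n" for extract_links($html);
```

The named anchor has no href, so it gets skipped. Relative links come back exactly as written in the page; turning them into absolute URLs is a separate step (the URI module can do that).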

