
Answer: How do I parse links out of a web page

by Anonymous Monk
on Sep 25, 2004 at 17:48 UTC (#393817, categorized answer)


You could try this as well:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my $url = "http://www.google.ca/";  # for instance
my $ua  = LWP::UserAgent->new;

# Set up a callback that collects the links found in <a> tags
my @links = ();
sub callback {
    my ($tag, %attr) = @_;
    return if $tag ne 'a';  # we only look closer at <a ...>
    push(@links, values %attr);
}

# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
my $p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
my $res = $ua->request(HTTP::Request->new(GET => $url),
                       sub { $p->parse($_[0]) });

# Expand all link URLs to absolute ones
my $base = $res->base;
@links = map { url($_, $base)->abs } @links;

# Print them out
print join("\n", @links), "\n";
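
If you already have the HTML in a string and know the base URL, a variation is to hand the base to HTML::LinkExtor->new as its second argument, so the parser returns absolute URLs itself, and to keep only the href attribute of each <a> tag. This is just a minimal sketch: the sample page, the base URL and the helper name extract_hrefs are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample page and base URL, for illustration only
my $html = <<'HTML';
<html><body>
<a href="/about.html">About</a>
<img src="logo.png">
<a href="http://www.perl.org/">Perl</a>
</body></html>
HTML
my $base = 'http://www.example.com/dir/index.html';

# Return the absolute href of every <a> tag in $html.
# extract_hrefs is a hypothetical helper, not part of any module.
sub extract_hrefs {
    my ($html, $base) = @_;
    my @hrefs;
    my $p = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            return unless $tag eq 'a' && defined $attr{href};
            push @hrefs, "$attr{href}";  # stringify the URI object
        },
        $base,  # with a base given, link values come back absolute
    );
    $p->parse($html);
    $p->eof;    # signal end of document to the parser
    return @hrefs;
}

print "$_\n" for extract_hrefs($html, $base);
```

Checking for href explicitly means the <img> tag and any other link-bearing attributes are skipped, which may or may not be what you want.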
