PerlMonks  

Answer: How do I parse links out of a web page?

(#90761: categorized answer)

Contributed by agent00013

The Perl Cookbook has a good example:
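    #!/usr/local/bin/perl
    # xurl - extract unique, sorted list of links from URL
    use strict;
    use warnings;
    use HTML::LinkExtor;
    use LWP::Simple;

    my $base_url = shift or die "usage: $0 URL\n";

    # get() returns undef on failure, so check before parsing
    my $html = get($base_url);
    die "Couldn't fetch $base_url\n" unless defined $html;

    # With no callback, the parser collects links for ->links;
    # passing the base URL makes relative links absolute
    my $parser = HTML::LinkExtor->new(undef, $base_url);
    $parser->parse($html)->eof;

    my %seen;
    foreach my $linkarray ($parser->links) {
        # each entry is [ tag, attr_name => attr_value, ... ]
        my @element  = @$linkarray;
        my $elt_type = shift @element;
        while (@element) {
            my ($attr_name, $attr_value) = splice(@element, 0, 2);
            $seen{$attr_value}++;
        }
    }
    print "$_\n" for sort keys %seen;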
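Note that the script above prints every link-like attribute the parser finds (img src, link href and so on), not just hyperlinks. If you only want the <a href> targets, one option is to hand HTML::LinkExtor a callback and filter on the tag name. Here is a rough sketch of that idea. The sample HTML, the example.com base URL and the collect_anchor name are made up for illustration so it runs without network access; swap in get($url) for a real page.

```perl
#!/usr/bin/perl
# Sketch: collect only <a href="..."> links via an HTML::LinkExtor callback.
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical base URL and page content (assumptions, not from the Cookbook).
my $base_url = 'http://www.example.com/docs/';
my $html = <<'HTML';
<html><body>
<a href="intro.html">Intro</a>
<a href="/about.html">About</a>
<img src="logo.png">
<a href="intro.html">Intro again</a>
</body></html>
HTML

my %seen;

# The callback gets the lowercased tag name, then attribute name/value pairs.
# Because a base URL is passed to new(), the values arrive already absolute.
sub collect_anchor {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' && defined $attr{href};
    $seen{"$attr{href}"}++;    # stringify the URI object for use as a hash key
}

my $parser = HTML::LinkExtor->new(\&collect_anchor, $base_url);
$parser->parse($html);
$parser->eof;

# Unique, sorted list of hyperlinks; the <img> source is skipped.
my @links = sort keys %seen;
print "$_\n" for @links;
```

The %seen hash does the de-duplication the same way as in the Cookbook script, so the repeated intro.html link is only printed once.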
Hope this helps. /msg me if you need anything else.
