PerlMonks  

Re:^2 (nrd) Extracting attributes from anchor tags in an HTML page

by newrisedesigns (Curate)
on Jan 10, 2003 at 20:17 UTC (#225933)


in reply to Re: Extracting href's
in thread Extracting href's

No offense to andye (still good advice), but parsing HTML with a regexp is a bad idea.

Stick with a tried-and-true module. It is far less likely to break on you (breakage usually comes from bad HTML, not from your code), and it gives you room for further learning.

Take for example HTML::TokeParser:

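    use strict;
    use warnings;
    use LWP::Simple qw(get);
    use HTML::TokeParser;

    # Take the page address from the command line
    my $url = shift or die "usage: $0 URL\n";

    my $content = get($url);
    die "Could not fetch $url\n" unless defined $content;

    my $p = HTML::TokeParser->new(\$content);
    while (my $token = $p->get_tag("a")) {
        my $href = $token->[1]{href};
        next unless defined $href;    # skip anchors with no href, e.g. <a name="...">
        my $text = $p->get_trimmed_text("/a");
        print "$href => $text\n";
    }
    ## Should work...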
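To see why the regexp route tends to break, here is a small side-by-side sketch. The sample markup, the pattern, and the variable names are all made up for illustration; the pattern is just one plausible naive attempt, not anything from the parent node.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup: mixed case, mixed quoting, an attribute
# that is not the first one in the tag, and an anchor with no href.
my $html = <<'HTML';
<p><a href="/one">First</a>
<A HREF='/two'>Second</A>
<a class="nav"
   href="/three">Third</a>
<a name="top">No href here</a></p>
HTML

# A naive pattern: assumes lowercase tags, href as the first attribute,
# and double-quoted values.
my @by_regex = $html =~ /<a href="([^"]*)"/g;

# The same markup through HTML::TokeParser, which lowercases tag and
# attribute names and copes with either quoting style.
my $p = HTML::TokeParser->new(\$html) or die "Cannot parse sample HTML\n";
my @by_parser;
while (my $token = $p->get_tag("a")) {
    my $href = $token->[1]{href};
    next unless defined $href;    # the <a name="top"> anchor has none
    push @by_parser, $href;
}

print "regex:  @by_regex\n";
print "parser: @by_parser\n";
```

On this sample the pattern only picks up /one, while the parser reports /one, /two and /three. You could keep patching the pattern for each new case, but that is exactly the work the module has already done.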

This looks intimidating, and it is. :) However, by learning how to use modules like TokeParser, you'll not only get a better handle on what you want to do, you'll also be learning more about Perl.

Also, if you plan on doing this often, I suggest picking up a copy of Perl & LWP. It's a good resource for interacting with websites.

John J Reiser
newrisedesigns.com

