PerlMonks  

Re: HTML::Parser example wanted...

by andreychek (Parson)
on Jun 26, 2001 at 19:19 UTC (#91622)


in reply to HTML::Parser example wanted...

Actually, there are a bunch of examples that come with the HTML::Parser module, found in the "eg" directory of the distribution. Taking the code from there, here is an example of how to extract all the plain text from an HTML document:

#!/usr/bin/perl -w

# Extract all plain text from an HTML file

use strict;
use HTML::Parser 3.00 ();

my %inside;

sub tag
{
    my($tag, $num) = @_;
    $inside{$tag} += $num;
    print " ";  # not for all tags
}

sub text
{
    return if $inside{script} || $inside{style};
    print $_[0];
}

HTML::Parser->new(api_version => 3,
                  handlers    => [start => [\&tag, "tagname, '+1'"],
                                  end   => [\&tag, "tagname, '-1'"],
                                  text  => [\&text, "dtext"],
                                 ],
                  marked_sections => 1,
        )->parse_file(shift) || die "Can't open file: $!\n";

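That code is located in eg/htext. As you can see, it is event driven. The HTML::Parser->new call takes an option called "handlers", which tells HTML::Parser which function to call for each type of event (start tag, end tag, or text) and which arguments to pass. In this case, every start tag calls the function "tag" with the arguments "tagname", which is the name of the tag, and +1, which identifies it as a start tag. End tags call the same function with -1, so %inside keeps a running count of how deeply each tag is currently open, and the "text" handler uses that count to skip anything inside script or style elements.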
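
To make the handler mechanism easier to experiment with, here is a minimal sketch of the same idea that works on a string instead of a file and returns the text instead of printing it. The function name extract_text and the sample HTML are my own illustration and are not part of eg/htext; the sketch also leaves out the padding space that the original prints for every tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3.00 ();

# Hypothetical helper (not part of eg/htext): returns the text of an
# HTML string, skipping the contents of <script> and <style> elements.
sub extract_text {
    my ($html) = @_;
    my %inside;      # tag name => how many of that tag are currently open
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        handlers    => [
            # The argspec "tagname, '+1'" passes the tag name and the literal +1.
            start => [sub { $inside{$_[0]} += $_[1] }, "tagname, '+1'"],
            end   => [sub { $inside{$_[0]} += $_[1] }, "tagname, '-1'"],
            # "dtext" passes the text with entities decoded.
            text  => [sub {
                          return if $inside{script} || $inside{style};
                          $out .= $_[0];
                      }, "dtext"],
        ],
    );

    $p->parse($html);
    $p->eof;         # signal end of document so buffered text is flushed
    return $out;
}

print extract_text('<p>Hello <b>world</b><script>var x = 1;</script></p>'), "\n";
```

Feeding the parser with parse() followed by eof() is the in-memory counterpart of parse_file(); the handlers themselves work the same way as in the original.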

Personally, I have had more luck with HTML::TokeParser, though I'm sure that isn't the case for everyone. I find HTML::TokeParser a bit more intuitive for this sort of job, but perhaps that is just the way I think, or maybe I just wasn't using HTML::Parser right ;-) In any case, good luck.
-Eric
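
For comparison, the same job could be sketched with HTML::TokeParser roughly as follows. This is only an illustration of the pull-style interface and does not come from the original post or from the module's examples; the function name extract_text_tp is hypothetical. Instead of registering callbacks, you ask the parser for one token at a time and decide what to do with it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser ();
use HTML::Entities qw(decode_entities);

# Hypothetical helper: same behaviour as the handler-based version,
# but written as a loop that pulls tokens from the parser.
sub extract_text_tp {
    my ($html) = @_;

    # A scalar reference tells HTML::TokeParser to parse the string itself.
    my $p = HTML::TokeParser->new(\$html)
        or die "Can't create parser: $!\n";

    my %inside;      # tag name => how many of that tag are currently open
    my $out = '';

    while (my $token = $p->get_token) {
        my $type = $token->[0];
        if ($type eq 'S') {            # start tag: ["S", $tag, ...]
            $inside{ $token->[1] }++;
        }
        elsif ($type eq 'E') {         # end tag: ["E", $tag, $text]
            $inside{ $token->[1] }--;
        }
        elsif ($type eq 'T') {         # text: ["T", $text, $is_data]
            next if $inside{script} || $inside{style};
            # Text tokens are not entity-decoded, so decode them here.
            $out .= decode_entities($token->[1]);
        }
    }
    return $out;
}

print extract_text_tp('<p>Tom &amp; Jerry<style>p { color: red }</style></p>'), "\n";
```

Whether this reads as more intuitive than the handler version is largely a matter of taste: the bookkeeping is identical, only the control flow is turned inside out.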

