Who has used HTML::Parser??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I just got the HTML::Parser 3.1 module and I am trying to figure out how to use it. I have no prior experience with modules, and my biggest problem with this one is that I can't figure out what the elements of a parser object are. The documentation shows how to create a new parser data structure and perform some operations on it, but I can't find what the different parts of this structure are. Moreover, I can't seem to find where the main parser function is, and I need to figure out what that function returns. If anyone has any experience with this module, please help. Thanks, Shaheeb.

Re: Who has used HTML::Parser?? by nardo (Friar) on Jul 06, 2000 at 04:56 UTC
The "main parser function" is either `parse()` or `parse_file()`, depending on whether you have the HTML in memory or on disk. The parser has three callback methods, `start`, `end`, and `text`, which are called when an opening tag, a closing tag, or text is encountered. You need to supply these yourself. Version 2 of HTML::Parser requires you to subclass HTML::Parser:

```perl
#!/usr/bin/perl
use strict;

{
    package SampleParser;
    use base qw(HTML::Parser);

    # Called for every opening tag
    sub start {
        my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
        print "Tag: $tagname\n";
        # $attrseq lists the attribute names in their original order
        foreach my $at (@{$attrseq}) {
            print "Attribute: $at = $attr->{$at}\n";
        }
    }

    # Called for text between tags
    sub text {
        my ($self, $origtext) = @_;
        print "Text: $origtext\n";
    }
}

my $html = '<html><head><title>this is the title</title><body bgcolor="white">Hello</body></html>';
my $sp = SampleParser->new;
$sp->parse($html);
$sp->eof;    # signal end of document so any buffered text is flushed
```

But version 3 looks like it allows you to specify which functions to use for start, end, and text in the constructor (see the documentation for an example of this).
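For comparison, here is a minimal sketch of the version 3 constructor style mentioned above. It assumes the handler interface as documented in the HTML::Parser 3.x POD (`api_version`, `start_h`, `end_h`, `text_h`, each given as a code reference plus an argspec string), so check it against the documentation for your installed version. The `@events` array and the sample HTML are illustrative only. Instead of overriding methods in a subclass, each handler is passed straight to `new`, and the argspec string selects which values the callback receives.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical collector: each handler appends a line describing what it saw.
my @events;

my $p = HTML::Parser->new(
    api_version => 3,
    # Opening tags: tag name, attribute hash, attribute names in source order
    start_h => [
        sub {
            my ($tagname, $attr, $attrseq) = @_;
            push @events, "Tag: $tagname";
            push @events, "Attribute: $_ = $attr->{$_}" for @$attrseq;
        },
        'tagname, attr, attrseq',
    ],
    # Closing tags: just the tag name
    end_h  => [ sub { push @events, "End: $_[0]" },  'tagname' ],
    # Text between tags, with entities decoded
    text_h => [ sub { push @events, "Text: $_[0]" }, 'dtext' ],
);

my $html = '<html><head><title>this is the title</title></head>'
         . '<body bgcolor="white">Hello</body></html>';

$p->parse($html);
$p->eof;    # signal end of document so any buffered text is flushed

print "$_\n" for @events;
```

Neither `parse()` nor `parse_file()` hands back a parsed data structure. Everything of interest arrives through the callbacks, so the handlers have to store whatever you want to keep, as `@events` does here.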
Re: Who has used HTML::Parser?? by ZZamboni (Curate) on Jul 06, 2000 at 04:08 UTC
I haven't used HTML::Parser, but issue #17 of The Perl Journal has a pretty good introductory article about it. You may find it useful. --ZZamboni
Re: Who has used HTML::Parser?? by beppu (Hermit) on Jul 06, 2000 at 09:50 UTC
I've used HTML::Parser twice so far, and it's not hard to use at all. On perlmonks.org, I've put up a script called delirium which uses HTML::Parser in a simplistic way. I've also written another script called lchtml, a text filter that turns HTML tags and attributes to lower case. lchtml gives HTML::Parser a little more of a workout, and you'll probably find it more informative. Both delirium and lchtml are available for browsing at http://opensource.lineo.com/cgi-bin/cvsweb/scripts/little/
coder equ "beppu" ; asm and perl 4ever

