The "main parser function" is either parse() or parse_file(), depending on whether the HTML is in memory or on disk. The parser calls three methods: start when it meets an opening tag, end when it meets a closing tag, and text when it finds text between tags. You supply these methods yourself. With version 2 of HTML::Parser you do this by subclassing HTML::Parser:
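#!/usr/bin/perl
use strict;
use warnings;

{
    package SampleParser;
    use base qw(HTML::Parser);

    # Called for every opening tag; $attrseq lists the attribute names in source order.
    sub start
    {
        my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
        print "Tag: $tagname\n";
        foreach my $at (@{$attrseq})
        {
            print "Attribute: $at = $attr->{$at}\n";
        }
    }

    # Called for every closing tag.
    sub end
    {
        my ($self, $tagname, $origtext) = @_;
        print "End tag: $tagname\n";
    }

    # Called for text found between tags.
    sub text
    {
        my ($self, $origtext) = @_;
        print "Text: $origtext\n";
    }
}

my $html = '<html><head><title>this is the title</title><body bgcolor="white">Hello</body></html>';
my $sp = SampleParser->new;   # no arguments: version 2 compatible callback methods
$sp->parse($html);
$sp->eof;                     # signal end of document so any buffered text is flushed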
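For HTML on disk, parse_file() does the reading for you and calls eof() itself when it reaches the end of the file. A minimal sketch in the same version 2 subclass style; TagCounter and its tags_seen key are hypothetical names for illustration, and the sample file is written with File::Temp only so the example is self-contained:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

{
    # Hypothetical subclass: records each start tag name it sees (version 2 style).
    package TagCounter;
    use base qw(HTML::Parser);

    sub start {
        my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
        push @{ $self->{tags_seen} }, $tagname;
    }
}

# Write some sample HTML to a temporary file on disk (deleted at exit).
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh '<html><body><p>One</p><p>Two</p></body></html>';
close $fh or die "close: $!";

my $parser = TagCounter->new;
# parse_file() returns undef if the file cannot be opened; otherwise it parses and calls eof().
defined $parser->parse_file($filename) or die "Cannot parse $filename: $!";

my @tags = @{ $parser->{tags_seen} || [] };
print "Tags: @tags\n";
```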
Version 3, on the other hand, lets you pass the start, end, and text handlers to the constructor instead of subclassing (see the HTML::Parser documentation for examples).
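A minimal sketch of that constructor style, assuming HTML::Parser 3.x. Each *_h option takes a code reference plus an argspec string naming the values passed to it; the @events array is just a hypothetical way to collect the results so they can be inspected afterwards:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my @events;   # collected here so the result can be checked after parsing

my $parser = HTML::Parser->new(
    api_version => 3,
    # Each handler is [ code ref, argspec ]; the argspec lists what gets passed in.
    start_h => [ sub {
        my ($tagname, $attr, $attrseq) = @_;
        push @events, "start $tagname";
        push @events, "attr $_=$attr->{$_}" for @$attrseq;
    }, 'tagname, attr, attrseq' ],
    end_h  => [ sub { push @events, "end $_[0]" }, 'tagname' ],
    text_h => [ sub {
        my ($dtext) = @_;                                  # text with entities decoded
        push @events, "text $dtext" if $dtext =~ /\S/;     # skip whitespace-only text
    }, 'dtext' ],
);

my $html = '<html><head><title>this is the title</title></head>'
         . '<body bgcolor="white">Hello</body></html>';
$parser->parse($html);
$parser->eof;   # flush any buffered text

print "$_\n" for @events;
```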
I've used HTML::Parser twice so far, and it's not hard to use at all. On perlmonks.org I've posted a script called delirium that uses HTML::Parser in a simple way. I've also written another script called lchtml, a text filter that converts HTML tag and attribute names to lowercase. lchtml gives HTML::Parser more of a workout, so you'll probably find it more informative.
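lchtml itself isn't reproduced here, so this is a hypothetical sketch of the same idea rather than the actual script: re-emit each start and end tag using the names HTML::Parser already reports in lowercase by default, and copy everything else (text, comments, declarations) through with default_h. Attribute values keep their case and are re-escaped with HTML::Entities; note that quoting is normalized, and boolean attributes come out as name="name".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities qw(encode_entities);

# Hypothetical lowercasing filter in the spirit of lchtml (not the original script).
sub lowercase_html {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Tag and attribute names arrive lowercased (case_sensitive is off by default).
        start_h => [ sub {
            my ($tagname, $attr, $attrseq) = @_;
            my $tag = "<$tagname";
            for my $name (@$attrseq) {
                $tag .= sprintf ' %s="%s"', $name, encode_entities($attr->{$name}, '<>&"');
            }
            $out .= "$tag>";
        }, 'tagname, attr, attrseq' ],
        end_h     => [ sub { $out .= "</$_[0]>" }, 'tagname' ],
        # Any event without its own handler is copied through as original text.
        default_h => [ sub { $out .= $_[0] }, 'text' ],
    );
    $parser->parse($html);
    $parser->eof;
    return $out;
}

print lowercase_html(qq{<P ALIGN="Center">Some <B>Bold</B> text</P>\n});
```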
coder equ "beppu" ; asm and perl 4ever