PerlMonks  

HTML::TokeParser::Listerine

by Amoe (Friar)
on Apr 13, 2002 at 19:04 UTC (#158834, sourcecode)

Category: HTML Utility
Author/Contact Info Amoe.
Description:

Makes HTML::TokeParser return a list when get_tag and get_token are called in list context. Other than that, it is identical to iterating over the results with a while loop. It's to enable me to say:

my @links = map { $_->[1]{href} } $parser->get_tag('a')

And expect it to work. Sating the addiction of map-junkies. :)
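For comparison, here is a minimal sketch of the while loop that the one-liner above stands in for, using plain HTML::TokeParser. The sample markup and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup, invented for this sketch.
my $html = <<'HTML';
<p><a href="http://www.foo.com">Bar</a>
<a href="http://www.bar.com">Foo</a></p>
HTML

my $parser = HTML::TokeParser->new( \$html )
    or die "Could not create parser";

# In vanilla TokeParser, get_tag('a') returns one array ref per call,
# [$tag, \%attr, \@attrseq, $text], and undef once the document runs out.
my @links;
while ( my $tag = $parser->get_tag('a') ) {
    push @links, $tag->[1]{href};
}

print "$_\n" for @links;
```

The subclass below simply moves this loop inside get_tag and get_token, so the caller can feed the result straight to map or grep.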

This code would possibly be better applied to HTML::PullParser, but if I applied it to that I'd have to reimplement get_tag and do some other stuff which I don't want to. I think, anyway.

package HTML::TokeParser::Listerine;
use strict;
use warnings;
use base 'HTML::TokeParser';

sub get_tag {
    my $self = shift;
    if (wantarray) {

        # build and return a list

        my @tags;
        while ( my $tag = $self->SUPER::get_tag(@_) ) {    # delegate to superclass
            push @tags, $tag;
        }
        return @tags;
    }
    else { return $self->SUPER::get_tag(@_) }
}

sub get_token {
    my $self = shift;
    if (wantarray) {

        # build and return a list

        my @tokens;
        while ( my $token = $self->SUPER::get_token(@_) ) {    # delegate to superclass
            push @tokens, $token;
        }
        return @tokens;
    }
    else { return $self->SUPER::get_token(@_) }
}

1;

__END__

=pod

=head1 NAME

HTML::TokeParser::Listerine - Context-sensitive HTML token parsing

=head1 SYNOPSIS

 use HTML::TokeParser::Listerine;
 my $html = q {

 <html>
  <body>
   <!-- Match my comment, and include it  -->
   <!-- in the output of get_token        -->
   <a href="http://www.foo.com">Bar</a><br />
   <a href="http://www.bar.com">Foo</a><br />
  </body>
 </html>

 };

 my $p = HTML::TokeParser::Listerine->new(\$html);

 # magically parse html with map rather than tedious while!
 # you could also use get_token to do this
 my @links    = map { $_->[1]->{href} } $p->get_tag('a');

 print "Links are: ", join("\n", @links), "\n";

=head1 DESCRIPTION

HTML::TokeParser::Listerine overrides the C<get_tag> and C<get_token> methods
of HTML::TokeParser to make them DWIM in list context, for example the one
provided by the C<grep> and C<map> operators. This allows you to do terse,
complex filtering, rather than having to enter a big while loop every time you
want to parse HTML, which isn't easy on the eye.

Obviously, this is no faster than doing it with a while loop, as
internally it uses the same mechanism, and it has to build the whole list
before returning it. It simply saves you typing, and that can be a lot more
convenient than you think.

=head1 METHODS

The only difference from HTML::TokeParser is that if you use the methods
C<get_tag> and C<get_token> in list context they return a list of all the
remaining tags and tokens, respectively. Using them in scalar context should
behave the same as vanilla TokeParser does.

=head1 AUTHOR

Amoe.

=head1 REQUIREMENTS

HTML::TokeParser and everything it depends on.

=head1 SEE ALSO

HTML::TokeParser and HTML::PullParser manpages.

=cut
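
One thing worth keeping in mind with wantarray-based methods like these: list context is easy to trigger by accident, and it drains the parser. The sketch below shows the same scalar-versus-list pattern on a toy iterator, so it runs without HTML::TokeParser. The Countdown class and its next_item method are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

package Countdown;    # hypothetical toy class, not part of the module above

sub new {
    my ( $class, @items ) = @_;
    return bless { items => [@items] }, $class;
}

# Scalar context: return the next item, or undef when exhausted.
# List context: return everything that is left and empty the iterator.
sub next_item {
    my $self = shift;
    return shift @{ $self->{items} } unless wantarray;
    my @rest = @{ $self->{items} };
    @{ $self->{items} } = ();
    return @rest;
}

package main;

my $it = Countdown->new(qw(a b c d));

my $first    = $it->next_item;    # scalar context: takes one item
my ($second) = $it->next_item;    # list context: drains the rest, keeps only 'b'
my $third    = $it->next_item;    # scalar context again: nothing left

print "first=$first second=$second third=",
    ( defined $third ? $third : 'undef' ), "\n";
```

The parentheses in `my ($second) = ...` are enough to impose list context, so with Listerine the equivalent `my ($tag) = $p->get_tag('a')` would consume every remaining tag rather than just one.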
