PerlMonks  

HTML::TokeParser::Listerine

by Amoe (Friar)
on Apr 13, 2002 at 19:04 UTC ( #158834=sourcecode )
Category: HTML Utility
Author/Contact Info Amoe.
Description:

Makes HTML::TokeParser return a list when get_tag and get_token are called in list context. Other than that, it behaves the same as iterating with a while loop. It's to enable me to say:

my @links = map { $_->[1]{href} } $parser->get_tag('a');

And expect it to work. Sating the addiction of map-junkies. :)
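
For comparison, here is a minimal sketch of the while loop that the one-liner above replaces, using plain HTML::TokeParser. The sample markup and variable names are made up for illustration; only `new` and `get_tag` come from the real module.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup, made up for this illustration.
my $html = <<'HTML';
<a href="http://www.foo.com">Bar</a><br />
<a href="http://www.bar.com">Foo</a><br />
HTML

my $parser = HTML::TokeParser->new( \$html )
    or die "Could not create parser";

my @links;

# get_tag('a') skips ahead to the next <a> start tag and returns
# [ $tag, \%attr, \@attrseq, $text ], or undef at end of document.
while ( my $tag = $parser->get_tag('a') ) {
    push @links, $tag->[1]{href};
}

print "$_\n" for @links;
```

The list-context version collapses the loop and the push into a single map expression.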

This code would possibly be better applied to HTML::PullParser, but if I applied it to that I'd have to reimplement get_tag and do some other stuff which I don't want to. I think, anyway.

package HTML::TokeParser::Listerine;
use strict;
use warnings;
use base 'HTML::TokeParser';

sub get_tag {
    my $self = shift;
    if (wantarray) {

        # build and return a list

        my @tags;
        while ( my $tag = $self->SUPER::get_tag(@_) ) {    # delegate to superclass
            push @tags, $tag;
        }
        return @tags;
    }
    else { return $self->SUPER::get_tag(@_) }
}

sub get_token {
    my $self = shift;
    if (wantarray) {

        # build and return a list

        my @tokens;
        while ( my $token = $self->SUPER::get_token(@_) )
        {    # delegate to superclass
            push @tokens, $token;
        }
        return @tokens;
    }
    else { return $self->SUPER::get_token(@_) }
}

1;

__END__

=pod

=head1 NAME

HTML::TokeParser::Listerine - Context-sensitive HTML token parsing

=head1 SYNOPSIS

 use HTML::TokeParser::Listerine;
 my $html = q {

 <html>
  <body>
   <!-- Match my comment, and include it  -->
   <!-- in the output of get_token        -->
   <a href="http://www.foo.com">Bar</a><br />
   <a href="http://www.bar.com">Foo</a><br />
  </body>
 </html>

 };

 my $p = HTML::TokeParser::Listerine->new(\$html);

 # magically parse html with map rather than tedious while!
 # you could also use get_token to do this
 my @links    = map { $_->[1]->{href} } $p->get_tag('a');

 print "Links are: ", join("\n", @links), "\n";

=head1 DESCRIPTION

HTML::TokeParser::Listerine overrides the C<get_tag> and C<get_token> methods
of HTML::TokeParser to make them DWIM in list context, such as the one
provided by the C<grep> and C<map> operators. This allows you to write terse,
complex filters, rather than having to enter a big while loop every time you
want to parse HTML, which isn't easy on the eye.

This is no faster than doing it with a while loop, as internally it uses the
same mechanism, and it is likely a little slower because it builds the whole
list before returning. It simply saves you typing, and that can be a lot more
convenient than you think.

=head1 METHODS

The only difference from HTML::TokeParser is that if you call the methods
C<get_tag> and C<get_token> in list context, they return a list of all the
remaining matching tags and all the remaining tokens, respectively. Calling
them in scalar context should behave the same as vanilla TokeParser does.

=head1 AUTHOR

Amoe.

=head1 REQUIREMENTS

HTML::TokeParser and everything that it depends on.

=head1 SEE ALSO

HTML::TokeParser and HTML::PullParser manpages.

=cut
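
The wantarray trick used by get_tag and get_token above can be tried in isolation. The following is only a sketch: Toy::Iterator and Toy::Iterator::Greedy are hypothetical stand-ins invented for this example, not part of HTML::TokeParser. It also shows that a list-context call consumes everything that is left, which is worth remembering before writing something like `my ($first) = $p->get_tag('a')`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a pull parser: hands out one item per call,
# then undef once it is exhausted.
package Toy::Iterator;

sub new {
    my ( $class, @items ) = @_;
    return bless { items => [@items] }, $class;
}

sub next_item {
    my $self = shift;
    return shift @{ $self->{items} };
}

# Hypothetical subclass applying the same context check as Listerine.
package Toy::Iterator::Greedy;
our @ISA = ('Toy::Iterator');

sub next_item {
    my $self = shift;

    # Scalar (or void) context: behave exactly like the superclass.
    return $self->SUPER::next_item(@_) unless wantarray;

    # List context: drain the iterator and return everything left.
    my @all;
    while ( defined( my $item = $self->SUPER::next_item(@_) ) ) {
        push @all, $item;
    }
    return @all;
}

package main;

my $it = Toy::Iterator::Greedy->new(qw(a b c d));

my $first = $it->next_item;                    # scalar context: one item
my @rest  = map { uc $_ } $it->next_item;      # list context: all the rest
my $after = $it->next_item;                    # nothing left, so undef

print "$first @rest\n";
```

Forcing scalar context with `scalar( $it->next_item )` is one way to fetch a single item from inside an expression that would otherwise supply list context.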