Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re: Re: Re: Dealing with Word Compact HTML

by format_c (Initiate)
on Apr 14, 2004 at 22:45 UTC ( [id://345236]=note: print w/replies, xml ) Need Help??


in reply to Re: Re: Dealing with Word Compact HTML
in thread Dealing with Word Compact HTML

I tried a bit with HTML::Parser an I hate it because I think it's complicated to use. But parsing HTML with RegEx quickly become more complicated than parsing with HTML::Parser. So here's my snippet and I hope it'll help you:
# This script will extract text which is incuded in <b> use strict; use HTML::Parser; local $/; my $html = <DATA>; my $p = HTML::Parser->new(api_version => 3, start_h => [\&b_start_handler,"tagname,self"] ); $p->parse($html); sub b_start_handler { my ($tagname,$self) = @_; return unless $tagname eq 'b'; $self->handler(text => [], '@{dtext}' ); $self->handler(end => \&b_end_handler,"tagname,self"); } sub b_end_handler { my($tag,$self) = @_; my $text = join("", @{$self->handler("text")}); print "$text\n---\n"; $self->handler("text", undef); $self->handler("start", \&b_start_handler); $self->handler("end", undef); } __DATA__ <P class=para><a name="watch dog"></a><b>watch dog -</b> A big dog that makes sure that you don't do anything that you're not supposed to).</p> <p class=para><a name="WR"></a><b>wooden round –</b> A big piece of ro und wood.</p>
Greets Alex

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://345236]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (5)
As of 2024-04-19 06:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found