PerlMonks  

Re: Re: Re: Dealing with Word Compact HTML

by format_c (Initiate)
on Apr 14, 2004 at 22:45 UTC (#345236)


in reply to Re: Re: Dealing with Word Compact HTML
in thread Dealing with Word Compact HTML

I experimented a bit with HTML::Parser, and I hate it because I find it complicated to use. But parsing HTML with regexes quickly becomes more complicated than parsing with HTML::Parser. So here's my snippet, and I hope it helps you:

# This script extracts the text enclosed in <b> tags
use strict;
use warnings;
use HTML::Parser;

# Slurp the whole DATA section at once
my $html = do { local $/; <DATA> };

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&b_start_handler, "tagname,self" ],
);
$p->parse($html);
$p->eof;    # signal end of document so buffered text is flushed

sub b_start_handler {
    my ($tagname, $self) = @_;
    return unless $tagname eq 'b';
    # Collect decoded text into a fresh array until the closing tag
    $self->handler(text => [], '@{dtext}');
    $self->handler(end  => \&b_end_handler, "tagname,self");
}

sub b_end_handler {
    my ($tagname, $self) = @_;
    return unless $tagname eq 'b';    # ignore end tags nested inside <b>
    my $text = join("", @{ $self->handler("text") });
    print "$text\n---\n";
    # Stop collecting text until the next <b>
    $self->handler("text", undef);
    $self->handler("start", \&b_start_handler);
    $self->handler("end", undef);
}

__DATA__
<P class=para><a name="watch dog"></a><b>watch dog -</b> A big dog that makes sure that you don't do anything that you're not supposed to).</p>
<p class=para><a name="WR"></a><b>wooden round </b> A big piece of round wood.</p>
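If you would rather get the matches back as a list than print them from inside the handler, here is a minimal sketch of the same idea. It keeps all three handlers installed and uses a depth counter instead of swapping handlers in and out. The helper name extract_bold and the sample input are made up for illustration, and I have not tried it on real Word output.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns the text of every <b>...</b> element in $html.
sub extract_bold {
    my ($html) = @_;
    my @found;         # text of each completed <b> element
    my $depth = 0;     # greater than 0 while inside a <b> element
    my $buf   = '';    # text collected for the current <b> element

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            return unless $_[0] eq 'b';
            $buf = '' unless $depth;    # start a fresh buffer at the outermost <b>
            $depth++;
        }, 'tagname' ],
        text_h => [ sub { $buf .= $_[0] if $depth }, 'dtext' ],
        end_h => [ sub {
            return unless $_[0] eq 'b' && $depth;
            push @found, $buf if --$depth == 0;
        }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;           # flush any text the parser is still buffering
    return @found;
}

# Made-up sample input, including a tag nested inside <b>
my $sample = '<p><b>watch dog -</b> A big dog.</p>'
           . '<p><b>wooden <i>round</i></b> wood</p>';
print "$_\n---\n" for extract_bold($sample);
```

Text inside tags nested in the <b> element (the <i> above) ends up in the same buffer, so the second match comes back as "wooden round".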
Greets Alex

