PerlMonks  

Re: Re: Re: Dealing with Word Compact HTML

by format_c (Initiate)
on Apr 14, 2004 at 22:45 UTC (#345236)


in reply to Re: Re: Dealing with Word Compact HTML
in thread Dealing with Word Compact HTML

I tried HTML::Parser a bit, and I hate it because I think it's complicated to use. But parsing HTML with regexes quickly becomes more complicated than parsing with HTML::Parser. So here's my snippet; I hope it helps you:

# This script extracts the text enclosed in <b> tags
use strict;
use warnings;
use HTML::Parser;

# Slurp the whole DATA section at once
local $/;
my $html = <DATA>;

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [\&b_start_handler, "tagname,self"],
);
$p->parse($html);
$p->eof;    # signal end of input

# On <b>: start collecting text and watch for the closing tag
sub b_start_handler {
    my ($tagname, $self) = @_;
    return unless $tagname eq 'b';
    $self->handler(text => [], '@{dtext}');
    $self->handler(end  => \&b_end_handler, "tagname,self");
}

# On </b>: print the collected text and go back to waiting for <b>
sub b_end_handler {
    my ($tagname, $self) = @_;
    return unless $tagname eq 'b';    # ignore end tags nested inside <b>
    my $text = join("", @{ $self->handler("text") });
    print "$text\n---\n";
    $self->handler("text",  undef);
    $self->handler("start", \&b_start_handler);
    $self->handler("end",   undef);
}
__DATA__
<P class=para><a name="watch dog"></a><b>watch dog -</b> A big dog that makes sure that you don't do anything that you're not supposed to).</p>
<p class=para><a name="WR"></a><b>wooden round </b> A big piece of round wood.</p>
Greets Alex

