I tried HTML::Parser a bit and disliked it because I found it complicated to use. But parsing HTML with regexes quickly becomes more complicated than parsing with HTML::Parser. So here's my snippet; I hope it helps:

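# This script extracts the text enclosed in <b> tags.
use strict;
use warnings;
use HTML::Parser;

# Slurp the whole __DATA__ section in one read.
my $html = do { local $/; <DATA> };

my $p = HTML::Parser->new(
  api_version => 3,
  start_h     => [\&b_start_handler, "tagname,self"],
);

$p->parse($html);
$p->eof;    # signal end of document so buffered text is flushed

# On <b>: collect decoded text into an array and watch for the end tag.
sub b_start_handler {
  my ($tagname, $self) = @_;
  return unless $tagname eq 'b';
  $self->handler(text => [], '@{dtext}');
  $self->handler(end  => \&b_end_handler, "tagname,self");
}

# On </b>: print the collected text, then stop collecting.
sub b_end_handler {
  my ($tagname, $self) = @_;
  return unless $tagname eq 'b';    # ignore end tags nested inside <b>
  my $text = join("", @{ $self->handler("text") });
  print "$text\n---\n";

  # The start handler stays installed, so only text and end are cleared.
  $self->handler(text => undef);
  $self->handler(end  => undef);
}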
__DATA__
<P class=para><a name="watch dog"></a><b>watch dog
-</b> A big dog that makes sure that you don't do anything that you're not supposed to).</p>

<p class=para><a name="WR"></a><b>wooden round –</b> A big piece of round wood.</p>

Greets Alex

In reply to Re: Re: Re: Dealing with Word Compact HTML by format_c
in thread Dealing with Word Compact HTML by apessos

	PerlMonks