http://www.perlmonks.org?node_id=880958

mr_p has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am trying to extract part of an HTML document while keeping its tags intact. The code I wrote using HTML::Parser returns only the text and strips out the tags.

Is there a way to keep tags intact?

Below is my code:

#!/usr/bin/perl
use strict;
use warnings;

package MyParser;
use base qw(HTML::Parser);

my $main_content = "";
my $content_flag = 0;    # true once <span class="main-content"> has been seen

sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    if ($tag =~ /^span$/i && ($attr->{'class'} || '') =~ /^main-content$/i) {
        # set if we find <span class="main-content"
        $content_flag = 1;
    }
}

sub text {
    my ($self, $text) = @_;
    # Collect text only after the main-content span has started
    if ($content_flag) {
        $main_content .= $text;
    }
}

my $html = "
<html>
<head> <title>Blah</title> </head>
<span class=\"main-content\">
<bold_text> Here's the body 1 </bold_text>
<p> para1 </p>
<p> para2 </p>
</span>
</html>";

my $parser = MyParser->new;
$parser->parse($html);
$parser->eof;    # flush any buffered text
print "$main_content\n";

Output I get:

Here's the body 1 para1 para2

Output I need is:

<bold_text> Here's the body 1 </bold_text> <p> para1 </p> <p> para2 </p>

I would like the output to still contain the tags. Is it possible to do this with HTML::Parser?
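For reference, here is a minimal sketch of one way this might be done. It assumes the same subclass-style callbacks as the code above, where start() receives the original tag text as its last argument and end() receives it as its second argument after the tag name. The package name TagKeepingParser and the $span_depth counter are hypothetical names introduced only for this illustration; the counter is there so that collection stops at the matching </span>, even if other spans are nested inside.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package TagKeepingParser;    # hypothetical name, for illustration only
use base qw(HTML::Parser);

my $main_content = '';
my $span_depth   = 0;        # 0 = outside the target span

sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    if ($span_depth) {
        # Already inside the target span: keep the tag exactly as written
        $span_depth++ if $tag eq 'span';
        $main_content .= $origtext;
    }
    elsif ($tag eq 'span' && ($attr->{class} || '') =~ /^main-content$/i) {
        # Entering <span class="main-content">; the span tag itself is not kept
        $span_depth = 1;
    }
}

sub end {
    my ($self, $tag, $origtext) = @_;
    return unless $span_depth;
    $span_depth-- if $tag eq 'span';
    # Keep closing tags, except the one that closes the target span
    $main_content .= $origtext if $span_depth;
}

sub text {
    my ($self, $text) = @_;
    $main_content .= $text if $span_depth;
}

package main;

my $html = <<'HTML';
<html>
<head> <title>Blah</title> </head>
<span class="main-content">
<bold_text> Here's the body 1 </bold_text>
<p> para1 </p>
<p> para2 </p>
</span>
</html>
HTML

my $parser = TagKeepingParser->new;
$parser->parse($html);
$parser->eof;
print "$main_content\n";
```

With this approach the collected string should contain the <bold_text> and <p> elements together with their text, but not the enclosing span or the page title. Whitespace is passed through as it appears in the input, so it may need trimming.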