PerlMonks  

Removing nested div Tag from HTML

by mr_p (Scribe)
on Aug 18, 2011 at 18:27 UTC (#921044)
mr_p has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am in need of more help from everyone here.

I am trying to parse a nested tag from HTML using HTML::Parser and I am having problems. Below is my code. Please let me know what I am doing wrong.

#!/usr/bin/perl
use HTML::Parser;

my $content=<<EOF;
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Some title goes here</title>
</head>
<body bgcolor="#FFFFFF">
<div id="leftcol"> menu column </div>
<div id="body">
<div class="content">
<li>This is Line 1 </li>
</div>
<p>This is Line 2</p>
</div>
<div id="rightcol"> news column </div>
</body>
</html>
EOF

my $p = HTML::Parser->new( api_version => 3 );
$p->handler( start => \&start_handler, "self,tagname,attr" );
$p->parse($content);

sub start_handler {
    my $self    = shift;
    my $tagname = shift;
    my $attr    = shift;
    my $text    = shift;

    return unless ( $tagname eq 'div' and $attr->{id} eq 'body' );

    $self->handler( start => sub { print shift }, "text" );
    $self->handler( text  => sub { print shift }, "text" );
    $self->handler(end => sub {
        my ($endtagname, $self, $text) = @_;
        if($endtagname eq $tagname) {
            $self->eof;
        }
        else {
            print $text;
        }
    }, "tagname,self,text");
}

The output should be

<div id="body">
<div class="content">
<li>This is Line 1 </li>
</div>
<p>This is Line 2</p>
</div>
<div id="rightcol"> news column </div>

But the output is cut off before 'This is Line 2'.
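
One likely cause of the cut-off: the end handler in the code above calls eof on the first closing div it sees, and that is the closing tag of the nested div class="content", not of div id="body". A minimal sketch of one way around this is to count open div tags and stop only when the count returns to zero. The helper name extract_div and its variables are hypothetical, not part of the original post or of HTML::Parser. The sketch also assumes the goal is the complete div id="body" element, so it stops at that div's own closing tag and would not emit the rightcol div shown in the expected output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not from the original post): returns the markup of the
# first <div> whose id equals $id, including any <div> elements nested in it.
sub extract_div {
    my ($html, $id) = @_;
    my $depth = 0;    # number of open <div>s since the target; 0 = not capturing
    my $done  = 0;    # set once the target div has been closed
    my $out   = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tagname, $attr, $text) = @_;
            return if $done;
            if ($depth == 0) {
                # Not capturing yet: wait for the target div.
                return unless $tagname eq 'div'
                    and defined $attr->{id}
                    and $attr->{id} eq $id;
                $depth = 1;
            }
            elsif ($tagname eq 'div') {
                $depth++;    # a nested div opened
            }
            $out .= $text;
        }, 'tagname,attr,text' ],
        end_h => [ sub {
            my ($tagname, $text) = @_;
            return if $done or $depth == 0;
            $out .= $text;
            # Stop only when the target div itself closes.
            $done = 1 if $tagname eq 'div' and --$depth == 0;
        }, 'tagname,text' ],
        text_h => [ sub {
            my ($text) = @_;
            $out .= $text if $depth > 0;
        }, 'text' ],
    );

    $p->parse($html);
    $p->eof;
    return $out;
}

# Usage with a shortened version of the document from the question.
my $content = <<'EOF';
<body>
<div id="leftcol"> menu column </div>
<div id="body">
<div class="content">
<li>This is Line 1 </li>
</div>
<p>This is Line 2</p>
</div>
<div id="rightcol"> news column </div>
</body>
EOF

print extract_div($content, 'body'), "\n";
```

With the sample input, the nested closing div only lowers the counter, so capture continues through 'This is Line 2' until the body div itself closes. Collecting into a string instead of printing inside the handlers is a design choice made here so the result can be checked or post-processed.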

Re: Removing nested div Tag from HTML
by metaperl (Curate) on Aug 18, 2011 at 20:06 UTC
      Wouldn't HTML::Tree take too much time loading? The material I am working with is time critical.

        Wouldn't HTML::Tree take too much time loading?

        T.I.T.S.

        The material I am working with is time critical.

        Can't be that critical, if you're using perl
