PerlMonks  

Removing nested div Tag from HTML

by mr_p (Scribe)
on Aug 18, 2011 at 18:27 UTC (#921044=perlquestion)
mr_p has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am in need of more help from everyone here.

I am trying to parse a nested tag from HTML using HTML::Parser and I am having problems. Below is my code. Please let me know what I am doing wrong.

#!/usr/bin/perl
use HTML::Parser;

my $content=<<EOF;
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Some title goes here</title>
</head>
<body bgcolor="#FFFFFF">
<div id="leftcol"> menu column </div>
<div id="body">
<div class="content"> <li>This is Line 1 </li> </div>
<p>This is Line 2</p>
</div>
<div id="rightcol"> news column </div>
</body>
</html>
EOF

my $p = HTML::Parser->new( api_version => 3 );
$p->handler( start => \&start_handler, "self,tagname,attr" );
$p->parse($content);

sub start_handler {
    my $self = shift;
    my $tagname = shift;
    my $attr = shift;
    my $text = shift;
    return unless ( $tagname eq 'div' and $attr->{id} eq 'body' );
    $self->handler( start => sub { print shift }, "text" );
    $self->handler( text => sub { print shift }, "text" );
    $self->handler(end => sub {
        my ($endtagname, $self, $text) = @_;
        if($endtagname eq $tagname) {
            $self->eof;
        }
        else {
            print $text;
        }
    }, "tagname,self,text");
}

The output should be

<div id="body"> <div class="content"> <li>This is Line 1 </li> </div> <p>This is Line 2</p> </div> <div id="rightcol"> news column </div>

But the output is cut off before 'This is Line 2'.
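
A likely cause, judging only from the code as posted: the end handler calls $self->eof on the first </div> it sees, and that is the closing tag of the nested <div class="content">, not of <div id="body">. One way around this is to count open divs and stop only when the count returns to zero. The following is a minimal sketch under that assumption, not a tested drop-in fix. The function name extract_div and the $depth counter are made up for illustration; the HTML::Parser calls (new with start_h/end_h/text_h, parse, eof) are the standard version 3 API. Note that it stops at the matching </div>, so it does not print the rightcol div shown in the expected output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the source text of the first <div> whose
# id equals $id, including any nested <div> elements.
sub extract_div {
    my ($html, $id) = @_;
    my $out   = '';
    my $depth = 0;    # number of currently open divs; 0 = outside target

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($self, $tagname, $attr, $text) = @_;
            if ($depth == 0) {
                # Still looking for the target div; skip everything else.
                return unless $tagname eq 'div'
                    and defined $attr->{id}
                    and $attr->{id} eq $id;
                $depth = 1;
            }
            elsif ($tagname eq 'div') {
                $depth++;    # a nested div was opened
            }
            $out .= $text;
        }, 'self,tagname,attr,text' ],
        end_h => [ sub {
            my ($self, $tagname, $text) = @_;
            return unless $depth;
            $out .= $text;
            # Abort parsing only when the div that started the capture closes.
            if ($tagname eq 'div' and --$depth == 0) {
                $self->eof;
            }
        }, 'self,tagname,text' ],
        text_h => [ sub { $out .= shift if $depth }, 'text' ],
    );

    # parse() returns false when a handler called eof; otherwise flush.
    $p->eof if $p->parse($html);
    return $out;
}

# Usage with a shortened version of the document from the question.
my $sample = '<div id="leftcol">menu</div>'
           . '<div id="body"><div class="content"><li>Line 1</li></div>'
           . '<p>Line 2</p></div>'
           . '<div id="rightcol">news</div>';
print extract_div($sample, 'body'), "\n";
```

Collecting the text into a string instead of printing from inside the handlers is a design choice of this sketch; it makes the result easier to check or post-process.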

Re: Removing nested div Tag from HTML
by metaperl (Curate) on Aug 18, 2011 at 20:06 UTC
      Wouldn't HTML::Tree take too much time loading? The material I am working with is time critical.

        Wouldn't HTML::Tree take too much time loading?

        T.I.T.S.

        The material I am working with is time critical.

        Can't be that critical, if you're using perl

Node Type: perlquestion [id://921044]
Approved by blue_cowdawg