http://www.perlmonks.org?node_id=25730

jcwren has asked for the wisdom of the Perl Monks concerning the following question:

Thanks to ase (whose name I can't put in brackets; doing so gives me a server error), I've been playing with the HTML::TableExtract module. It's a really slick little module for extracting table data from HTML pages. However, it has a minor drawback for what I'm trying to do: any HTML between the <TD> and </TD> tags gets stripped. I would like it to return the HTML between the tags, and I've figured out how to do that. Unfortunately, I can't figure out how to access the data I've stored. Below is a model of what's happening:

I need to override the _add_text() method in the HTML::TableExtract::TableState class, which I can do by defining 'sub HTML::TableExtract::TableState::_add_text'. This is dirty, but it works (with a warning). I'd rather subclass HTML::TableExtract::TableState and invoke the parent _add_text() routine with a $self->SUPER::_add_text() call. However, since HTML::TableExtract::TableState is internal to the HTML::TableExtract module, and HTML::TableExtract explicitly does '$ts = new HTML::TableExtract::TableState()', I don't know how to accomplish the goal.
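Since the outer class hardcodes the internal class name, a subclass of the state class would never get instantiated. One common Perl workaround is to save a reference to the original method, then install a wrapper through a typeglob inside a `no warnings 'redefine'` block. The wrapper can call the saved original much as `SUPER::` would, and the redefinition warning is suppressed. This is a swapped-in technique (method wrapping, not subclassing), and the sketch below uses hypothetical stand-in packages `Outer` and `Outer::TableState` plus an invented `start_table` method rather than the real HTML::TableExtract internals.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::TableExtract::TableState.
package Outer::TableState;
sub new       { my ($class) = @_; return bless { text => '' }, $class }
sub _add_text { my ($self, $text) = @_; $self->{text} .= $text; return }

# Hypothetical stand-in for HTML::TableExtract; it hardcodes the
# internal class name, so a subclass of Outer::TableState is never used.
package Outer;
sub new { my ($class) = @_; return bless { states => [] }, $class }
sub start_table {
    my ($self) = @_;
    my $ts = Outer::TableState->new;
    push @{ $self->{states} }, $ts;
    return $ts;
}

package main;

# Save the original method before replacing it.
my $orig_add_text = \&Outer::TableState::_add_text;
{
    # Silence "Subroutine ... redefined" for this assignment only.
    no warnings 'redefine';
    *Outer::TableState::_add_text = sub {
        my ($self, $text) = @_;
        # Extra work goes here; then delegate to the original, like SUPER::.
        return $orig_add_text->($self, "[$text]");
    };
}

my $te = Outer->new;
my $ts = $te->start_table;
$ts->_add_text('cell');
print $ts->{text}, "\n";    # prints [cell]
```

Because the wrapper replaces the method in the original package, every object the outer class creates picks it up, with no need to change the hardcoded constructor call.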

The _add_text() that I provide needs to access the data I've stored in the jcwExtract module. If I could figure out how to reach the data two levels up (HTML::TableExtract::TableState -> HTML::TableExtract -> jcwExtract), I could do this, but it feels unclean. I'd rather figure out how to subclass the HTML::TableExtract::TableState class and override the _add_text() method.
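One way to let an overridden method see the outermost object without walking up through parent objects is dynamic scoping: the subclass sets a package variable with `local` for the duration of its parse call, and the overriding method reads it. The value is restored automatically when the call returns. The sketch below is an assumption-laden illustration, not the questioner's code: `Outer`, `Outer::TableState`, `MyExtract` (standing in for jcwExtract), `$MyExtract::Current`, and the `prefix` field are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for HTML::TableExtract and its internal state class.
package Outer::TableState;
sub new       { my ($class) = @_; return bless { text => '' }, $class }
sub _add_text { my ($self, $text) = @_; $self->{text} .= $text; return }

package Outer;
sub new { my ($class) = @_; return bless {}, $class }
sub parse {
    my ($self, @chunks) = @_;
    my $ts = Outer::TableState->new;    # internal class name is hardcoded
    $ts->_add_text($_) for @chunks;
    return $ts;
}

# Hypothetical subclass playing the role of jcwExtract.
package MyExtract;
our @ISA = ('Outer');
our $Current;    # set only while MyExtract::parse is running

sub new {
    my ($class, %opts) = @_;
    my $self = $class->SUPER::new;
    $self->{prefix} = $opts{prefix};    # data the override needs to see
    return $self;
}

sub parse {
    my ($self, @chunks) = @_;
    local $Current = $self;    # restored automatically on return
    return $self->SUPER::parse(@chunks);
}

package main;

# Wrap the internal method; the wrapper reads the active MyExtract object.
my $orig = \&Outer::TableState::_add_text;
{
    no warnings 'redefine';
    *Outer::TableState::_add_text = sub {
        my ($ts, $text) = @_;
        my $prefix = defined $MyExtract::Current
            ? $MyExtract::Current->{prefix}
            : '';
        return $orig->($ts, $prefix . $text);
    };
}

my $ex = MyExtract->new(prefix => '>');
my $ts = $ex->parse('a', 'b');
print $ts->{text}, "\n";    # prints >a>b
```

Outside a `MyExtract` parse, `$MyExtract::Current` is undefined, so the wrapper falls back to the original behaviour for plain `Outer` objects.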

I would post the code, but it's a little lengthy, so instead, here's a link to it. It's difficult to boil down to a short test case, but I'll try to add some more to it in a bit. I'll be happy to try any suggestions anyone has as to how to pull this off...

--Chris