PerlMonks  

How do I "reset" HTML::TableExtract?

by Cody Pendant (Prior)
on Sep 18, 2006 at 10:51 UTC (id://573522)

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

Every time I invoke HTML::TableExtract's parse() method, it doesn't re-initialise the object; it appends to it. See the example below.

I want to use it on a new table every time (iterating through a paged website with a scraper), and it doesn't make sense to keep the previous table.

My workaround is to just re-initialise the object with new(), but that feels wrong. I've read through the POD for TableExtract and I'm baffled. There doesn't seem to be an option to change this behaviour, and there doesn't seem to be a method to re-initialise the object in either TableExtract or HTML::Parser.

use strict;
use warnings;
use diagnostics;
use HTML::TableExtract;

my $table_1 = '
<table><tr><td>foo</td><td>bar</td></tr>
<tr><td>baz</td><td>quux</td></tr></table>';

my $table_2 = '
<table><tr><td>bof</td><td>xyzzy</td></tr>
<tr><td>bat</td><td>gazonk</td></tr></table>';

my $te = HTML::TableExtract->new();

$te->parse($table_1);

foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}

## what goes here if I want to dump table_1 ?

$te->parse($table_2);

foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}


($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print

Replies are listed 'Best First'.
Re: How do I "reset" HTML::TableExtract?
by Thelonius (Priest) on Sep 18, 2006 at 14:05 UTC
    Try $te->eof
Re: How do I "reset" HTML::TableExtract?
by greatshots (Pilgrim) on Sep 18, 2006 at 11:23 UTC
      by design
Re: How do I "reset" HTML::TableExtract?
by mojotoad (Monsignor) on Sep 19, 2006 at 19:40 UTC
    There is a private method, _reset_state(), that does what you want.

    Having said that, creating a new HTML::TableExtract object each time through is not adding any significant overhead relative to the parsing load.

    Cheers,
    Matt
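
    For anyone landing here later, a minimal sketch of the "new object per page" approach discussed above. The helper name extract_tables and the @pages list are made up for illustration and are not part of HTML::TableExtract; only new(), parse(), eof(), tables(), coords() and rows() come from the module and HTML::Parser. It sticks to the public interface rather than the private _reset_state().

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper (not part of HTML::TableExtract): parse one page with
# a brand-new extractor so nothing from an earlier page can carry over.
sub extract_tables {
    my ($html) = @_;

    my $te = HTML::TableExtract->new();
    $te->parse($html);
    $te->eof;    # tell HTML::Parser that the document is complete

    my @tables;
    foreach my $ts ($te->tables) {
        push @tables, {
            coords => [ $ts->coords ],
            rows   => [ map { [ @$_ ] } $ts->rows ],    # copy each row
        };
    }
    return \@tables;
}

# Stand-ins for the pages a scraper would fetch.
my @pages = (
    '<table><tr><td>foo</td><td>bar</td></tr><tr><td>baz</td><td>quux</td></tr></table>',
    '<table><tr><td>bof</td><td>xyzzy</td></tr><tr><td>bat</td><td>gazonk</td></tr></table>',
);

foreach my $html (@pages) {
    foreach my $table (@{ extract_tables($html) }) {
        print "Table (", join(',', @{ $table->{coords} }), "):\n";
        print join(',', @$_), "\n" foreach @{ $table->{rows} };
    }
}
```

    Because the extractor is a lexical inside the helper, it goes out of scope after each page, so each call only ever sees the tables of the page it was given.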

Re: How do I "reset" HTML::TableExtract?
by jpeg (Chaplain) on Sep 20, 2006 at 00:34 UTC
