PerlMonks  

Speed Up HTML::TableExtract

by Anonymous Monk
on Jan 24, 2006 at 22:27 UTC (#525332=perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Any ideas on speeding up HTML::TableExtract? I have large single-level tables to extract, and the parsing step below takes 3 minutes per table on a 2 GHz P4:
my $te = HTML::TableExtract->new; $te->parse($html);
Is there a way to speed it up besides getting a faster computer? The only alternative I can think of is to stop using HTML::TableExtract and use regular expressions instead.

Replies are listed 'Best First'.
Re: Speed Up HTML::TableExtract
by GrandFather (Saint) on Jan 24, 2006 at 22:44 UTC

    If it is XHTML, then you can use XML::Twig, which may get you there faster. Alternatively, take a look at the range of HTML parsers, such as HTML::TokeParser. Using regexen for parsing HTML is fraught!
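As a rough illustration of the HTML::TokeParser route, here is a minimal sketch that walks the token stream and collects cell text row by row. The helper name `extract_rows` is hypothetical, and the code assumes single-level tables (no nesting), as described in the question; it is not a drop-in replacement for HTML::TableExtract.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns an array ref of rows, each row an array ref
# of trimmed cell text. Assumes the table is not nested.
sub extract_rows {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Could not create parser";

    # Tags that end the text of the current cell. Listing start tags too
    # copes with markup that omits the closing </td> or </th>.
    my @stop = qw(/td /th td th /tr tr /table);

    my (@rows, $row);
    while (my $token = $p->get_token) {
        my ($type, $tag) = @$token;
        next unless $type eq 'S';          # only start tags matter here
        if ($tag eq 'tr') {
            $row = [];
            push @rows, $row;
        }
        elsif (($tag eq 'td' || $tag eq 'th') && $row) {
            push @$row, $p->get_trimmed_text(@stop);
        }
    }
    return \@rows;
}
```

Calling `extract_rows($html)` on the page source gives a plain list of lists, which is often all that is needed from a single-level table.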

    Try benchmarking different approaches using a representative, but very small test table. Maybe you could post a sample (small) table here with a description of the elements you need to pull out of the table so we can provide some real sample code for you to work from? Here's a template to get you started:

    use strict;
    use warnings;
    use HTML::TableExtract;

    my $html = do { local $/; <DATA> };
    my $te = HTML::TableExtract->new;
    $te->parse($html);

    __DATA__
    <table></table>
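For the benchmarking suggestion, a sketch along the following lines might serve as a starting point. It uses the core Benchmark module to compare HTML::TableExtract against HTML::TokeParser on a synthetic table. The table size (200 rows by 5 columns) and the helper names `make_table`, `cells_te`, and `cells_tp` are assumptions for illustration; substitute a representative sample of the real data before drawing any conclusions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use HTML::TableExtract;
use HTML::TokeParser;

# Build a synthetic single-level table (stand-in for the real data).
sub make_table {
    my ($nrows, $ncols) = @_;
    my $body = join '', map {
        my $r = $_;
        '<tr>' . join('', map { "<td>r${r}c$_</td>" } 1 .. $ncols) . "</tr>\n";
    } 1 .. $nrows;
    return "<table>\n$body</table>\n";
}

# Collect every cell via HTML::TableExtract.
sub cells_te {
    my ($html) = @_;
    my $te = HTML::TableExtract->new;
    $te->parse($html);
    my @cells;
    for my $ts ($te->tables) {
        push @cells, @$_ for $ts->rows;
    }
    return \@cells;
}

# Collect every cell via HTML::TokeParser.
sub cells_tp {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Could not create parser";
    my @cells;
    while (my $tag = $p->get_tag('td', 'th')) {
        push @cells, $p->get_trimmed_text("/$tag->[0]");
    }
    return \@cells;
}

my $html = make_table(200, 5);

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    TableExtract => sub { cells_te($html) },
    TokeParser   => sub { cells_tp($html) },
});
```

The two subs do similar but not identical work, so treat the numbers as a rough guide rather than a verdict.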

    Sometimes, if there is a lot of work to do, you just gotta do a lot of work!


    DWIM is Perl's answer to Gödel
Re: Speed Up HTML::TableExtract
by pboin (Deacon) on Jan 24, 2006 at 22:34 UTC

    Have you benchmarked other modules?

    It just so happens that I started using HTML::TableContentParser today, for example. You might also get by with an XML-oriented module (which I don't have much experience with), maybe XML::Twig.
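For reference, a minimal sketch of HTML::TableContentParser usage, following the structure shown in that module's synopsis (tables containing `rows`, rows containing `cells`, cells carrying `data`). The helper name `table_rows` is hypothetical, and the handling of empty cells is an assumption.

```perl
use strict;
use warnings;
use HTML::TableContentParser;

# Hypothetical helper: returns, for each table found, an array ref of rows,
# where each row is an array ref of cell data.
sub table_rows {
    my ($html) = @_;
    my $parser = HTML::TableContentParser->new;
    my $tables = $parser->parse($html);

    my @result;
    for my $table (@$tables) {
        my @rows;
        for my $row (@{ $table->{rows} || [] }) {
            # Empty cells may have no data; treat them as empty strings.
            push @rows, [ map { defined $_->{data} ? $_->{data} : '' }
                          @{ $row->{cells} || [] } ];
        }
        push @result, \@rows;
    }
    return \@result;
}
```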

    Sounds like your data's large enough that you should be trying at least 3-5 different angles of attack.
