PerlMonks  

Re^4: HTML Table to MYSQL DB

by Fiddler (Initiate)
on Nov 03, 2011 at 18:26 UTC (#935738)


in reply to Re^3: HTML Table to MYSQL DB
in thread HTML Table to MYSQL DB

Below is where I am with the code. The problem is that it stops at the first table. If I can get it to loop through all the tables in the .htm document, then writing the rows to the DB should be fairly simple.

use warnings;
use strict;
use HTML::TableExtract;
use LWP::Simple;

my $file  = "/path/to/file/file.htm";
my $T     = HTML::TableExtract->new();
my $table = $T->parse_file($file)->first_table_found;
my @rows  = $table->rows;
foreach my $row ($T->rows) {
    print join(',', @$row), "\n";
}


Re^5: HTML Table to MYSQL DB
by choroba (Abbot) on Nov 03, 2011 at 19:03 UTC
    Instead of first_table_found, use a loop over $T->tables, as shown in the documentation of HTML::TableExtract. To get the timestamp, you will probably need something more powerful, such as HTML::Parser.
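A minimal sketch of the loop over $te->tables, assuming HTML::TableExtract 2.x is installed. The inline HTML with two small tables is made up for illustration; in the real program you would call parse_file on the saved page instead. The point to notice is that the rows are read from each table object $ts, not from the extractor $te.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample page: two separate tables, like the saved .htm file.
my $html = <<'HTML';
<html><body>
<table><tr><th>Channel</th><th>Title</th></tr>
<tr><td>2</td><td>News</td></tr></table>
<p>some text between the tables</p>
<table><tr><th>Channel</th><th>Title</th></tr>
<tr><td>4</td><td>Weather</td></tr></table>
</body></html>
HTML

# With no headers/depth/count arguments, every table in the page is extracted.
my $te = HTML::TableExtract->new();
$te->parse($html);
$te->eof;    # tell the parser the document is complete

my @all_rows;
foreach my $ts ($te->tables) {
    # coords() returns the (depth, count) position of this table in the page
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        # empty cells can come back as undef, so map them to ''
        my @cells = map { defined $_ ? $_ : '' } @$row;
        print join(',', @cells), "\n";
        push @all_rows, \@cells;
    }
}
```

Each row collected this way is a plain array reference, so it can later be handed to whatever code writes to the database.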
      Hey Monks, I'm so close I can taste it... I have decided to cut the HTML tables up into chunks so that I can work with them more easily on each loop. The problem now is that I'm getting the same values every time the program moves on to the next chunk of data. See my code below; I'd appreciate it if you could tell me where exactly I went wrong :( Thanks.
      use warnings;
      #use strict;
      use HTML::TableExtract;
      use LWP::Simple;

      my @a    = 0;
      my $item = 0;
      $/ = "\n\n";
      open (FILE, "/path/to/file/file.htm") || print "Error";
      @a = <FILE>;
      close (FILE);
      foreach (@a) {
          foreach $item (split "(/</TR>/)gi", $_,) {
              my $chunk = "$item";
              #print "$chunk","\n","***********";
              my $te = HTML::TableExtract->new();
              $te->parse($chunk) ->first_table_found;;
              foreach my $ts ($te->tables) {
                  my @rows = $te->rows;
                  foreach my $row ($te->rows) {print join(',', @$row), "\n";}
              }
          }
      }
      Output:
      Channel , Call Letters , Count , Percent , Title
      Channel , Call Letters , Count , Percent , Title
      Channel , Call Letters , Count , Percent , Title
      Channel , Call Letters , Count , Percent , Title
      Channel , Call Letters , Count , Percent , Title
      Channel , Call Letters , Count , Percent , Title
      Channel , Call Letters , Count , Percent , Title
      (loops throughout)....
        HTML::TableExtract is great if all you need is data in tables, but not so good when you also need data outside the tables. Then you're stuck with HTML::Parser. Sometimes I wish HTML::TableExtract had a place to plug in an additional callback to collect data from outside the table.
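A minimal sketch of the HTML::Parser route for data outside the tables (such as the timestamp choroba mentions), assuming the timestamp sits in plain text outside any <table>. The sample HTML is hypothetical. The handlers only track whether the parser is currently inside a table and keep the text found anywhere else.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical page: a timestamp line outside the table, then the table.
my $html = <<'HTML';
<html><body>
<p>Report generated 2011-11-03 18:00</p>
<table><tr><td>2</td><td>News</td></tr></table>
</body></html>
HTML

my $table_depth = 0;   # > 0 while we are inside a <table> (handles nesting)
my @outside;           # text chunks found outside any table

my $p = HTML::Parser->new(
    api_version   => 3,
    unbroken_text => 1,    # deliver each run of text as a single event
    start_h => [ sub { $table_depth++ if $_[0] eq 'table' }, 'tagname' ],
    end_h   => [ sub { $table_depth-- if $_[0] eq 'table' && $table_depth > 0 }, 'tagname' ],
    text_h  => [ sub {
        my ($text) = @_;
        return if $table_depth > 0;     # table content is left to TableExtract
        $text =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
        push @outside, $text if length $text;
    }, 'dtext' ],
);
$p->parse($html);
$p->eof;

print "$_\n" for @outside;
```

Running this over the same file as HTML::TableExtract costs a second parse, but it keeps the table code and the "everything else" code separate.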
        You are overcomplicating it. Why all the opening and chunking?
        use warnings;
        use strict;
        use HTML::TableExtract;

        my $te = HTML::TableExtract->new();
        $te->parse_file('935413.html');
        foreach my $ts ($te->tables) {
            # read the rows of the current table ($ts), not of $te
            foreach my $row ($ts->rows) {
                print join(',', @$row), "\n";
            }
        }
