Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: HTML Table to MYSQL DB

by zentara (Archbishop)
on Nov 02, 2011 at 17:12 UTC ( #935445=note: print w/ replies, xml ) Need Help??


in reply to HTML Table to MYSQL DB

Here is a headstart piece of code for you. Also see extract data using HTML::TableExtract

#!/usr/bin/perl use strict; use warnings; use diagnostics; use HTML::TableExtract; my $table_1 = ' <table><tr><td>foo</td><td>bar</td></tr> <tr><td>baz</td><td>quux</td></tr></table>'; my $table_2 = ' <table><tr><td>bof</td><td>xyzzy</td></tr> <tr><td>bat</td><td>gazonk</td></tr></table>'; my $te = HTML::TableExtract->new(); $te->parse($table_1); foreach my $ts ($te->tables) { print "Table (", join(',', $ts->coords), "):\n"; foreach my $row ($ts->rows) { print join(',', @$row), "\n"; } } print "\n#######################\n"; ## what goes here if I want to dump table_1 ? ## it appends, not resetting $te->parse($table_1); foreach my $ts ($te->tables) { print "Table (", join(',', $ts->coords), "):\n"; foreach my $row ($ts->rows) { print join(',', @$row), "\n"; } }

I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku ................... flash japh


Comment on Re: HTML Table to MYSQL DB
Download Code
Replies are listed 'Best First'.
Re^2: HTML Table to MYSQL DB
by Fiddler (Initiate) on Nov 02, 2011 at 18:16 UTC
    Thank you so much. i will add this to my code.. Will post the results as they come
      OK i got the first table in my html document to process. Now here's the tricky part at the head of each table there's a time stamp that i need to store as a fifth column. Example
      00:00 - 00:30
      Channel Call Letters Count Percent Title
      1   IETV   3   0.81%   IETV
      3   CNC3   1   0.27%   CNC3 Overnight
      4   TV 4   1   0.27%   TV 4
      5   CCNTV6   16   4.35%   For Your Eyes Only
      6   C   2   0.54%   C Overnight
      7   GAYEL   2   0.54%   Macajuel Time
      8   IBN   1   0.27%   Press TV
      9   ACTS   2   0.54%   ACTS - Family Education
      10   EWTN   2   0.54%   Daily Mass: Our Lady
      12   WIN   1   0.27%   WIN

      00:30 - 01:00
      Channel Call Letters Count Percent Title
      1   IETV   1   0.37%   IETV
      3   CNC3   3   1.12%   CNC3 Overnight
      4   TV 4   1   0.37%   TV 4
      5   CCNTV6   15   5.63%   For Your Eyes Only
      6   C   4   1.50%   C Overnight
      7   GAYEL   1   0.37%   Macajuel Time
      9   ACTS   4   1.50%   ACTS - Family Education
      12   WIN   1   0.37%   WIN
      14   WI   3   1.12%   Local Sports
      15   SYN   1   0.37%   Synergy Nights
      or **********************************
      <BR><B> 00:00 - 00:30 </B> <TABLE BORDER> <TR> <TD> Channel </TD> <TD> Call Letters </TD> <TD> Count </TD> <TD> Percent </TD> <TD> Title </TD> </TR> <TR></TR> <TR> <TD align=right> 1 </TD> <TD > &nbsp; IETV &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 0.81% &nbsp; </TD> <TD > IETV </TD> </TR> <TR> <TD align=right> 3 </TD> <TD > &nbsp; CNC3 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > CNC3 Overnight </TD> </TR> <TR> <TD align=right> 4 </TD> <TD > &nbsp; TV 4 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > TV 4 </TD> </TR> <TR> <TD align=right> 5 </TD> <TD > &nbsp; CCNTV6 &nbsp; </TD> <TD align=right> 16 </TD> <TD > &nbsp; 4.35% &nbsp; </TD> <TD > For Your Eyes Only </TD> </TR> <TR> <TD align=right> 6 </TD> <TD > &nbsp; C &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > C Overnight </TD> </TR> <TR> <TD align=right> 7 </TD> <TD > &nbsp; GAYEL &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > Macajuel Time </TD> </TR> <TR> <TD align=right> 8 </TD> <TD > &nbsp; IBN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > Press TV </TD> </TR> <TR> <TD align=right> 9 </TD> <TD > &nbsp; ACTS &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > ACTS - Family Education </TD> </TR> <TR> <TD align=right> 10 </TD> <TD > &nbsp; EWTN &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > Daily Mass: Our Lady </TD> </TR> <TR> <TD align=right> 12 </TD> <TD > &nbsp; WIN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > WIN </TD> </TR> </TABLE> <BR><B> 00:30 - 01:00 </B> <TABLE BORDER> <TR> <TD> Channel </TD> <TD> Call Letters </TD> <TD> Count </TD> <TD> Percent </TD> <TD> Title </TD> </TR> <TR></TR> <TR> <TD align=right> 1 </TD> <TD > &nbsp; IETV &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > IETV </TD> </TR> <TR> <TD align=right> 3 </TD> <TD > &nbsp; CNC3 &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 1.12% &nbsp; </TD> <TD > CNC3 Overnight </TD> </TR> <TR> <TD align=right> 4 </TD> <TD > &nbsp; TV 4 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > TV 4 </TD> </TR> <TR> <TD align=right> 5 </TD> <TD > &nbsp; CCNTV6 &nbsp; </TD> <TD align=right> 15 </TD> <TD > &nbsp; 5.63% &nbsp; </TD> <TD > For Your Eyes Only </TD> </TR> <TR> <TD align=right> 6 </TD> <TD > &nbsp; C &nbsp; </TD> <TD align=right> 4 </TD> <TD > &nbsp; 1.50% &nbsp; </TD> <TD > C Overnight </TD> </TR> <TR> <TD align=right> 7 </TD> <TD > &nbsp; GAYEL &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > Macajuel Time </TD> </TR> <TR> <TD align=right> 9 </TD> <TD > &nbsp; ACTS &nbsp; </TD> <TD align=right> 4 </TD> <TD > &nbsp; 1.50% &nbsp; </TD> <TD > ACTS - Family Education </TD> </TR> <TR> <TD align=right> 12 </TD> <TD > &nbsp; WIN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > WIN </TD> </TR> <TR> <TD align=right> 14 </TD> <TD > &nbsp; WI &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 1.12% &nbsp; </TD> <TD > Local Sports </TD> </TR> <TR> <TD align=right> 15 </TD> <TD > &nbsp; SYN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > Synergy Nights </TD> </TR> </TABLE>
      SO in reality the columns in my DB would be Stamp, Channel, Call Letters, Count, Percent, Title Any help on this will be greatly appreciated.
        Below is where i am with the code. The problem is that it stops at the first table.... If i get it to loop through the tables in the htm doc, then writing the rows to the DB should be fairly simple.
        use warnings; use strict; use HTML::TableExtract; use LWP::Simple; my $file ="/path/to/file/file.htm"; my $T = HTML::TableExtract->new(); my $table = $T->parse_file($file) ->first_table_found; my @rows = $table->rows; foreach my $row ($T->rows) { print join(',', @$row), "\n"; }

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://935445]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (13)
As of 2015-07-07 19:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (93 votes), past polls