Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re^2: HTML Table to MYSQL DB

by Fiddler (Initiate)
on Nov 02, 2011 at 18:16 UTC ( #935463=note: print w/ replies, xml ) Need Help??


in reply to Re: HTML Table to MYSQL DB
in thread HTML Table to MYSQL DB

Thank you so much. i will add this to my code.. Will post the results as they come


Comment on Re^2: HTML Table to MYSQL DB
Re^3: HTML Table to MYSQL DB
by Fiddler (Initiate) on Nov 03, 2011 at 15:27 UTC
    OK i got the first table in my html document to process. Now here's the tricky part at the head of each table there's a time stamp that i need to store as a fifth column. Example
    00:00 - 00:30
    Channel Call Letters Count Percent Title
    1   IETV   3   0.81%   IETV
    3   CNC3   1   0.27%   CNC3 Overnight
    4   TV 4   1   0.27%   TV 4
    5   CCNTV6   16   4.35%   For Your Eyes Only
    6   C   2   0.54%   C Overnight
    7   GAYEL   2   0.54%   Macajuel Time
    8   IBN   1   0.27%   Press TV
    9   ACTS   2   0.54%   ACTS - Family Education
    10   EWTN   2   0.54%   Daily Mass: Our Lady
    12   WIN   1   0.27%   WIN

    00:30 - 01:00
    Channel Call Letters Count Percent Title
    1   IETV   1   0.37%   IETV
    3   CNC3   3   1.12%   CNC3 Overnight
    4   TV 4   1   0.37%   TV 4
    5   CCNTV6   15   5.63%   For Your Eyes Only
    6   C   4   1.50%   C Overnight
    7   GAYEL   1   0.37%   Macajuel Time
    9   ACTS   4   1.50%   ACTS - Family Education
    12   WIN   1   0.37%   WIN
    14   WI   3   1.12%   Local Sports
    15   SYN   1   0.37%   Synergy Nights
    or **********************************
    <BR><B> 00:00 - 00:30 </B> <TABLE BORDER> <TR> <TD> Channel </TD> <TD> Call Letters </TD> <TD> Count </TD> <TD> Percent </TD> <TD> Title </TD> </TR> <TR></TR> <TR> <TD align=right> 1 </TD> <TD > &nbsp; IETV &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 0.81% &nbsp; </TD> <TD > IETV </TD> </TR> <TR> <TD align=right> 3 </TD> <TD > &nbsp; CNC3 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > CNC3 Overnight </TD> </TR> <TR> <TD align=right> 4 </TD> <TD > &nbsp; TV 4 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > TV 4 </TD> </TR> <TR> <TD align=right> 5 </TD> <TD > &nbsp; CCNTV6 &nbsp; </TD> <TD align=right> 16 </TD> <TD > &nbsp; 4.35% &nbsp; </TD> <TD > For Your Eyes Only </TD> </TR> <TR> <TD align=right> 6 </TD> <TD > &nbsp; C &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > C Overnight </TD> </TR> <TR> <TD align=right> 7 </TD> <TD > &nbsp; GAYEL &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > Macajuel Time </TD> </TR> <TR> <TD align=right> 8 </TD> <TD > &nbsp; IBN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > Press TV </TD> </TR> <TR> <TD align=right> 9 </TD> <TD > &nbsp; ACTS &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > ACTS - Family Education </TD> </TR> <TR> <TD align=right> 10 </TD> <TD > &nbsp; EWTN &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > Daily Mass: Our Lady </TD> </TR> <TR> <TD align=right> 12 </TD> <TD > &nbsp; WIN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > WIN </TD> </TR> </TABLE> <BR><B> 00:30 - 01:00 </B> <TABLE BORDER> <TR> <TD> Channel </TD> <TD> Call Letters </TD> <TD> Count </TD> <TD> Percent </TD> <TD> Title </TD> </TR> <TR></TR> <TR> <TD align=right> 1 </TD> <TD > &nbsp; IETV &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > IETV </TD> </TR> <TR> <TD align=right> 3 </TD> <TD > &nbsp; CNC3 &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 1.12% &nbsp; </TD> <TD > CNC3 Overnight </TD> </TR> <TR> <TD align=right> 4 </TD> <TD > &nbsp; TV 4 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > TV 4 </TD> </TR> <TR> <TD align=right> 5 </TD> <TD > &nbsp; CCNTV6 &nbsp; </TD> <TD align=right> 15 </TD> <TD > &nbsp; 5.63% &nbsp; </TD> <TD > For Your Eyes Only </TD> </TR> <TR> <TD align=right> 6 </TD> <TD > &nbsp; C &nbsp; </TD> <TD align=right> 4 </TD> <TD > &nbsp; 1.50% &nbsp; </TD> <TD > C Overnight </TD> </TR> <TR> <TD align=right> 7 </TD> <TD > &nbsp; GAYEL &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > Macajuel Time </TD> </TR> <TR> <TD align=right> 9 </TD> <TD > &nbsp; ACTS &nbsp; </TD> <TD align=right> 4 </TD> <TD > &nbsp; 1.50% &nbsp; </TD> <TD > ACTS - Family Education </TD> </TR> <TR> <TD align=right> 12 </TD> <TD > &nbsp; WIN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > WIN </TD> </TR> <TR> <TD align=right> 14 </TD> <TD > &nbsp; WI &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 1.12% &nbsp; </TD> <TD > Local Sports </TD> </TR> <TR> <TD align=right> 15 </TD> <TD > &nbsp; SYN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > Synergy Nights </TD> </TR> </TABLE>
    SO in reality the columns in my DB would be Stamp, Channel, Call Letters, Count, Percent, Title Any help on this will be greatly appreciated.
      Below is where i am with the code. The problem is that it stops at the first table.... If i get it to loop through the tables in the htm doc, then writing the rows to the DB should be fairly simple.
      use warnings; use strict; use HTML::TableExtract; use LWP::Simple; my $file ="/path/to/file/file.htm"; my $T = HTML::TableExtract->new(); my $table = $T->parse_file($file) ->first_table_found; my @rows = $table->rows; foreach my $row ($T->rows) { print join(',', @$row), "\n"; }
        Instead of first_table_found, use a loop over $T->tables as shown in the documentation of HTML::TableExtract. To get the timestamp, you will probably need something more powerful, as HTML::Parser.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://935463]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (8)
As of 2015-07-03 20:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (56 votes), past polls