Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^3: HTML Table to MYSQL DB

by Fiddler (Initiate)
on Nov 03, 2011 at 15:27 UTC ( #935687=note: print w/ replies, xml ) Need Help??


in reply to Re^2: HTML Table to MYSQL DB
in thread HTML Table to MYSQL DB

OK i got the first table in my html document to process. Now here's the tricky part at the head of each table there's a time stamp that i need to store as a fifth column. Example
00:00 - 00:30
Channel Call Letters Count Percent Title
1   IETV   3   0.81%   IETV
3   CNC3   1   0.27%   CNC3 Overnight
4   TV 4   1   0.27%   TV 4
5   CCNTV6   16   4.35%   For Your Eyes Only
6   C   2   0.54%   C Overnight
7   GAYEL   2   0.54%   Macajuel Time
8   IBN   1   0.27%   Press TV
9   ACTS   2   0.54%   ACTS - Family Education
10   EWTN   2   0.54%   Daily Mass: Our Lady
12   WIN   1   0.27%   WIN

00:30 - 01:00
Channel Call Letters Count Percent Title
1   IETV   1   0.37%   IETV
3   CNC3   3   1.12%   CNC3 Overnight
4   TV 4   1   0.37%   TV 4
5   CCNTV6   15   5.63%   For Your Eyes Only
6   C   4   1.50%   C Overnight
7   GAYEL   1   0.37%   Macajuel Time
9   ACTS   4   1.50%   ACTS - Family Education
12   WIN   1   0.37%   WIN
14   WI   3   1.12%   Local Sports
15   SYN   1   0.37%   Synergy Nights
or **********************************

<BR><B> 00:00 - 00:30 </B> <TABLE BORDER> <TR> <TD> Channel </TD> <TD> Call Letters </TD> <TD> Count </TD> <TD> Percent </TD> <TD> Title </TD> </TR> <TR></TR> <TR> <TD align=right> 1 </TD> <TD > &nbsp; IETV &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 0.81% &nbsp; </TD> <TD > IETV </TD> </TR> <TR> <TD align=right> 3 </TD> <TD > &nbsp; CNC3 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > CNC3 Overnight </TD> </TR> <TR> <TD align=right> 4 </TD> <TD > &nbsp; TV 4 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > TV 4 </TD> </TR> <TR> <TD align=right> 5 </TD> <TD > &nbsp; CCNTV6 &nbsp; </TD> <TD align=right> 16 </TD> <TD > &nbsp; 4.35% &nbsp; </TD> <TD > For Your Eyes Only </TD> </TR> <TR> <TD align=right> 6 </TD> <TD > &nbsp; C &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > C Overnight </TD> </TR> <TR> <TD align=right> 7 </TD> <TD > &nbsp; GAYEL &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > Macajuel Time </TD> </TR> <TR> <TD align=right> 8 </TD> <TD > &nbsp; IBN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > Press TV </TD> </TR> <TR> <TD align=right> 9 </TD> <TD > &nbsp; ACTS &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > ACTS - Family Education </TD> </TR> <TR> <TD align=right> 10 </TD> <TD > &nbsp; EWTN &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > Daily Mass: Our Lady </TD> </TR> <TR> <TD align=right> 12 </TD> <TD > &nbsp; WIN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > WIN </TD> </TR> </TABLE> <BR><B> 00:30 - 01:00 </B> <TABLE BORDER> <TR> <TD> Channel </TD> <TD> Call Letters </TD> <TD> Count </TD> <TD> Percent </TD> <TD> Title </TD> </TR> <TR></TR> <TR> <TD align=right> 1 </TD> <TD > &nbsp; IETV &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > IETV </TD> </TR> <TR> <TD align=right> 3 </TD> <TD > &nbsp; CNC3 &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 1.12% &nbsp; </TD> <TD > CNC3 Overnight </TD> </TR> <TR> <TD align=right> 4 </TD> <TD > &nbsp; TV 4 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > TV 4 </TD> </TR> <TR> <TD align=right> 5 </TD> <TD > &nbsp; CCNTV6 &nbsp; </TD> <TD align=right> 15 </TD> <TD > &nbsp; 5.63% &nbsp; </TD> <TD > For Your Eyes Only </TD> </TR> <TR> <TD align=right> 6 </TD> <TD > &nbsp; C &nbsp; </TD> <TD align=right> 4 </TD> <TD > &nbsp; 1.50% &nbsp; </TD> <TD > C Overnight </TD> </TR> <TR> <TD align=right> 7 </TD> <TD > &nbsp; GAYEL &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > Macajuel Time </TD> </TR> <TR> <TD align=right> 9 </TD> <TD > &nbsp; ACTS &nbsp; </TD> <TD align=right> 4 </TD> <TD > &nbsp; 1.50% &nbsp; </TD> <TD > ACTS - Family Education </TD> </TR> <TR> <TD align=right> 12 </TD> <TD > &nbsp; WIN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > WIN </TD> </TR> <TR> <TD align=right> 14 </TD> <TD > &nbsp; WI &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 1.12% &nbsp; </TD> <TD > Local Sports </TD> </TR> <TR> <TD align=right> 15 </TD> <TD > &nbsp; SYN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > Synergy Nights </TD> </TR> </TABLE>
SO in reality the columns in my DB would be Stamp, Channel, Call Letters, Count, Percent, Title Any help on this will be greatly appreciated.


Comment on Re^3: HTML Table to MYSQL DB
Download Code
Re^4: HTML Table to MYSQL DB
by Fiddler (Initiate) on Nov 03, 2011 at 18:26 UTC
    Below is where i am with the code. The problem is that it stops at the first table.... If i get it to loop through the tables in the htm doc, then writing the rows to the DB should be fairly simple.
    use warnings; use strict; use HTML::TableExtract; use LWP::Simple; my $file ="/path/to/file/file.htm"; my $T = HTML::TableExtract->new(); my $table = $T->parse_file($file) ->first_table_found; my @rows = $table->rows; foreach my $row ($T->rows) { print join(',', @$row), "\n"; }
      Instead of first_table_found, use a loop over $T->tables as shown in the documentation of HTML::TableExtract. To get the timestamp, you will probably need something more powerful, as HTML::Parser.
        Hey Monks, i'm so close i can taste it... I have decided to cut the HTML tables up into chunks so that i can work with them easier on each loop. The problem now is that i'm getting the same variables every time the program goes to the next chunk of data... See my code below.. i'll appreciate it if you can tell me where exactly i went wrong :( Thanks.
        use warnings; #use strict; use HTML::TableExtract; use LWP::Simple; my @a = 0; my $item = 0; $/="\n\n"; open (FILE, "/path/to/file/file.htm") || print "Error"; @a = <FILE>; close (FILE); foreach (@a) { foreach $item (split "(/</TR>/)gi", $_,) { my $chunk = "$item"; #print "$chunk","\n","***********"; my $te = HTML::TableExtract->new(); $te->parse($chunk) ->first_table_found;; foreach my $ts ($te->tables) { my @rows = $te->rows; foreach my $row ($te->rows) {print join(',', @$row), "\n";} } }}
        Output
        Channel , Call Letters , Count , Percent , Title Channel , Call Letters , Count , Percent , Title Channel , Call Letters , Count , Percent , Title Channel , Call Letters , Count , Percent , Title Channel , Call Letters , Count , Percent , Title Channel , Call Letters , Count , Percent , Title Channel , Call Letters , Count , Percent , Title
        (loops throughout)....

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://935687]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (9)
As of 2014-08-29 00:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (275 votes), past polls