Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

HTML Table to MYSQL DB

by Fiddler (Initiate)
on Nov 02, 2011 at 14:52 UTC ( #935413=perlquestion: print w/ replies, xml ) Need Help??
Fiddler has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I'm in dire need of some direction. I'm trying to read some html tables and ingest them into an Sql DB using perl. I cant for the life of me understand regex and matching tags So let me give see if i can explain this better. A sample of the table looks like this (machine generated)
<BR><B> 00:00 - 00:15 </B> <TABLE BORDER> <TR> <TD> Channel </TD> <TD> Call Letters </TD> <TD> Count </TD> <TD> Percent </TD> <TD> Title </TD> </TR> <TR></TR> <TR> <TD align=right> 1 </TD> <TD > &nbsp; ch1 &nbsp; </TD> <TD align=right> 22 </TD> <TD > &nbsp; 6.52% &nbsp; </TD> <TD > Program 1 </TD> </TR> <TR> <TD align=right> 3 </TD> <TD > &nbsp; Ch1 &nbsp; </TD> <TD align=right> 5 </TD> <TD > &nbsp; 1.48% &nbsp; </TD> <TD > Program2 </TD> </TR> <TR> <TD align=right> 4 </TD> <TD > &nbsp; Ch 3 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.29% &nbsp; </TD> <TD > Program3 </TD> </TR> <TR> <TD align=right> 5 </TD> <TD > &nbsp; CH53 &nbsp; </TD> <TD align=right> 5 </TD> <TD > &nbsp; 1.48% &nbsp; </TD> <TD > Program4 </TD> </TR> <TR> <TD align=right> 6 </TD> <TD > &nbsp; C &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 0.89% &nbsp; </TD> <TD > Program5 </TD> </TR> </TABLE>
It pretty much loops like this throughout, what i'm trying to do is convert this to a csv or something and ingest it into a DB. If i can directly write to the DB when looping through that'll be ever better. Let me know if you can help me. With any solution. P.S this is my script so far.
use warnings; use strict; use HTML::TableExtract; use LWP::Simple; my $file ="/path/to/file/file.htm"; my $T = HTML::TableExtract->new(); my $table = $T->parse_file($file) ->first_table_found; my @rows = $table->rows; foreach my $row ($T->rows) { print join(',', @$row), "\n"; }
It only processes the FIRST table tho... not the rest of the document.

Comment on HTML Table to MYSQL DB
Select or Download Code
Re: HTML Table to MYSQL DB
by moritz (Cardinal) on Nov 02, 2011 at 14:58 UTC
      Can you give me an example of how you'll do this?
Re: HTML Table to MYSQL DB
by choroba (Abbot) on Nov 02, 2011 at 15:01 UTC
      how exactly would you use that? I can install CPAN modules but I've never used them.
        Something like this:
        use warnings; use strict; use HTML::TableExtract; use LWP::Simple; my $T = HTML::TableExtract->new(); my $table = $T->parse(get( 'http://.......')) ->first_table_found; my @rows = $table->rows; for my $row (@rows) { # Process rows... }

        Fiddler:

        how exactly would you use that? I can install CPAN modules but I've never used them.

        I'd first install it. Then I'd read the documentation, and put the code in the synopsis into a file, say foo.pl, and tweak it a little based on the documentation and the HTML I was trying to parse. Next, I'd run it and see what happened.

        Once I get to the point where I can read the data, I'd write some code to write the data into the database.

        If it worked, then I'd be done. On the other hand, if it didn't work, then I'd look to see if there were any error messages. If so, I'd try to figure out what the error messages were trying to tell me. If I couldn't figure it out, then I'd post my code, my HTML and my question on Perlmonks to see if anyone could lend a hand.

        That's exactly what I'd do.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: HTML Table to MYSQL DB
by zentara (Archbishop) on Nov 02, 2011 at 17:12 UTC
    Here is a headstart piece of code for you. Also see extract data using HTML::TableExtract
    #!/usr/bin/perl use strict; use warnings; use diagnostics; use HTML::TableExtract; my $table_1 = ' <table><tr><td>foo</td><td>bar</td></tr> <tr><td>baz</td><td>quux</td></tr></table>'; my $table_2 = ' <table><tr><td>bof</td><td>xyzzy</td></tr> <tr><td>bat</td><td>gazonk</td></tr></table>'; my $te = HTML::TableExtract->new(); $te->parse($table_1); foreach my $ts ($te->tables) { print "Table (", join(',', $ts->coords), "):\n"; foreach my $row ($ts->rows) { print join(',', @$row), "\n"; } } print "\n#######################\n"; ## what goes here if I want to dump table_1 ? ## it appends, not resetting $te->parse($table_1); foreach my $ts ($te->tables) { print "Table (", join(',', $ts->coords), "):\n"; foreach my $row ($ts->rows) { print join(',', @$row), "\n"; } }

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
      Thank you so much. i will add this to my code.. Will post the results as they come
        OK i got the first table in my html document to process. Now here's the tricky part at the head of each table there's a time stamp that i need to store as a fifth column. Example
        00:00 - 00:30
        Channel Call Letters Count Percent Title
        1   IETV   3   0.81%   IETV
        3   CNC3   1   0.27%   CNC3 Overnight
        4   TV 4   1   0.27%   TV 4
        5   CCNTV6   16   4.35%   For Your Eyes Only
        6   C   2   0.54%   C Overnight
        7   GAYEL   2   0.54%   Macajuel Time
        8   IBN   1   0.27%   Press TV
        9   ACTS   2   0.54%   ACTS - Family Education
        10   EWTN   2   0.54%   Daily Mass: Our Lady
        12   WIN   1   0.27%   WIN

        00:30 - 01:00
        Channel Call Letters Count Percent Title
        1   IETV   1   0.37%   IETV
        3   CNC3   3   1.12%   CNC3 Overnight
        4   TV 4   1   0.37%   TV 4
        5   CCNTV6   15   5.63%   For Your Eyes Only
        6   C   4   1.50%   C Overnight
        7   GAYEL   1   0.37%   Macajuel Time
        9   ACTS   4   1.50%   ACTS - Family Education
        12   WIN   1   0.37%   WIN
        14   WI   3   1.12%   Local Sports
        15   SYN   1   0.37%   Synergy Nights
        or **********************************
        <BR><B> 00:00 - 00:30 </B> <TABLE BORDER> <TR> <TD> Channel </TD> <TD> Call Letters </TD> <TD> Count </TD> <TD> Percent </TD> <TD> Title </TD> </TR> <TR></TR> <TR> <TD align=right> 1 </TD> <TD > &nbsp; IETV &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 0.81% &nbsp; </TD> <TD > IETV </TD> </TR> <TR> <TD align=right> 3 </TD> <TD > &nbsp; CNC3 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > CNC3 Overnight </TD> </TR> <TR> <TD align=right> 4 </TD> <TD > &nbsp; TV 4 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > TV 4 </TD> </TR> <TR> <TD align=right> 5 </TD> <TD > &nbsp; CCNTV6 &nbsp; </TD> <TD align=right> 16 </TD> <TD > &nbsp; 4.35% &nbsp; </TD> <TD > For Your Eyes Only </TD> </TR> <TR> <TD align=right> 6 </TD> <TD > &nbsp; C &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > C Overnight </TD> </TR> <TR> <TD align=right> 7 </TD> <TD > &nbsp; GAYEL &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > Macajuel Time </TD> </TR> <TR> <TD align=right> 8 </TD> <TD > &nbsp; IBN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > Press TV </TD> </TR> <TR> <TD align=right> 9 </TD> <TD > &nbsp; ACTS &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > ACTS - Family Education </TD> </TR> <TR> <TD align=right> 10 </TD> <TD > &nbsp; EWTN &nbsp; </TD> <TD align=right> 2 </TD> <TD > &nbsp; 0.54% &nbsp; </TD> <TD > Daily Mass: Our Lady </TD> </TR> <TR> <TD align=right> 12 </TD> <TD > &nbsp; WIN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.27% &nbsp; </TD> <TD > WIN </TD> </TR> </TABLE> <BR><B> 00:30 - 01:00 </B> <TABLE BORDER> <TR> <TD> Channel </TD> <TD> Call Letters </TD> <TD> Count </TD> <TD> Percent </TD> <TD> Title </TD> </TR> <TR></TR> <TR> <TD align=right> 1 </TD> <TD > &nbsp; IETV &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > IETV </TD> </TR> <TR> <TD align=right> 3 </TD> <TD > &nbsp; CNC3 &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 1.12% &nbsp; </TD> <TD > CNC3 Overnight </TD> </TR> <TR> <TD align=right> 4 </TD> <TD > &nbsp; TV 4 &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > TV 4 </TD> </TR> <TR> <TD align=right> 5 </TD> <TD > &nbsp; CCNTV6 &nbsp; </TD> <TD align=right> 15 </TD> <TD > &nbsp; 5.63% &nbsp; </TD> <TD > For Your Eyes Only </TD> </TR> <TR> <TD align=right> 6 </TD> <TD > &nbsp; C &nbsp; </TD> <TD align=right> 4 </TD> <TD > &nbsp; 1.50% &nbsp; </TD> <TD > C Overnight </TD> </TR> <TR> <TD align=right> 7 </TD> <TD > &nbsp; GAYEL &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > Macajuel Time </TD> </TR> <TR> <TD align=right> 9 </TD> <TD > &nbsp; ACTS &nbsp; </TD> <TD align=right> 4 </TD> <TD > &nbsp; 1.50% &nbsp; </TD> <TD > ACTS - Family Education </TD> </TR> <TR> <TD align=right> 12 </TD> <TD > &nbsp; WIN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > WIN </TD> </TR> <TR> <TD align=right> 14 </TD> <TD > &nbsp; WI &nbsp; </TD> <TD align=right> 3 </TD> <TD > &nbsp; 1.12% &nbsp; </TD> <TD > Local Sports </TD> </TR> <TR> <TD align=right> 15 </TD> <TD > &nbsp; SYN &nbsp; </TD> <TD align=right> 1 </TD> <TD > &nbsp; 0.37% &nbsp; </TD> <TD > Synergy Nights </TD> </TR> </TABLE>
        SO in reality the columns in my DB would be Stamp, Channel, Call Letters, Count, Percent, Title Any help on this will be greatly appreciated.
Re: HTML Table to MYSQL DB
by Anonymous Monk on Nov 10, 2011 at 10:10 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://935413]
Approved by moritz
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (7)
As of 2014-08-30 03:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (291 votes), past polls