PerlMonks  

Re^5: HTML Table to MYSQL DB

by choroba (Abbot)
on Nov 03, 2011 at 19:03 UTC (#935751)


in reply to Re^4: HTML Table to MYSQL DB
in thread HTML Table to MYSQL DB

Instead of first_table_found, use a loop over $T->tables, as shown in the documentation of HTML::TableExtract. To get the timestamp, you will probably need something more powerful, such as HTML::Parser.
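A minimal sketch of such a loop, assuming a made-up two-table HTML string (the sample data and the "table depth.count" label are illustrative only). The point to note is that rows is called on each table object, not on the extractor itself:

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical two-table document, for illustration only.
my $html = <<'HTML';
<table><tr><td>a</td><td>b</td></tr></table>
<table><tr><td>c</td><td>d</td></tr></table>
HTML

# With no constraints given, every table in the document is extracted.
my $te = HTML::TableExtract->new();
$te->parse($html);
$te->eof;

my @lines;
foreach my $ts ($te->tables) {
    # coords identifies the table by nesting depth and count.
    my ($depth, $count) = $ts->coords;
    # Ask the table object ($ts) for its own rows.
    foreach my $row ($ts->rows) {
        push @lines, join(',', "table $depth.$count", @$row);
    }
}
print "$_\n" for @lines;
```

Each printed line starts with the coordinates of the table it came from, so rows from different tables stay distinguishable.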


Re^6: HTML Table to MYSQL DB
by Fiddler (Initiate) on Nov 09, 2011 at 15:53 UTC
    Hey Monks, I'm so close I can taste it... I have decided to cut the HTML tables up into chunks so that I can work with them more easily on each loop. The problem now is that I'm getting the same values every time the program moves on to the next chunk of data. See my code below. I'd appreciate it if you could tell me where exactly I went wrong :( Thanks.
    use warnings;
    #use strict;
    use HTML::TableExtract;
    use LWP::Simple;
    my @a = 0;
    my $item = 0;
    $/="\n\n";
    open (FILE, "/path/to/file/file.htm") || print "Error";
    @a = <FILE>;
    close (FILE);
    foreach (@a) {
        foreach $item (split "(/</TR>/)gi", $_,) {
            my $chunk = "$item";
            #print "$chunk","\n","***********";
            my $te = HTML::TableExtract->new();
            $te->parse($chunk) ->first_table_found;;
            foreach my $ts ($te->tables) {
                my @rows = $te->rows;
                foreach my $row ($te->rows) {print join(',', @$row), "\n";}
            }
        }
    }
    Output
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    (loops throughout)....
      HTML::TableExtract is great if all you need is data in tables, but not so good when you also need data outside the tables. Then you're stuck with HTML::Parser. Sometimes I wish HTML::TableExtract had a place to plug in an additional callback to populate from data outside the table.
      You are overcomplicating it. Why all the opening and chunking?
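      use warnings;
      use strict;
      use HTML::TableExtract;

      my $te = HTML::TableExtract->new();
      $te->parse_file('935413.html');
      foreach my $ts ($te->tables) {
          # Take the rows from the current table object ($ts);
          # $te->rows would return the first table's rows every time.
          foreach my $row ($ts->rows) {
              print join(',', @$row),"\n";
          }
      }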
        I know, choroba, but you see, I need to create a timestamp column from some data outside the HTML tables, so with each chunk of data that variable will change depending on the stamp (see the original table to get an idea of what I mean). Chunks are the only way I can think of to get that info.
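For the timestamp problem, one possible approach is to drop down to HTML::Parser and remember the most recent timestamp seen outside a table, attaching it to every row that follows. This is only a sketch under assumptions: the sample HTML, the "Report time" wording, and the timestamp pattern are all hypothetical, since the original file is not shown in this thread.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical input: a timestamp paragraph precedes each table.
my $html = <<'HTML';
<p>Report time: 2011-11-03 19:00</p>
<table><tr><th>Channel</th><th>Count</th></tr>
<tr><td>5</td><td>12</td></tr></table>
<p>Report time: 2011-11-03 20:00</p>
<table><tr><td>7</td><td>3</td></tr></table>
HTML

my $stamp    = '';   # last timestamp seen outside a table
my $in_table = 0;    # table nesting depth
my $in_cell  = 0;    # true while inside <td> or <th>
my @row;             # cells of the row being read
my @records;         # finished rows, each prefixed with its timestamp

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ($tag) = @_;
        $in_table++ if $tag eq 'table';
        @row = ()   if $tag eq 'tr';
        if ($tag eq 'td' or $tag eq 'th') { $in_cell = 1; push @row, '' }
    }, 'tagname' ],
    end_h => [ sub {
        my ($tag) = @_;
        $in_table-- if $tag eq 'table';
        $in_cell = 0 if $tag eq 'td' or $tag eq 'th';
        push @records, [ $stamp, @row ] if $tag eq 'tr' and @row;
    }, 'tagname' ],
    text_h => [ sub {
        my ($text) = @_;
        if ($in_cell) {
            $row[-1] .= $text;   # text may arrive in several pieces
        }
        elsif (!$in_table and $text =~ /(\d{4}-\d\d-\d\d \d\d:\d\d)/) {
            $stamp = $1;         # assumed timestamp format
        }
    }, 'dtext' ],
);
$p->parse($html);
$p->eof;

print join(',', @$_), "\n" for @records;
```

Each entry in @records is then ready to be used as one database row, with the timestamp as the first column. Real pages with omitted closing tags or nested tables would need more careful state handling than this sketch has.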
