
Re^5: HTML Table to MYSQL DB

by choroba (Bishop)
on Nov 03, 2011 at 19:03 UTC

in reply to Re^4: HTML Table to MYSQL DB
in thread HTML Table to MYSQL DB

Instead of first_table_found, use a loop over $T->tables as shown in the documentation of HTML::TableExtract. To get the timestamp, you will probably need something more powerful, such as HTML::Parser.

Replies are listed 'Best First'.
Re^6: HTML Table to MYSQL DB
by Fiddler (Initiate) on Nov 09, 2011 at 15:53 UTC
    Hey Monks, I'm so close I can taste it... I have decided to cut the HTML tables up into chunks so that I can work with them more easily on each loop. The problem now is that I'm getting the same values every time the program moves on to the next chunk of data. See my code and its output below. I'd appreciate it if you could tell me where exactly I went wrong :( Thanks.
    use warnings;
    #use strict;
    use HTML::TableExtract;
    use LWP::Simple;
    my @a = 0;
    my $item = 0;
    $/="\n\n";
    open (FILE, "/path/to/file/file.htm") || print "Error";
    @a = <FILE>;
    close (FILE);
    foreach (@a) {
        foreach $item (split "(/</TR>/)gi", $_,) {
            my $chunk = "$item";
            #print "$chunk","\n","***********";
            my $te = HTML::TableExtract->new();
            $te->parse($chunk) ->first_table_found;;
            foreach my $ts ($te->tables) {
                my @rows = $te->rows;
                foreach my $row ($te->rows) {print join(',', @$row), "\n";}
            }
        }
    }
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    Channel , Call Letters , Count , Percent , Title
    (loops throughout)....
      HTML::TableExtract is great if all you need is data in tables, but not so good when you also need data outside the tables. Then you're stuck with HTML::Parser. Sometimes I wish HTML::TableExtract had a place to plug in an additional callback to populate from data outside the table.
      You are overcomplicating it. Why all the opening and chunking?
      use warnings;
      use strict;
      use HTML::TableExtract;

      my $te = HTML::TableExtract->new();
      $te->parse_file('935413.html');

      # Walk every table found, then every row of that table.
      foreach my $ts ($te->tables) {
          foreach my $row ($ts->rows) {
              print join(',', @$row),"\n";
          }
      }
        I know, choroba, but you see, I need to create a timestamp column from some data that sits outside the HTML tables. With each chunk of data, that value changes depending on the stamp (see the original table to get an idea of what I mean). Chunking is the only way I can think of to get at that info.
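One way to keep the chunking idea without splitting on </TR> is to split the page on the timestamp text itself and carry each stamp along with the table that follows it. The following is only a rough sketch, not a tested solution for the original report: the helper names chunks_by_stamp and rows_with_stamp are made up for illustration, and the stamp pattern is a pure guess, since the real timestamp format is not shown in this thread.

```perl
use strict;
use warnings;

# Split an HTML string into [stamp, chunk] pairs.
# $stamp_re is a HYPOTHETICAL pattern for the timestamp text that
# precedes each table; it must not contain capture groups of its own.
sub chunks_by_stamp {
    my ($html, $stamp_re) = @_;
    # A capturing group in split keeps the separators in the result:
    # (text before first stamp, stamp1, chunk1, stamp2, chunk2, ...)
    my (undef, @parts) = split /($stamp_re)/, $html;
    my @pairs;
    while (@parts) {
        my ($stamp, $chunk) = splice @parts, 0, 2;
        push @pairs, [ $stamp, defined $chunk ? $chunk : '' ];
    }
    return @pairs;
}

# Parse every chunk and prefix each table row with the chunk's stamp.
sub rows_with_stamp {
    my ($html, $stamp_re) = @_;
    require HTML::TableExtract;    # loaded only when rows are extracted
    my @out;
    for my $pair (chunks_by_stamp($html, $stamp_re)) {
        my ($stamp, $chunk) = @$pair;
        my $te = HTML::TableExtract->new();    # a fresh parser per chunk
        $te->parse($chunk);
        $te->eof;
        for my $ts ($te->tables) {
            for my $row ($ts->rows) {          # rows of THIS table
                push @out, [ $stamp, map { defined $_ ? $_ : '' } @$row ];
            }
        }
    }
    return @out;
}

# Usage sketch (the pattern below is a placeholder, not the real format):
#   print join(',', @$_), "\n"
#       for rows_with_stamp($html, qr/\d{4}-\d\d-\d\d \d\d:\d\d/);
```

Each returned row starts with the stamp, so it can go straight into a timestamp column when inserting into the database. Note that the inner loop calls rows on $ts, the table object from the outer loop, rather than on $te.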
