http://www.perlmonks.org?node_id=1052236

bkerr has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I am trying to extract a table from an HTML file based on the table class. I don't seem to be getting any output at all. Can someone help me? Would really appreciate it!
#!/usr/bin/perl -w
use CGI::Carp qw(fatalsToBrowser);
use DBI;
use IO::Socket;
use lib '.';
use ATweb qw(%form $db $dbh %config );
use strict;

local %form = &ATweb::fetch_form;
my $db = &ATweb::mysql_connect;
my $maintable;
my $html_file = $config{'static_dir'}."/admin/documents/ghs_addresse.txt";

use HTML::TableExtract;
use Data::Dumper;

my $doc_html;
local $/=undef;
open FILE, "$html_file" or die "Couldn't open file: $html_file $!";
binmode FILE;
$doc_html = <FILE>;
close FILE;

my $te = HTML::TableExtract->new( attribs => { class => 'generalinfo' }, );
$te->parse_file("$doc_html");

print "Content-Type: text/html\n\n";
for my $table ( $te->tables ) {
    print Dump $table->columns;
}
Here are the relevant contents of the HTML file:
<table cellspacing="0" cellpadding="0" class="generalinfo">
<tr ><th class="rtl"></th><th colspan="2">Vermietung</th><th class="rtr"></th></tr> <!-- header -->
<tr><td class="left">&nbsp;</td><td>&nbsp;</td><td></td><td class="right">&nbsp;</td></tr> <!-- spacer -->
<tr>
    <td class="left">&nbsp;</td>
    <td class="text"><strong>Name</strong></td>
    <td class="text">Kaspar Flütsch</td>
    <td class="right">&nbsp;</td>
</tr>
<tr>
    <td class="left">&nbsp;</td>
    <td class="text"><strong>Adresse</strong></td>
    <td class="text">Schreinerei<br>7246 St. Antönien</td>
    <td class="right">&nbsp;</td>
</tr>
<tr>
    <td class="left">&nbsp;</td>
    <td class="text"><strong>Tel. Gesch&auml;ft</strong></td>
    <td class="text">081 332 23 31</td>
    <td class="right">&nbsp;</td>
</tr>
<tr>
    <td class="left">&nbsp;</td>
    <td class="text"><strong>Tel. Privat</strong></td>
    <td class="text">081 332 23 31</td>
    <td class="right">&nbsp;</td>
</tr>
<tr><td class="rbl"></td><td class="rb">&nbsp;</td><td class="rb">&nbsp;</td><td class="rbr"></td></tr> <!-- footer -->
</table>

Replies are listed 'Best First'.
Re: HTML::TableExtract problem
by hdb (Monsignor) on Sep 04, 2013 at 07:53 UTC

    In $te->parse_file("$doc_html") the argument is supposed to be the filename of the html file. Use $te->parse($doc_html) to parse a string.
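
    For illustration, here is a minimal sketch of the parse() route. The inline $doc_html string is a hypothetical, shortened stand-in for the contents of the real file, and the CGI and database parts of the original script are left out. The rows are printed with join rather than Data::Dumper; note that Data::Dumper exports Dumper, not Dump, so as far as I can tell "print Dump ..." would be treated as a print to a filehandle named Dump and produce no output either.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the HTML that the original script slurps from disk.
my $doc_html = <<'HTML';
<table cellspacing="0" cellpadding="0" class="generalinfo">
  <tr><td class="text"><strong>Name</strong></td><td class="text">Kaspar</td></tr>
  <tr><td class="text"><strong>Tel. Privat</strong></td><td class="text">081 332 23 31</td></tr>
</table>
HTML

my $te = HTML::TableExtract->new( attribs => { class => 'generalinfo' } );

# parse() takes the HTML text itself; parse_file() expects a file name or handle.
$te->parse($doc_html);
$te->eof;    # tell the parser that no more input is coming

my @tables = $te->tables;
for my $table (@tables) {
    for my $row ( $table->rows ) {
        # Empty cells come back as undef, so map them to '' before joining.
        print join( ' | ', map { defined $_ ? $_ : '' } @$row ), "\n";
    }
}
```

    Alternatively, the slurp into $doc_html can be dropped entirely by passing the path straight to $te->parse_file($html_file).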

      Yes, in the meantime I found that silly mistake myself! Appreciate the quick answer. Thanks!