Re: Perl, HTML::TableExtract

I ran into some problems when I used your HTML, so I converted it to XHTML:

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head/>
  <body>
    <table border="0">
      <tbody>
        <tr valign="top">
          <td>
            <table cellpadding="2" id="RefSNP" width="350">
              <tbody>
                <tr>
                  <th align="center" bgcolor="#ccccff" class="text10" colspan="2">RefSNP</th>
                </tr>
                <tr>
                  <td align="right" bgcolor="#f1f1f1" class="text10">
                    <strong>Organism:</strong>
                  </td>
                  <td bgcolor="#f1f1f1" class="text10">human (<a href="http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&amp;id=9606"><em>Homo sapiens</em></a>)</td>
                </tr>
              </tbody>
            </table>
          </td>
        </tr>
      </tbody>
    </table>
  </body>
</html>

In addition to the advice that hdb and NetWallah gave you, I think that you want to set keep_html to 0, so that each cell is returned as plain text rather than as raw HTML:

#!/usr/bin/perl

use strict;
use warnings;
use HTML::TableExtract;

my $file = '/root/Desktop/html.htm';

my $te = 'HTML::TableExtract'->new(
    keep_html => 0,
    attribs   => { id => 'RefSNP' },
);
# Parse the file once. parse_file returns undef if the file cannot be opened.
$te->parse_file($file)
  or die "could not open ${file}: $!";
foreach my $ts ( $te->tables ) {
    print 'Table(', join( ',', $ts->coords ), "):\n";
    foreach my $row ( $ts->rows ) {
        foreach my $cell (@$row) {
            next unless defined $cell && length $cell;
            $cell =~ s[</B>&nbsp;][]i;
            print $cell . "\n";
        }
    }
}
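
To see what keep_html changes, here is a minimal sketch that parses the same table twice, once with keep_html => 0 and once with keep_html => 1, and prints the Organism cell each time. The inline HTML is a trimmed-down copy of the table above, and organism_cell() is a helper name I made up for illustration; it is not part of HTML::TableExtract.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use HTML::TableExtract;

# Trimmed-down copy of the RefSNP table (an assumption for this sketch).
my $html = <<'HTML';
<table id="RefSNP">
<tr><th colspan="2">RefSNP</th></tr>
<tr><td><strong>Organism:</strong></td><td>human (<a href="http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&amp;id=9606"><em>Homo sapiens</em></a>)</td></tr>
</table>
HTML

# organism_cell() is an illustrative helper, not a library function.
# It returns the second cell of the second row of the RefSNP table.
sub organism_cell {
    my ($keep_html) = @_;
    my $te = 'HTML::TableExtract'->new(
        keep_html => $keep_html,
        attribs   => { id => 'RefSNP' },
    );
    $te->parse($html);
    $te->eof;    # tell the parser that the document is complete
    my ($ts) = $te->tables;
    my @rows = $ts->rows;
    return $rows[1][1];
}

my $plain = organism_cell(0);    # visible text only
my $raw   = organism_cell(1);    # markup inside the cell is kept

print "keep_html => 0: $plain\n";
print "keep_html => 1: $raw\n";
```

With keep_html => 0 the cell should contain only the visible text, so there is no need to strip tags from it afterwards.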
