PerlMonks  

Re: Perl, HTML::TableExtract

by Khen1950fx (Canon)
on Apr 28, 2013 at 15:49 UTC (#1031090)


in reply to Perl, HTML::TableExtract

I ran into some problems when I used your HTML, so I converted it to XHTML:

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head/>
  <body>
    <table border="0">
      <tbody>
        <tr valign="top">
          <td>
            <table cellpadding="2" id="RefSNP" width="350">
              <tbody>
                <tr>
                  <th align="center" bgcolor="#ccccff" class="text10" colspan="2">RefSNP</th>
                </tr>
                <tr>
                  <td align="right" bgcolor="#f1f1f1" class="text10">
                    <strong>Organism:</strong>
                  </td>
                  <td bgcolor="#f1f1f1" class="text10">human (<a href="http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&amp;id=9606"><em>Homo sapiens</em></a>)</td>
                </tr>
              </tbody>
            </table>
          </td>
        </tr>
      </tbody>
    </table>
  </body>
</html>

In addition to the advice that hdb and NetWallah gave you, I think that you want to set keep_html to 0, so that each cell comes back as plain text rather than raw HTML:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

my $file = '/root/Desktop/html.htm';

my $te = HTML::TableExtract->new(
    keep_html => 0,
    attribs   => { id => 'RefSNP' },
);

# Parse the file once; parse_file returns undef if the file cannot be opened.
defined $te->parse_file($file)
    or die "could not open ${file}: $!";

foreach my $ts ( $te->tables ) {
    print 'Table(', join( ',', $ts->coords ), "):\n";
    foreach my $row ( $ts->rows ) {
        foreach my $cell (@$row) {
            # Skip undefined or empty cells (e.g. those covered by a colspan).
            next unless defined $cell && length $cell;
            # With keep_html => 0 the markup is already stripped, so this
            # substitution is likely a no-op.
            $cell =~ s[</B>&nbsp;][]i;
            print $cell . "\n";
        }
    }
}
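
To see what keep_html changes, here is a minimal sketch that parses a cut-down copy of the table from a string instead of a file and prints the cells both ways. The inline sample and the helper name refsnp_cells are my own, made up for illustration; only the keep_html and attribs options come from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Inline sample, cut down from the XHTML above (an assumption for illustration).
my $html = <<'END_HTML';
<table id="RefSNP">
  <tr><th colspan="2">RefSNP</th></tr>
  <tr>
    <td><strong>Organism:</strong></td>
    <td>human (<em>Homo sapiens</em>)</td>
  </tr>
</table>
END_HTML

# Hypothetical helper: return the defined cells of the RefSNP table
# as a flat list, with keep_html switched on or off.
sub refsnp_cells {
    my ($keep_html) = @_;
    my $te = HTML::TableExtract->new(
        keep_html => $keep_html,
        attribs   => { id => 'RefSNP' },
    );
    $te->parse($html);
    $te->eof;    # signal that the whole document has been fed in

    my @cells;
    foreach my $ts ( $te->tables ) {
        foreach my $row ( $ts->rows ) {
            push @cells, grep { defined } @$row;
        }
    }
    return @cells;
}

foreach my $keep_html ( 0, 1 ) {
    print "keep_html => $keep_html\n";
    print "  [$_]\n" foreach refsnp_cells($keep_html);
}
```

With keep_html => 0 the cells should contain only the visible text; with keep_html => 1 the strong and em tags should still be there. Exact whitespace in the cells may differ from what you expect, so trim before comparing.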

