Using HTML::TableExtract is relatively straightforward, but the content of your tables is weird. Play with the keep_html parameter in line 8, setting it to 0 or 1, whichever suits you best: with 0 you get only the text of each cell, with 1 the HTML markup inside the cell is kept.
use strict; use warnings;
use HTML::TableExtract;
my $response;
{
    local $/ = undef;    # slurp mode: read all of DATA in one go
    $response = <DATA>;
}
my $te = HTML::TableExtract->new( keep_html => 0 );    # 0: text only, 1: keep markup
$te->parse($response);
foreach my $ts ($te->tables) {
    print "\nTable found at ", join(',', $ts->coords), ":\n";
    foreach my $row ($ts->rows) {
        foreach my $col (@$row) {
            if( defined $col ) {
                print "\t$col";
            } else {
                print "\t---";    # undefined cell: print a placeholder
            }
        }
        print "\n";
    }
}
__DATA__
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head> <title>Cell Collections</title>
<link rel="stylesheet" type="text/css" href="/css/tree.css" />
<script src="/js/tree.js" type="text/javascript"></script>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head> <body bgcolor="#FFFFFF">
<img src="/images/dvtk/IFX_LOGO.gif" alt="[IFX]" /><br />
<TABLE><TR><TD><img align="left" src="/images/dvtk/logo100.gif"
alt="[LOGO]" /><TD><h1>Cell collection status</h1><h3>AnalogIP:Test1
command: exec</h3></TD></TR></TABLE>
<br /><p /><a href="collstat.pl"><b>[Collection overview]</b></a>&nbsp;
<a href="collstat.pl?path=/home/micado/autan_c65fla/dss/dss.common.default/units/macro/dvtk/results">[Overview
for this collection]</a> <a
href="cells.pl?path=/home/micado/autan_c65fla/dss/dss.common.default/units/macro/dvtk/results">[All
cells of this collection]</a> <a href="report.pl?path=/home/micado/autan_c65fla/dss/dss.common.default/units/macro/dvtk/results">[Custom
HTML Report]</a><p /><br />
<FONT SIZE=-1><I>(Click on the hyperlinks in the table headers to see information on methods [if available])</I></FONT><BR>
<TABLE BORDER=1><TR BGCOLOR="#D0D0D0"><TH ALIGN=LEFT
NOWRAP>Cell</TH><TH ALIGN=LEFT>cdl_prefix</TH><TH
ALIGN=LEFT>drc</TH><TH ALIGN=LEFT>drc_prefix</TH><TH
ALIGN=LEFT>gds_abstract</TH><TH ALIGN=LEFT>gds_prefix</TH><TH
ALIGN=LEFT><A HREF="method.pl?path=/home/micado/autan_c65fla/dss/dss.common.default/units/macro/dvtk/results;task=layverDoc;command=exec">layverDoc</A></TH><TH
ALIGN=LEFT><A HREF="method.pl?path=/home/micado/autan_c65fla/dss/dss.common.default/units/macro/dvtk/results;task=layverShadow;command=exec">layverShadow</A></TH><TH
ALIGN=LEFT>lvs</TH><TH ALIGN=LEFT>lvs_prefix</TH><TH
ALIGN=LEFT>macroLib</TH><TH ALIGN=LEFT>ocapi</TH><TH
ALIGN=LEFT><A HREF="method.pl?path=/home/micado/autan_c65fla/dss/dss.common.default/units/macro/dvtk/results;task=prefix;command=exec">prefix</A></TH></TR>
<TR CLASS='node' ID='clock_x2' VALIGN=top><TD ALIGN=LEFT NOWRAP><A CLASS=blank HREF=""
ONCLICK="toggleRow(this.parentNode.parentNode.parentNode, 'clock_x2'); return false;"> <IMG ID="IMGclock_x2"
SRC="/images/dvtk/minus.gif"
BORDER=0> <B>clock_x2</B></A><B></B></TD><TD
BGCOLOR="#D0D0D0"></TD><TD BGCOLOR="#D0D0D0"></TD><TD BGCOLOR="#D0D0D0"></TD><TD
BGCOLOR="#D0D0D0"></TD><TD BGCOLOR="#D0D0D0"></TD><TD BGCOLOR="#D0D0D0"></TD><TD
BGCOLOR="#D0D0D0"></TD><TD BGCOLOR="#D0D0D0"></TD><TD BGCOLOR="#D0D0D0"></TD><TD
BGCOLOR="#D0D0D0"></TD><TD BGCOLOR="#D0D0D0"></TD><TD BGCOLOR="#D0D0D0"></TD></TR>
<TR CLASS='node' ID='clock_x2@clock_x2' VALIGN=top><TD ALIGN=LEFT
NOWRAP> <IMG ID="IMGclock_x2@clock_x2"
SRC="/images/dvtk/bullet.gif"
BORDER=0> <B>clock_x2</B><B></B></TD><TD><FONT
COLOR="green">OK</FONT></TD><TD>-</TD><TD><FONT
COLOR="green">OK</FONT></TD><TD><FONT
COLOR="green">OK</FONT></TD><TD><FONT
COLOR="green">OK</FONT></TD><TD><FONT
COLOR="green">OK</FONT></TD><TD><FONT
COLOR="green">OK</FONT></TD><TD>-</TD><TD><FONT
COLOR="green">OK</FONT></TD><TD><FONT
COLOR="green">OK</FONT></TD><TD><FONT
COLOR="green">OK</FONT></TD><TD><FONT
COLOR="green">OK</FONT></TD></TR>
</TABLE>
<BR><hr>
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0 WIDTH=100%>
<TR>
<TD WIDTH=42%>
<FONT SIZE=-2>© Copyright(c) 1996 - 2013 by , Team<BR>
All rights reserved.</FONT></TD>
<TD WIDTH=42%>
<FONT SIZE=-2>Last modified Tue Apr 9 08:44:32 MEST 2013 <BR>
by <A HREF="mailto:user@gmail.com">usergroup</A>
</TD>
<TD>
<IMG BORDER=0 SRC="/images/dvtk/dvtk_dev_team.jpg" ALIGN=RIGHT
HSPACE=0 VSPACE=0 ALT="[dvtk]">
</TD>
</TR>
</TABLE>
<p /><br />
</body>
You still need to replace the print commands with code that puts the data into your spreadsheet.
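As a rough sketch of what that replacement could look like, the snippet below writes rows to a CSV file, which most spreadsheet programs can open. It uses only core Perl. The helper name csv_line, the file name cells.csv and the sample rows are made up for illustration; in the script above the rows would come from $ts->rows instead.

```perl
use strict; use warnings;

# Hypothetical helper (not part of HTML::TableExtract): turn one row,
# an array reference like those returned by $ts->rows, into a CSV line.
sub csv_line {
    my ($row) = @_;
    return join(',', map {
        my $cell = defined $_ ? $_ : '';    # undefined cells become empty fields
        $cell =~ s/^\s+|\s+$//g;            # trim surrounding whitespace
        $cell =~ s/"/""/g;                  # double any embedded quotes
        qq("$cell");
    } @$row) . "\n";
}

# Stand-in rows for illustration only; replace with the rows of a real table.
my @rows = (
    [ 'Cell', 'cdl_prefix', 'drc' ],
    [ ' clock_x2', 'OK', undef ],
);

# Assumed output file name.
open my $out, '>', 'cells.csv' or die "Cannot write cells.csv: $!";
print {$out} csv_line($_) for @rows;
close $out or die "Cannot close cells.csv: $!";
```

Every field is quoted, so commas or quotes inside a cell should not break the columns. If you need a native spreadsheet file rather than CSV, a CPAN module such as Spreadsheet::WriteExcel could take over the output part.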