PerlMonks  

Re^2: resolving HTML::TableExtract error

by jaydon (Novice) on Jul 13, 2005 at 21:18 UTC (id://474688)


in reply to Re: resolving HTML::TableExtract error
in thread resolving HTML::TableExtract error

I kind of did. The Synopsis in that documentation was where I got that code from:
```perl
# Shorthand...top level rows() method assumes the first table found in
# the document if no arguments are supplied.
foreach $row ($te->rows) {
    print join(',', @$row), "\n";
}
```

I am probably misinterpreting what it says, but I took it to mean that I don't need an outer foreach loop over all matching tables if I only care about the first table found.
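For the first-table-only case, a minimal sketch that takes the first element of `table_states` instead of looping over every matching table might look like this. The inline HTML and the two-column header list are made-up stand-ins for the real page, and the `die` guard is an addition: when no table matches, `table_states` returns an empty list, and the original loop would then simply print nothing.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up sample page: one table with a header row and a single data row.
my $html = <<'HTML';
<table>
<tr><td>Month</td><td>Sett</td></tr>
<tr><td>SEP 05</td><td>1.23</td></tr>
</table>
HTML

my $te = HTML::TableExtract->new( headers => [ 'Month', 'Sett' ] );
$te->parse($html);
$te->eof;    # flush any text still buffered at end of input

# table_states returns only the tables whose headers matched;
# an empty list means nothing matched at all.
my @tables = $te->table_states
    or die "No table matched the requested headers\n";

# Only the first matching table is needed, so there is no outer loop.
foreach my $row ( $tables[0]->rows ) {
    print join( ',', @$row ), "\n";
}
```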

Anyway, I took your advice and added the outer foreach loop, but my data file remains empty. Here is the amended code:

```perl
use HTML::TableExtract;

my $te = HTML::TableExtract->new(
    headers => [ qr/Month\s*/, qr/First\s*/, qr/High\s*/, qr/Low\s*/,
                 qr/Sett\s*/,  qr/Chg\s*/,   qr/Vol\s*/,  qr/GOWAVE\*\s*/ ]
);
$te->parse_file($sourcefile);

my $record;
open (DATFILE, ">> meg.dat") or die "Unable to open meg.dat: $!";
print DATFILE "Table:\n";
foreach my $ts ($te->table_states) {
    foreach my $row ($ts->rows) {
        $record = join(',', @$row);
        print $record . "\n";
        print DATFILE $record . "\n";
    }
}
close DATFILE;
```

And this is the HTML:

```html
<tr align="center" valign="top">
  <td><strong>Month </strong></td>
  <td><strong>First </strong></td>
  <td><strong>High </strong></td>
  <td><strong>Low </strong></td>
  <td bgcolor="#f3f3f3"><strong>Sett </strong></td>
  <td bgcolor="#f3f3f3"><strong>Chg </strong></td>
  <td><strong>Vol</strong></td>
  <td><strong>GOWAVE*</strong></td>
  <td width="1" style="border-bottom-style:none;"></td>
  <td><strong>Vol</strong></td>
  <td style="border-right:1px solid #C0C0C0;"><strong>Open Int</strong></td>
</tr>
```

Re^3: resolving HTML::TableExtract error
by crashtest (Curate) on Jul 14, 2005 at 00:34 UTC

    The HTML snippet you provided above is not conducive to testing your code. Besides not being enclosed in <table> tags, it only has one row (the header). Both (apparently) prevent the HTML from being parsed into a table_state.
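One way to see this is to count how many tables HTML::TableExtract matches for the bare header row versus the same row wrapped in `<table>` tags with one data row added. This is only a sketch: `count_matches` is a hypothetical helper, the data row is invented, and just two of the original headers are used.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: how many tables match the given headers in $html?
sub count_matches {
    my ($html) = @_;
    my $te = HTML::TableExtract->new( headers => [ 'Month', 'Vol' ] );
    $te->parse($html);
    $te->eof;
    my @tables = $te->table_states;
    return scalar @tables;
}

my $header = '<tr><td><strong>Month</strong></td><td><strong>Vol</strong></td></tr>';
# Invented data row, only for this test.
my $data   = '<tr><td>SEP 05</td><td>120</td></tr>';

# A <tr> outside any <table> never starts a table, so nothing can match.
printf "bare header row:  %d table(s)\n", count_matches($header);
# Wrapped in <table> with a data row, the headers can be found.
printf "wrapped in table: %d table(s)\n", count_matches("<table>$header$data</table>");
```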

    Once I fixed that, your code (with haoess's extra loop over the table states) started producing data. One note of caution: according to the documentation, you should be passing regular expression strings to the constructor, not actual regular expressions. That is, your constructor should look like:

    my $te = HTML::TableExtract->new( headers => [ qw( Month\s* First\s* High\s* ... )] );
    ... although your constructor with the qr//'s was working as well.

    I had no trouble calling the rows method directly on the table extract object, as in your original post. That makes me wonder whether you grabbed an older version off CPAN; my guess is that the shorthand rows method was added to HTML::TableExtract at some later point. The version I have is 1.10.
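To check which version is installed, and to make a script fail early on an older copy, something like the following sketch could be used. Note that 1.10 is simply the version reported as working in this thread, not a documented cutoff for when the shorthand `rows` method appeared.

```perl
use strict;
use warnings;

# Dies at compile time if the installed HTML::TableExtract is older than 1.10.
# (1.10 is the version reported to work here, not a documented cutoff.)
use HTML::TableExtract 1.10;

# Print the installed version so it can be compared against the CPAN docs.
print "HTML::TableExtract version: ", HTML::TableExtract->VERSION, "\n";
```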

    Hope this helps...

      That was great advice!

      I had only pasted the header row from the HTML file because I thought I might not have constructed the TableExtract object correctly.
      But you were right: I was missing the <table> tag because I had removed some lines from the HTML file. Once I processed the whole file, I got the results I wanted.

      Thank you!
