
Re: resolving HTML::TableExtract error

by haoess (Curate)
on Jul 13, 2005 at 20:48 UTC (#474683)

in reply to resolving HTML::TableExtract error

foreach $row ($te->rows) {

You'll have to walk through the parsed tables first:

foreach my $ts ( $te->table_states ) {
    foreach my $row ( $ts->rows ) {

Please have a look at perldoc HTML::TableExtract, and feel free to ask its author for better error messages for misuse like this.


Re^2: resolving HTML::TableExtract error
by jaydon (Novice) on Jul 13, 2005 at 21:18 UTC
    I kind of did. The Synopsis in that documentation was where I got that code from:
    # level rows() method assumes the first table found in
    # the document if no arguments are supplied.
    foreach $row ($te->rows) {
        print join(',', @$row), "\n";
    }

    I am probably misinterpreting what it says, but I took that to mean that I don't have to examine all matching tables with an (outer) foreach loop if I am only concerned with the first table found.

    Anyway, I took your advice and added the outer foreach loop, but my data file remains empty. Here is the amended code:

    use HTML::TableExtract;

    my $te = HTML::TableExtract->new(
        headers => [
            qr/Month\s*/, qr/First\s*/, qr/High\s*/, qr/Low\s*/,
            qr/Sett\s*/, qr/Chg\s*/, qr/Vol\s*/, qr/GOWAVE\*\s*/
        ]
    );
    $te->parse_file($sourcefile);

    my $record;
    open (DATFILE, ">> meg.dat") or die "Unable to open meg.dat: $!";
    print DATFILE "Table:\n";
    foreach my $ts ($te->table_states) {
        foreach my $row ($ts->rows) {
            $record = join(',', @$row);
            print $record . "\n";
            print DATFILE $record . "\n";
        }
    }
    close DATFILE;

    And this is the html:

    <tr align="center" valign="top">
      <td><strong>Month </strong></td>
      <td><strong>First </strong></td>
      <td><strong>High </strong></td>
      <td><strong>Low </strong></td>
      <td bgcolor="#f3f3f3"><strong>Sett </strong></td>
      <td bgcolor="#f3f3f3"><strong>Chg </strong></td>
      <td><strong>Vol</strong></td>
      <td><strong>GOWAVE*</strong></td>
      <td width="1" style="border-bottom-style:none;"></td>
      <td><strong>Vol</strong></td>
      <td style="border-right:1px solid #C0C0C0;"><strong>Open Int</strong></td>
    </tr>

      The HTML snippet you provided above is not conducive to testing your code. Besides not being enclosed in <table> tags, it only has one row (the header). Both (apparently) prevent the HTML from being parsed into a table_state.

      Once I fixed that, your code (with haoess's extra loop over the table states) started producing data. One note of caution: according to the documentation, you should pass regular expression strings to the constructor, not compiled regular expressions (qr//). I.e., your constructor should look like:

      my $te = HTML::TableExtract->new( headers => [ qw( Month\s* First\s* High\s* ... )] );
      ... although your constructor with the qr//'s was working as well.

      I had no trouble using the rows method on the table extract object directly, as in your original post. That makes me wonder whether you grabbed an older version off CPAN. I'm guessing the shorthand rows method in the HTML::TableExtract class might have been added somewhere down the line. The version I have is 1.10.

      Hope this helps...
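
For anyone reproducing this, here is a minimal sketch of a test case that avoids both problems described above: the sample is wrapped in <table> tags and has data rows below the header row. The HTML sample, the helper name extract_rows, and the two-column header list are made up for illustration. The fallback between tables and table_states is an assumption about which of the two methods the installed version of HTML::TableExtract provides.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample data: a complete <table> with a header row
# AND data rows, unlike the header-only snippet posted above.
my $html = <<'HTML';
<table>
<tr><td><strong>Month </strong></td><td><strong>First </strong></td></tr>
<tr><td>Jan</td><td>10</td></tr>
<tr><td>Feb</td><td>12</td></tr>
</table>
HTML

# extract_rows is an illustrative helper, not part of HTML::TableExtract.
# It returns one comma-joined string per extracted row.
sub extract_rows {
    my ($source) = @_;

    # Headers are passed as regular expression strings.
    my $te = HTML::TableExtract->new( headers => [ 'Month\s*', 'First\s*' ] );
    $te->parse($source);
    $te->eof;

    # Assumption: use tables() if this version has it, else table_states().
    my $method = $te->can('tables') ? 'tables' : 'table_states';

    my @records;
    foreach my $ts ( $te->$method ) {
        foreach my $row ( $ts->rows ) {
            # Empty cells may come back undefined; print them as ''.
            push @records, join( ',', map { defined $_ ? $_ : '' } @$row );
        }
    }
    return @records;
}

print "$_\n" for extract_rows($html);
```

If this prints a line for each data row but your real file prints nothing, the difference is most likely in the HTML being fed to the parser rather than in the loop itself.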

        That was great advice!

        I just pasted the header from the html file as I thought that I might not have constructed the tableextract object correctly.
        But you were right, I was missing the <table> tag as I had removed some lines from the html file. Once I processed the whole file, I got the results I wanted.

        Thank you!
