http://www.perlmonks.org?node_id=1020598

halweitz has asked for the wisdom of the Perl Monks concerning the following question:

In the code snippet below everything seems to work fine. The HTML file is a saved web page with a table. The HTML string is parsed correctly into $tree2, $table1 is extracted from $tree2, and the $row_cnt variable is set correctly to 111 rows. However, the $table1->cell($i, 9) method call does not give me the cell's text. This is what I get when I print the contents of $cell1:

    111 rows found
    start the row loop
    check opening date cell in row: 1
    cell1 --> HTML::ElementTable::DataElement=HASH(0x2e4b6ec)

It should return a date like 02-25-2013 but never does. How do I get a string instead of a hash reference? Any thoughts will help.

    #!perl -w
    use strict;
    use HTML::TableExtract qw(tree);
    use File::Slurp qw( :all );

    my $date        = '02-25-2013';
    my $Path        = 'C://path//to//File.htm';
    my $html_string = read_file($Path);

    my $tree2 = HTML::TableExtract->new(
        keep_html           => 1,
        headers             => [qw(NSN)],
        slice_columns       => 0,
        keep_headers        => 0,
        gridmap             => 0,
        strip_html_on_match => 1,
        debug               => 1,
        decode              => 1
    );
    $tree2->parse($html_string);

    my $table1  = $tree2->first_table_found;
    my @rows    = $table1->rows;
    my $row_cnt = @rows;
    print "$row_cnt rows found \n";

    my $i = 1;    # row number, skip the header row
    print "start the row loop \n";
    while ( $i < $row_cnt ) {
        print "checking date in cell 9 in row: $i \n";
        my $cell1 = $table1->cell( $i, 9 );    # row $i, column 9
        print "cell1 --> $cell1 \n";
        next if $cell1 !~ /\Q$date\E/;        # not today
        #
        # there is more code that does not
        # appear here to save some data from the row
        # to a file
        #
    }
    continue {
        $i++;
    }

Re: Using HTML::TableExtract cell method
by vinoth.ree (Monsignor) on Feb 26, 2013 at 05:41 UTC

    Hi halweitz,

    My suggestion is to print the contents of @rows with Data::Dumper and check whether the date field is actually in the rows.
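A minimal sketch of that check, assuming a small inline table in place of the saved page (the HTML and the Opened header are hypothetical). With $Data::Dumper::Maxdepth set to 2, Dumper shows each row and its cells but does not expand anything nested inside a cell, so text cells print as strings, while a tree-mode element object would print only as its class name and address.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use HTML::TableExtract;

# Hypothetical sample table standing in for the saved web page.
my $html = <<'HTML';
<table>
  <tr><th>NSN</th><th>Opened</th></tr>
  <tr><td>1005-01-234-5678</td><td>02-25-2013</td></tr>
</table>
HTML

my $te = HTML::TableExtract->new( headers => [qw(NSN)], slice_columns => 0 );
$te->parse($html);
my $table = $te->first_table_found or die "No table with an NSN header\n";
my @rows  = $table->rows;

# Depth 1 is the list of rows, depth 2 the cells in each row; a cell that
# is itself a reference (e.g. an object) is shown as its stringified form.
local $Data::Dumper::Maxdepth = 2;
my $dump = Dumper( \@rows );
print $dump;
```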

      Thanks for the response. I found the problem. When I removed qw(tree) from

          use HTML::TableExtract qw(tree);

      the date text came back correctly.
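For anyone hitting the same symptom: with qw(tree), HTML::TableExtract builds each matched table as an HTML::ElementTable, so cell() hands back an element object (here an HTML::ElementTable::DataElement), and interpolating that object into a string prints its class and address instead of its text. Without qw(tree), cells come back as plain strings. The sketch below redoes the date check in default mode; the inline HTML, the Opened header and the helper cell_text are hypothetical. cell_text also calls as_trimmed_text, an HTML::Element method, when it receives an object, which should let the same loop work if tree mode is needed for other reasons.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;    # no qw(tree): cells come back as plain text

# Hypothetical stand-in for the saved page; the real table has more columns.
my $html = <<'HTML';
<table>
  <tr><th>NSN</th><th>Opened</th></tr>
  <tr><td>1005-01-234-5678</td><td>02-25-2013</td></tr>
  <tr><td>1005-01-876-5432</td><td>02-24-2013</td></tr>
</table>
HTML

my $date = '02-25-2013';

# With headers given, the extracted columns are limited to NSN and Opened.
my $te = HTML::TableExtract->new( headers => [qw(NSN Opened)] );
$te->parse($html);
my $table = $te->first_table_found or die "No table with the expected headers\n";

# Hypothetical helper: accept a plain string (default mode) or an
# HTML::Element-based object (tree mode), returning the cell's text.
sub cell_text {
    my ($cell) = @_;
    return '' unless defined $cell;
    return ref $cell ? $cell->as_trimmed_text : $cell;
}

my @opened_today;
for my $row ( $table->rows ) {
    my ( $nsn, $opened ) = map { cell_text($_) } @$row;
    next unless $opened =~ /\Q$date\E/;    # not today (header text never matches)
    push @opened_today, $nsn;
}

print "Opened on $date: @opened_today\n";
```

Looping over rows() and picking columns by header, rather than calling cell() with a fixed index, ties each value to a named column, so the check does not depend on the date staying in column 9.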