Using HTML::TableExtract cell method

by halweitz (Novice)
on Feb 26, 2013 at 03:31 UTC
halweitz has asked for the wisdom of the Perl Monks concerning the following question:

In the code snippet below everything seems to work fine.
The HTML file is a saved web page with a table.
The HTML string is parsed correctly into $tree2.
$table1 is extracted from $tree2 and
the $row_cnt variable is set correctly to 111 rows.
However, the $table1->cell($i, 9) method returns
the following when I print the contents of $cell1:

111 rows found
start the row loop
check opening date cell in row: 1
cell1 --> HTML::ElementTable::DataElement=HASH(0x2e4b6ec)

It should return a date like 02-25-2013 but never does.
How do I get a string instead of a hash?
Any thoughts will help.

#!perl -w
use HTML::TableExtract qw(tree);
use File::Slurp qw( :all ) ;

$date = '02-25-2013';
$Path = 'C://path//to//File.htm';
$html_string = read_file($Path);

my $tree2 = HTML::TableExtract->new(
    keep_html           => 1,
    headers             => [qw(NSN)],
    slice_columns       => 0,
    keep_headers        => 0,
    gridmap             => 0,
    strip_html_on_match => 1,
    debug               => 1,
    decode              => 1
);
$tree2->parse($html_string);
$table1 = $tree2->first_table_found;

my @rows = $table1->rows;
my $row_cnt = @rows;
print "$row_cnt rows found \n";

my $cell1;
my $i = 1; # row number, skip the header row
print "start the row loop \n";
while ($i < $row_cnt ) {
    print "checking date in cell 9 in row: $i \n";
    my $cell1 = $table1->cell($i, 9); #row $i, column 9
    print "cell1 --> $cell1 \n";
    next if $cell1 !~ /$date/; # not today
    #
    # there is more code that does not
    # appear here to save some data from the row
    # to a file
    #
}
continue {
    $i++;
}

Re: Using HTML::TableExtract cell method
by vinoth.ree (Prior) on Feb 26, 2013 at 05:41 UTC

    Hi halweitz,

    My suggestion: print the contents of @rows with Data::Dumper and check whether the date field is actually present in the rows.
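The suggestion above might look something like the following. This is only a sketch: the @rows data here is a made-up stand-in (hypothetical values) for what $table1->rows returns, so the block runs without the saved HTML file.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data standing in for: my @rows = $table1->rows;
my @rows = (
    [ 'row1-col0', 'row1-date: 02-24-2013' ],
    [ 'row2-col0', 'row2-date: 02-25-2013' ],
);

# Compact, stable output makes the structure easier to scan.
$Data::Dumper::Indent   = 1;
$Data::Dumper::Sortkeys = 1;

# Dump the whole structure to see what each cell really contains
# (plain strings, or references/objects).
my $dump = Dumper(\@rows);
print $dump;
```

If the dump shows blessed objects instead of strings in the cells, the cell values are element objects, not text.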

      Thanks for the response. I found the problem. When I removed qw(tree) from

      use HTML::TableExtract qw(tree);

      the text for the date is correct.
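A likely explanation, going by the module's documented tree extraction mode: importing with qw(tree) makes cell() return element objects rather than strings, which is why the value printed as HTML::ElementTable::DataElement=HASH(...). Those objects derive from HTML::Element, so calling as_text on them should yield the cell text if tree mode is kept. The sketch below shows a small helper that handles both cases. The names cell_text and Demo::Cell are hypothetical; Demo::Cell is only a stand-in for a tree-mode cell so the block runs without any CPAN modules.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Return plain text for a cell, whether it is already a string
# (default mode) or an object offering as_text (tree mode).
sub cell_text {
    my ($cell) = @_;
    return '' unless defined $cell;
    return $cell->as_text if blessed($cell) && $cell->can('as_text');
    return $cell;
}

# Hypothetical stand-in for a tree-mode cell object.
package Demo::Cell;
sub new     { my ($class, $text) = @_; return bless { text => $text }, $class }
sub as_text { return $_[0]{text} }

package main;

my $date = '02-25-2013';

# One plain-string cell and one object cell, both holding the date.
for my $cell ('02-25-2013', Demo::Cell->new('02-25-2013')) {
    my $text = cell_text($cell);
    print "cell text --> $text\n" if $text =~ /\Q$date\E/;
}
```

In the original loop this would correspond to matching against cell_text($table1->cell($i, 9)) instead of the raw return value.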
