Problems with TableExtract

by bcdeery (Novice)
on Jan 10, 2006 at 01:47 UTC

bcdeery has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to pull the Ground Transit days from the table at this URL: +oz=53213&oc=1&oh=ORD&dz=60056&dc=1&dt=1/9/2006&tt=1&hy=&zn=2&am=Y
but I either get nothing (with the code below) or I get the error: Can't call method "rows" on an undefined value at line 224.
    $url=" +Times";
    $service = "Service";
    $arrival = "Arrival Date and Time";
    $transit = "Days in Transit*";
    my $mech = WWW::Mechanize->new();
    $mech->get($url);
    $frmSvcCalc = "frmSvcCalc";
    $mech->form_name($frmSvcCalc);
    $mech->set_fields(
        txtOrgZip  => "53213",
        txtDestZip => "60056",
    );
    $mech->field( "hdnAction", "Calculate" );
    $mech->submit($frmSvcCalc);
    my $results2 = $mech->content;
    print OUTFILE $results2;
    $te = HTML::TableExtract->new( headers => [qw($service $arrival $transit)] );
    $te->parse($html_string);
    foreach $ts ($te->tables) {
        print "Table (", join(',', $ts->coords), "):\n";
        foreach $row ($ts->rows) {
            print join(',', @$row), "\n";
        }
    }
The other code I tried, which gives me that error, is:
    my $te = HTML::TableExtract->new( headers => [qw($service $arrival $transit)] );
    $te->parse($html_string);
    foreach my $row ($te->rows) {
        foreach my $cell (@$row) {
            print $cell;
        }
    }
I've tried a few other approaches, but I obviously just don't get it. I'm begging for help, and so is my hair (or what's left of it).

Re: Problems with TableExtract
by suaveant (Parson) on Jan 10, 2006 at 04:45 UTC
    Well... I'm not familiar with either of these packages, and this is odd, because your TableExtract example comes from the perldoc, yet tables() doesn't return an object. By dumping the values it did return, however, I got it to work. Here is my code:
        use WWW::Mechanize;
        use HTML::TableExtract;

        $url=" +Times";
        $service = "Service";
        $arrival = "Arrival Date and Time";
        $transit = "Days in Transit*";
        my $mech = WWW::Mechanize->new();
        $mech->get($url);
        $frmSvcCalc = "frmSvcCalc";
        $mech->form_name($frmSvcCalc);
        $mech->set_fields(
            txtOrgZip  => "53213",
            txtDestZip => "60056",
        );
        $mech->field( "hdnAction", "Calculate" );
        $mech->submit($frmSvcCalc);
        my $results2 = $mech->content;
        $te = HTML::TableExtract->new(
            headers => ['Service', 'Arrival Date and Time', 'Days in Transit*']
        );
        $te->parse($results2);
        foreach $ts ($te->tables) {
            foreach $row (@$ts) {
                print join(',', @$row), "\n";
            }
        }
    For me, this prints each row you want on its own line, with the td cells separated by commas, which should get you far enough along :)
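
One change in the code above is easy to miss. The original question passed headers => [qw($service $arrival $transit)], but qw() splits on whitespace and never interpolates, so the headers handed to HTML::TableExtract were the literal strings '$service', '$arrival' and '$transit'. That is likely part of why no table matched. A minimal sketch in core Perl (no CPAN modules needed) showing the difference:

```perl
use strict;
use warnings;

my $service = "Service";
my $arrival = "Arrival Date and Time";
my $transit = "Days in Transit*";

# qw() splits on whitespace and does NOT interpolate, so this list holds
# the literal strings '$service', '$arrival' and '$transit'.
my @literal = qw($service $arrival $transit);

# A plain list of the variables gives the real header text.
my @headers = ($service, $arrival, $transit);

print join('|', @literal), "\n";   # prints $service|$arrival|$transit
print join('|', @headers), "\n";   # prints Service|Arrival Date and Time|Days in Transit*
```

Either a quoted list, as in the code above, or ($service, $arrival, $transit) avoids the problem.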

    Update: I was using version 1.08 but looking at the docs on for 2.06, so take this with a grain of salt :)

                    - Ant
                    - Some of my best work - (1 2 3)

      Thinking about it, I should give you more info on what I did, so you can do it yourself in the future :)

      Personally, I was just using print debugging. I started by getting:

      Can't call method "coords" on unblessed reference at line 32.
      I printed out the value of $ts, which was an array reference but was not blessed into a package. Since it had no package associated with it (when printed, an unblessed reference looks like HASH(0x864497c), while a blessed one looks like HTML::TableExtract=HASH(0x864497c)), you cannot call methods on it. So then I did print "@$ts\n"; and saw it was full of array refs. At this point you could loop through the array and print the results, or do print "@{$ts->[0]}\n";. Of course... the much better way is to use Data::Dumper and do print Dumper($ts), "\n";, which would give you:
          $VAR1 = [
              [ 'DHL Next Day 10:30 am (Letter – 150 Pounds)', 'Tuesday, Jan 10, 2006 By 10:30 A.M.', '1' ],
              [ 'DHL Next Day 12:00 pm (Letter – 150 Pounds)', 'Tuesday, Jan 10, 2006 By Noon', '1' ],
              [ 'DHL Next Day 3:00 pm (Letter – 150 Pounds)', 'Tuesday, Jan 10, 2006 By 3:00 P.M.', '1' ],
              [ 'DHL 2nd Day Service (Letter – 150 Pounds)', 'Wednesday, Jan 11, 2006 By 5:00 P.M.', '2' ],
              [ 'DHL Ground Service (Letter – 150 Pounds)', 'Tuesday, Jan 10, 2006 By end of day', '1' ]
          ];
      which obviously shows the whole structure. Quick and dirty debugging can often help you find out what is going on, or you could always use the actual debugger, which I am usually too lazy to do :)
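
The blessed/unblessed check described above can also be done in code rather than by eye. A minimal sketch using Scalar::Util's blessed() and Data::Dumper, both of which ship with Perl. The $plain structure only mimics the shape of the $ts values that tables() returned under 1.08, and the package name My::Hypothetical::Table is invented for the comparison:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Data::Dumper;

# An unblessed array of row arrays (shaped like the 1.08 tables() result)
# and a blessed hash for comparison. The package name is made up.
my $plain  = [ [ 'DHL Ground Service', 'Tuesday, Jan 10, 2006 By end of day', '1' ] ];
my $object = bless {}, 'My::Hypothetical::Table';

for my $ref ($plain, $object) {
    # Stringifying shows the difference: ARRAY(0x...) vs My::Hypothetical::Table=HASH(0x...)
    print "$ref\n";
    if (defined blessed($ref)) {
        # Blessed into a package, so method calls are at least possible.
        print "  blessed into ", blessed($ref), "\n";
    }
    else {
        # Plain data: calling ->coords or ->rows on it would die.
        # Dump it to see its structure instead.
        print "  plain ", ref($ref), " reference:\n";
        local $Data::Dumper::Indent = 1;
        print Dumper($ref);
    }
}
```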

      Hopefully this helps even more than the first message.

      Any questions?

                      - Ant

Re: Problems with TableExtract
by mojotoad (Monsignor) on Jan 10, 2006 at 05:29 UTC
    The canonical form is the second example you give, i.e.
        my $te = HTML::TableExtract->new( headers => [qw($service $arrival $transit)] );
        $te->parse($html_string);
        foreach my $row ($te->rows) {
            foreach my $cell (@$row) {
                print $cell;
            }
        }
    The old style of dealing directly with the arrays of rows is no longer supported (though each row is still indeed an array).

    I note that in your example code, you're not setting $html_string beforehand. That could be a paste-o, but make sure you're using strict to catch that kind of thing.
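
A quick way to see what strict buys you here: the sketch below compiles the same line with and without strict via string eval. $html_string is never declared, mirroring the question's code; this is a plain-Perl illustration, not anything specific to HTML::TableExtract.

```perl
use strict;
use warnings;

# Without strict, an undeclared $html_string silently becomes an empty
# package variable; with strict it is a compile-time error.
my $without_strict = eval q{ no strict; my $te_input = $html_string; 1 };
my $with_strict    = eval q{ use strict; my $te_input = $html_string; 1 };
my $error          = $@;   # error from the second (strict) eval

print "without strict: ", ($without_strict ? "compiled" : "failed"), "\n";
print "with strict:    ", ($with_strict ? "compiled" : "failed: $error"), "\n";
```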

    Having said all that -- if it's still not working, which version of HTML::TableExtract are you using?
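
To answer the version question, one option is to ask the module itself. A minimal sketch; module_version is a hypothetical helper, and it is exercised against the core Data::Dumper so it also runs where HTML::TableExtract is not installed:

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of a module,
# or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Data::Dumper -> Data/Dumper.pm
    return unless eval { require $file; 1 };
    return $module->VERSION;                 # UNIVERSAL::VERSION reads $VERSION
}

for my $module (qw(Data::Dumper HTML::TableExtract)) {
    my $version = module_version($module);
    printf "%-20s %s\n", $module, defined $version ? $version : 'not installed';
}
```

From the shell, perl -MHTML::TableExtract -le 'print HTML::TableExtract->VERSION' does the same for a single module.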

    Matt (author of said module)

      Cool module :)

      I checked my version and I have 1.08. For simplicity I had just gone with the Debian package; I guess it's a bit behind.

      I wonder if he did the same as me: has an older version but was looking at the docs on :)

                      - Ant

        With 1.08, try this:
            my $te = HTML::TableExtract->new( headers => [qw($service $arrival $transit)] );
            $te->parse($html_string);
            my $ts = $te->first_table_state_found;
            foreach my $row ($ts->rows) {
                foreach my $cell (@$row) {
                    print $cell;
                }
            }

        If it's actually capturing tables, that should work. (You can monitor in detail whether tables are being captured by setting the 'debug' parameter to a level from 1 to 7 in the new() constructor.)

        Let me know what happens,

        And yes, part of this is my fault. As I phased out the array-only functionality (i.e. the tables() method), I eventually aliased tables() to the table_states() method, and that has caused confusion. I should have waited longer before resurrecting the tables() method, if I resurrected it at all. (In my defense, it is what I would have called the table_states() method to begin with, but in a very, very early version of the module I used tables() to return the arrays. Ah well.)


Re: Problems with TableExtract
by johnnywang (Priest) on Jan 10, 2006 at 05:58 UTC
    Your script has many errors in it; it looks like you copied and pasted several pieces together. Please run your script before posting so we can concentrate on the problem part. That being said, the following works:
        use strict;
        use LWP::Simple;
        use HTML::TableExtract;

        my $page = get(" +av=TransitTimes&oz=53213&oc=1&oh=ORD&dz=60056&dc=1&dt=1/9/2006&tt=1&hy=&zn=2&am=Y");
        my $te = HTML::TableExtract->new( depth => 3, count => 3 );
        $te->parse($page);
        foreach my $ts ($te->table_states) {
            foreach my $row ($ts->rows) {
                print join(',', @$row), "\n";
            }
        }

        __OUTPUT__
        Service,Arrival Date and Time,Days in Transit*
        DHL Next Day 10:30 am (Letter – 150 Pounds),Tuesday, Jan 10, 2006 By 10:30 A.M.,1
        DHL Next Day 12:00 pm (Letter – 150 Pounds),Tuesday, Jan 10, 2006 By Noon,1
        DHL Next Day 3:00 pm (Letter – 150 Pounds),Tuesday, Jan 10, 2006 By 3:00 P.M.,1
        DHL 2nd Day Service (Letter – 150 Pounds),Wednesday, Jan 11, 2006 By 5:00 P.M.,2
        DHL Ground Service (Letter – 150 Pounds),Tuesday, Jan 10, 2006 By end of day,1
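
Once the rows come out like this, getting the one number the original question asked for (the Ground days in transit) is a matter of filtering them. A minimal sketch that works on a subset of the rows shown above rather than on a live page; transit_days is a hypothetical helper, and matching the service name with a regex is an assumption about how you want to pick the row:

```perl
use strict;
use warnings;

# A subset of the rows shown above (header row first):
# service, arrival date and time, days in transit.
my @rows = (
    [ 'Service', 'Arrival Date and Time', 'Days in Transit*' ],
    [ 'DHL Next Day 10:30 am (Letter – 150 Pounds)', 'Tuesday, Jan 10, 2006 By 10:30 A.M.', '1' ],
    [ 'DHL 2nd Day Service (Letter – 150 Pounds)', 'Wednesday, Jan 11, 2006 By 5:00 P.M.', '2' ],
    [ 'DHL Ground Service (Letter – 150 Pounds)', 'Tuesday, Jan 10, 2006 By end of day', '1' ],
);

# Hypothetical helper: return the days-in-transit cell of the first row
# whose service name matches $pattern, or undef if none does.
sub transit_days {
    my ($pattern, @table) = @_;
    for my $row (@table) {
        my ($service, undef, $days) = @$row;
        return $days if defined $service && $service =~ $pattern;
    }
    return;
}

my $ground = transit_days(qr/Ground/, @rows);
print "Ground transit days: ", (defined $ground ? $ground : 'not found'), "\n";
# prints "Ground transit days: 1"
```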

Node Type: perlquestion [id://522074]
Approved by ww
Front-paged by planetscape