PerlMonks  

Re^2: How to fetch table element from a site into data

by Anonymous Monk
on Aug 12, 2012 at 04:22 UTC (node #986931)


in reply to Re: How to fetch table element from a site into data
in thread How to fetch table element from a site into data

My code is something like this:

use WWW::Mechanize;
use HTTP::Cookies;
use HTML::TableParser;
use HTML::TableExtract;

my $mech = WWW::Mechanize->new();
$mech->get('http://www.w3schools.com/sql/default.asp');
my $a = $mech->content();

$te = HTML::TableExtract->new( headers => [('Company', 'Country')] );
$te->parse($html_string);

# Examine all matching tables
foreach $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}

# Shorthand...top level rows() method assumes the first table found in
# the document if no arguments are supplied.
foreach $row ($te->rows) {
    print join(',', @$row), "\n";
}


Re^3: How to fetch table element from a site into data
by Kenosis (Priest) on Aug 12, 2012 at 04:53 UTC

    You almost got it(!), but you've captured the HTML content into $a and then called $te->parse($html_string), and $html_string was never assigned, so the parser had nothing to work on.
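    Without `use strict`, Perl silently treats the never-declared $html_string as an undefined global, which is why the original script ran without complaint. Modern::Perl (used below) turns `strict` on, and then this kind of mix-up becomes a compile-time error. A minimal core-Perl sketch that compiles a snippet with the same mistake and checks the error; `$content` and `$length` are illustrative names only:

```perl
use strict;
use warnings;

# The snippet mirrors the mistake: the page is stored in one variable
# ($content here), but a different, never-declared variable is read.
my $snippet = <<'END_SNIPPET';
use strict;
my $content = '<table><tr><th>Company</th></tr></table>';
my $length  = length $html_string;
END_SNIPPET

# String eval compiles the snippet at run time, so the strict error
# ends up in $@ instead of aborting this script.
eval $snippet;
my $caught = $@ =~ /Global symbol "\$html_string" requires explicit package name/;

print $caught ? "strict caught it: $@" : "no compile error\n";
```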

    Try the following (based on the HTML::TableExtract scripting example):

    use Modern::Perl;
    use WWW::Mechanize;
    use HTML::TableExtract;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://www.w3schools.com/sql/default.asp');
    my $html_string = $mech->content();

    my $te = HTML::TableExtract->new( headers => [ ( 'Company', 'Country' ) ] );
    $te->parse($html_string);

    foreach my $ts ( $te->tables ) {
        print "Table (", join( ',', $ts->coords ), "):\n";
        foreach my $row ( $ts->rows ) {
            print join( ',', @$row ), "\n";
        }
    }

    Output:

    Table (0,0):
    Island Trading,UK
    Galería del gastrónomo,Spain
    Laughing Bacchus Wine Cellars,Canada
    Paris spécialités,France
    Simons bistro,Denmark
    Wolski Zajazd,Poland

    Hope this helps!

      Yes, it works! :) Can you tell me how I can put these into a hash, with the header as the key and the column contents as the value, so that the output looks like this:
      $hash = {
          company => ('Island Trading', 'Galería del gastrónomo', 'Laughing Bacchus Wine Cellars', 'Paris spécialités', 'Simons bistro', 'Wolski Zajazd'),
          country => ('UK', 'Spain', 'Canada', 'France', 'Denmark', 'Poland')
      }
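      The `company => ( ... )` form will not behave as intended if written literally: a hash value must be a single scalar, and parentheses only group, so the inner lists flatten into the surrounding key/value list. An array reference (`[ ... ]`) is the usual way to keep a whole column under one key. A minimal core-Perl sketch using two rows of the data above:

```perl
use strict;
use warnings;

# Parentheses only group; they do not nest a list inside a hash value.
# This flattens into: company => 'Island Trading', 'Simons bistro' => 'country',
# 'UK' => 'Denmark'
my %flat = (
    company => ( 'Island Trading', 'Simons bistro' ),
    country => ( 'UK', 'Denmark' ),
);

# Square brackets build an array reference: one scalar that holds the list.
my %nested = (
    company => [ 'Island Trading', 'Simons bistro' ],
    country => [ 'UK', 'Denmark' ],
);

print 'flat keys:   ', join( ', ', sort keys %flat ), "\n";
print 'company col: ', join( ', ', @{ $nested{company} } ), "\n";
```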

        One way is to populate arrays for both company and country with the elements of each dereferenced row array reference ($row->[0]: company name; $row->[1]: country name), and then join those arrays into strings that become the values of the Company and Country keys. The script, as a whole, would then look like this:

        use Modern::Perl;
        use WWW::Mechanize;
        use HTML::TableExtract;

        my ( %hash, @company, @country );

        my $mech = WWW::Mechanize->new();
        $mech->get('http://www.w3schools.com/sql/default.asp');
        my $html_string = $mech->content();

        my $te = HTML::TableExtract->new( headers => [ ( 'Company', 'Country' ) ] );
        $te->parse($html_string);

        for my $ts ( $te->tables ) {
            for my $row ( $ts->rows ) {
                push @company, qq|'$row->[0]'|;
                push @country, qq|'$row->[1]'|;
            }
        }

        $hash{'Company'} = '(' . ( join ', ', @company ) . ')';
        $hash{'Country'} = '(' . ( join ', ', @country ) . ')';

        say "$_ => $hash{$_}" for sort keys %hash;

        Output:

        Company => ('Island Trading', 'Galería del gastrónomo', 'Laughing Bacchus Wine Cellars', 'Paris spécialités', 'Simons bistro', 'Wolski Zajazd')
        Country => ('UK', 'Spain', 'Canada', 'France', 'Denmark', 'Poland')
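        If the goal is a hash whose values are real lists rather than display strings, one alternative is to push each cell onto an array reference keyed by its header. A minimal core-Perl sketch, assuming `@rows` stands in for what `$ts->rows` returns (one array reference per row, with columns in the same order as the `headers` passed to `new()`); the hard-coded data is copied from the output above, not fetched:

```perl
use strict;
use warnings;
use utf8;                               # the data below contains non-ASCII names
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for $ts->rows: one array reference per table row,
# columns in the same order as @headers.
my @headers = ( 'Company', 'Country' );
my @rows    = (
    [ 'Island Trading',                'UK' ],
    [ 'Galería del gastrónomo',        'Spain' ],
    [ 'Laughing Bacchus Wine Cellars', 'Canada' ],
    [ 'Paris spécialités',             'France' ],
    [ 'Simons bistro',                 'Denmark' ],
    [ 'Wolski Zajazd',                 'Poland' ],
);

my %hash;
for my $row (@rows) {
    # Pair each header with the cell in the same column position and
    # append it to that header's array reference (autovivified on first use).
    for my $i ( 0 .. $#headers ) {
        push @{ $hash{ $headers[$i] } }, $row->[$i];
    }
}

# Values stay lists, so they can be counted, indexed, or looped over.
for my $key ( sort keys %hash ) {
    print "$key => (", join( ', ', map { "'$_'" } @{ $hash{$key} } ), ")\n";
}
```

        In the real script, the inner loop would run over `$ts->rows` inside `for my $ts ( $te->tables )` instead of over the hard-coded `@rows`; `$hash{Company}[0]` then gives the first company, and `scalar @{ $hash{Country} }` the number of rows.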
