http://www.perlmonks.org?node_id=986934


in reply to Re^2: How to fetch table element from a site into data
in thread How to fetch table element from a site into data

You almost got it! The problem is that you captured the HTML content into $a, but then passed a different variable to the parser: $te->parse($html_string);.

Try the following (based on the HTML::TableExtract scripting example):

use Modern::Perl;
use WWW::Mechanize;
use HTML::TableExtract;

my $mech = WWW::Mechanize->new();
$mech->get('http://www.w3schools.com/sql/default.asp');
my $html_string = $mech->content();

my $te = HTML::TableExtract->new( headers => [ ( 'Company', 'Country' ) ] );
$te->parse($html_string);

foreach my $ts ( $te->tables ) {
    print "Table (", join( ',', $ts->coords ), "):\n";
    foreach my $row ( $ts->rows ) {
        print join( ',', @$row ), "\n";
    }
}

Output

Table (0,0):
Island Trading,UK
Galería del gastrónomo,Spain
Laughing Bacchus Wine Cellars,Canada
Paris spécialités,France
Simons bistro,Denmark
Wolski Zajazd,Poland

Hope this helps!

Re^4: How to fetch table element from a site into data
by Anonymous Monk on Aug 12, 2012 at 05:07 UTC
    Yes, it works! :) Can you tell me how I can put these things in a hash, with the header as the key and the contents as the value? The output should look like this:
    $hash = {
        company => ('Island Trading', 'Galería del gastrónomo', 'Laughing Bacchus Wine Cellars', 'Paris spécialités', 'Simons bistro', 'Wolski Zajazd')
        country => ('UK','Spain','Canada','France','Denmark','Poland')
    }

      One way is to populate arrays for both company and country with the elements from the dereferenced array reference ($row->[0]: company name; $row->[1]: country name), and then use those arrays to create strings (values) that will be associated with company and country keys. The script, as a whole, would then look like this:

      use Modern::Perl;
      use WWW::Mechanize;
      use HTML::TableExtract;

      my ( %hash, @company, @country );

      my $mech = WWW::Mechanize->new();
      $mech->get('http://www.w3schools.com/sql/default.asp');
      my $html_string = $mech->content();

      my $te = HTML::TableExtract->new( headers => [ ( 'Company', 'Country' ) ] );
      $te->parse($html_string);

      for my $ts ( $te->tables ) {
          for my $row ( $ts->rows ) {
              push @company, qq|'$row->[0]'|;
              push @country, qq|'$row->[1]'|;
          }
      }

      $hash{'Company'} = '(' . ( join ', ', @company ) . ')';
      $hash{'Country'} = '(' . ( join ', ', @country ) . ')';

      say "$_ => $hash{$_}" for sort keys %hash;

      Output:

      Company => ('Island Trading', 'Galería del gastrónomo', 'Laughing Bacchus Wine Cellars', 'Paris spécialités', 'Simons bistro', 'Wolski Zajazd')
      Country => ('UK', 'Spain', 'Canada', 'France', 'Denmark', 'Poland')
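
The script above stores each value as a single formatted string. If real lists are wanted instead, so that individual names can be looped over or indexed, one possible variation is a hash of array references. The following is only a sketch under some assumptions: the inline $html_string is a made-up stand-in for $mech->content() so the example runs without network access, with three rows copied from the output above, and the @headers array is introduced here purely for illustration. It relies on HTML::TableExtract returning columns in the order the headers were requested, which is its default behaviour.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Assumption: a small inline table standing in for $mech->content(),
# so the sketch does not depend on the live page.
my $html_string = <<'HTML';
<table>
  <tr><th>Company</th><th>Country</th></tr>
  <tr><td>Island Trading</td><td>UK</td></tr>
  <tr><td>Simons bistro</td><td>Denmark</td></tr>
  <tr><td>Wolski Zajazd</td><td>Poland</td></tr>
</table>
HTML

# Hypothetical helper array: the headers double as the hash keys.
my @headers = ( 'Company', 'Country' );

my $te = HTML::TableExtract->new( headers => \@headers );
$te->parse($html_string);

my %hash;
for my $ts ( $te->tables ) {
    for my $row ( $ts->rows ) {
        # Append each cell to the array reference stored under its header.
        push @{ $hash{ $headers[$_] } }, $row->[$_] for 0 .. $#headers;
    }
}

# Each value is now an array reference, so it can be joined, indexed or looped over.
print "$_ => ", join( ', ', @{ $hash{$_} } ), "\n" for sort keys %hash;
```

With this layout, $hash{Company}[0] gives the first company name, and scalar @{ $hash{Country} } gives the number of rows collected.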