PerlMonks  

How to fetch table element from a site into data

by Anonymous Monk
on Aug 11, 2012 at 18:56 UTC (#986918=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have to extract data from a table on a web page and then put it into a table in a database. I found HTML::TableParser & HTML::TableExtract but they are not working for my case. Can somebody provide a simple example for both? My table has an Id and 5 columns, and I have a database table with the same column names. I just want to grab the exact values from the website and put them into that table. For example, consider PerlMonks' own Newest Nodes page: how can I fetch the contents of the Questions table and put them into a table with the same column names?

Re: How to fetch table element from a site into data
by moritz (Cardinal) on Aug 11, 2012 at 19:26 UTC
    I found HTML::TableParser & HTML::TableExtract but they are not working for my case

    Why not? What's the problem? Please show the code you've written to try them.

    Also, you need to show the HTML table you are trying to extract data from.

      My code is something like this:
      use WWW::Mechanize;
      use HTTP::Cookies;
      use HTML::TableParser;
      use HTML::TableExtract;

      my $mech = WWW::Mechanize->new();
      $mech->get('http://www.w3schools.com/sql/default.asp');
      my $a = $mech->content();

      $te = HTML::TableExtract->new( headers => [('Company', 'Country')] );
      $te->parse($html_string);

      # Examine all matching tables
      foreach $ts ($te->tables) {
          print "Table (", join(',', $ts->coords), "):\n";
          foreach $row ($ts->rows) {
              print join(',', @$row), "\n";
          }
      }

      # Shorthand...top level rows() method assumes the first table found in
      # the document if no arguments are supplied.
      foreach $row ($te->rows) {
          print join(',', @$row), "\n";
      }

        You almost got it(!), but you've captured the HTML content into $a and then called $te->parse($html_string). $html_string is never assigned, so the parser receives nothing to parse.

        Try the following (based on the HTML::TableExtract scripting example):

        use Modern::Perl;
        use WWW::Mechanize;
        use HTML::TableExtract;

        my $mech = WWW::Mechanize->new();
        $mech->get('http://www.w3schools.com/sql/default.asp');
        my $html_string = $mech->content();

        my $te = HTML::TableExtract->new( headers => [ ( 'Company', 'Country' ) ] );
        $te->parse($html_string);

        foreach my $ts ( $te->tables ) {
            print "Table (", join( ',', $ts->coords ), "):\n";
            foreach my $row ( $ts->rows ) {
                print join( ',', @$row ), "\n";
            }
        }

        Output

        Table (0,0):
        Island Trading,UK
        Galería del gastrónomo,Spain
        Laughing Bacchus Wine Cellars,Canada
        Paris spécialités,France
        Simons bistro,Denmark
        Wolski Zajazd,Poland

        Hope this helps!
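
        The example above covers extraction only. For the database half of the question, one possible follow-up is to pass each extracted row to DBI. What follows is a minimal sketch, not something tested against the poster's site. It assumes DBI and DBD::SQLite are installed, uses an inline HTML string in place of $mech->content(), and the companies table, its column names, and the in-memory database are hypothetical stand-ins for the real schema.

```perl
use strict;
use warnings;
use DBI;
use HTML::TableExtract;

# Stand-in for $mech->content(); a real script would fetch the page instead.
my $html_string = <<'HTML';
<table>
  <tr><th>Company</th><th>Country</th></tr>
  <tr><td>Island Trading</td><td>UK</td></tr>
  <tr><td>Wolski Zajazd</td><td>Poland</td></tr>
</table>
HTML

# Hypothetical schema: an in-memory SQLite table whose columns mirror the headers.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );
$dbh->do('CREATE TABLE companies (company TEXT, country TEXT)');

# Select the table by its header cells, as in the example above.
my $te = HTML::TableExtract->new( headers => [ 'Company', 'Country' ] );
$te->parse($html_string);

# Placeholders let DBI handle quoting of the scraped values.
my $sth = $dbh->prepare(
    'INSERT INTO companies (company, country) VALUES (?, ?)');

my $inserted = 0;
foreach my $ts ( $te->tables ) {
    foreach my $row ( $ts->rows ) {
        $sth->execute(@$row);    # one table row becomes one database row
        $inserted++;
    }
}
print "Inserted $inserted rows\n";
```

        For another database, only the DBI->connect arguments and the table definition should need to change. The cells in each row come back in the order given in headers, so that order has to match the column list in the INSERT statement.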
