PerlMonks  

Problem extracting an HTML table with Perl

by Sosi (Sexton)
on Aug 11, 2014 at 16:33 UTC ( #1097013=perlquestion )
Sosi has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys! I am trying to extract some information from a table in an HTML page. Namely, the small chunk of info that you see under "Representative" in the original website. But, although I *think* I am searching for the correct tag, I think I'm getting the whole file instead? Could you please help me find what I am doing wrong?

Here is the code I have so far. I am sorry if this is too simple a question.

#!/usr/local/bin/perl
use strict;
use warnings;
use autodie;
use Data::Dump;
use HTML::Tree;
use LWP::Simple qw(get);

my $content = get('http://www.ncbi.nlm.nih.gov/genome/?term=Xylella_fastidiosa');
my $tree = HTML::Tree->new();
$tree->parse($content);
my $data = $tree->look_down( '_tag' => 'div', class => 'genome_descr' );
dd $data;

How would you extract those lines (not the table) into an array? What am I doing wrong in the search?

Thanks in advance! You guys rock!

Replies are listed 'Best First'.
Re: Problem extracting an HTML table with Perl
by kennethk (Abbot) on Aug 11, 2014 at 16:47 UTC
    The problem is your visualization method (I think). Data::Dump outputs all entries in the object, and HTML::Element objects are built with _parent keys so you can navigate bidirectionally, so dumping one element follows those links through the rest of the tree. If you output with
    print $data->as_HTML;
    instead, I think you'll get what you expect.
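
To illustrate the point about _parent links, here is a minimal offline sketch. The inline HTML string and the name "Some Consortium" are placeholders made up for this example, not the real NCBI page. It shows that the element found by look_down still points back at its parent, while as_HTML and as_text serialize only that element and its children.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the fetched page, so the sketch runs offline.
my $html = '<html><body><div class="genome_descr">'
         . '<p><b>Submitter: </b>Some Consortium</p></div></body></html>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;    # tell the parser that the document is complete

my $div = $tree->look_down( _tag => 'div', class => 'genome_descr' );

# Each element keeps a reference to its parent, so a raw dump of $div
# walks up to the root and back down through the whole tree.
my $parent_tag = $div->parent->tag;

# as_HTML and as_text cover only this element and its descendants.
my $fragment = $div->as_HTML;
my $text     = $div->as_text;

print "parent tag: $parent_tag\n";
print "fragment:   $fragment\n";
print "text:       $text\n";
```

If a structural dump is still wanted, the dump method of HTML::Element prints an indented outline of the element's subtree without following the parent link.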

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Indeed, it got a bit better, but I am still getting a lot of information. I now found that my search is completely independent of that "class" in my $tree->find. So any of the following alternatives gives the same result, and shows that the search is only done on the tag:

      my $data =$tree->find( '_tag' =>'div' );

      or even

      my $data =$tree->find( '_tag' =>'div', class => 'somethingthatdoesnotexists1209841290r' );
        That is not what I see, and I note the OP used the look_down method instead of the find method as you have in this post.
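
The difference between the two methods can be sketched as follows. This is a small offline example with a made-up two-div document; the class names "intro" and "no_such_class" are hypothetical. In HTML::Element, find is an alias for find_by_tag_name, so every argument is treated as a tag name, which would explain why the class value made no difference in the post above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical two-div document, used only for illustration.
my $html = '<html><body><div class="intro">first</div>'
         . '<div class="genome_descr">second</div></body></html>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# find() is an alias for find_by_tag_name(): every argument is a tag name,
# so 'class' and 'no_such_class' are just names of tags that do not occur.
my $by_find = $tree->find( '_tag', 'div', 'class', 'no_such_class' );

# look_down() reads its arguments as attribute => value criteria.
my $by_look_down = $tree->look_down( _tag => 'div', class => 'genome_descr' );
my $no_match     = $tree->look_down( _tag => 'div', class => 'no_such_class' );

print 'find:      ', $by_find->as_text,      "\n";    # first div on the page
print 'look_down: ', $by_look_down->as_text, "\n";    # div with the matching class
print 'no match:  ', ( defined $no_match ? 'found' : 'undef' ), "\n";
```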

        If I run

        #!/usr/local/bin/perl
        use strict;
        use warnings;
        use autodie;
        use Data::Dump;
        use HTML::Tree;
        use LWP::Simple qw(get);

        my $content = get('http://www.ncbi.nlm.nih.gov/genome/?term=Xylella_fastidiosa');
        my $tree = HTML::Tree->new();
        $tree->parse($content);
        my $data = $tree->look_down( '_tag' => 'div', class => 'genome_descr' );
        print $data->as_HTML;
        I get the output
        <div class="genome_descr"><p><b>Submitter: </b><a href="http://aeg.lbi.ic.unicamp.br/xf/" target="_blank">Sao Paulo state (Brazil) Consortium</a></div>
        If I run with
        my @data =$tree->look_down( '_tag' =>'div', class => 'genome_descr' );
        instead, I get 2 results. How does this compare for you?
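
As a rough answer to the original question about getting the lines into an array, calling look_down in list context and mapping each match to its text might look like the sketch below. The markup is a hypothetical stand-in loosely modelled on the output shown above (the second div and both text values are invented), so the real page may need different criteria.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical markup; the real page content will differ.
my $html = <<'HTML';
<html><body>
<div class="genome_descr"><p><b>Submitter: </b>Example Consortium</p></div>
<div class="genome_descr"><p><b>Note: </b>Second block</p></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# List context: every matching element, not only the first one.
my @divs = $tree->look_down( _tag => 'div', class => 'genome_descr' );

# One plain-text string per matching div.
my @lines = map { $_->as_text } @divs;

print scalar(@divs), " matching divs\n";
print "$_\n" for @lines;

$tree->delete;    # release the tree once the text has been copied out
```

Copying the text out before calling delete keeps @lines usable afterwards, since the array holds plain strings and not element objects.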


Re: Problem extracting an HTML table with Perl
by runrig (Abbot) on Aug 11, 2014 at 16:42 UTC
