PerlMonks  

Re: parsing html

by ramrod (Curate)
on May 14, 2009 at 15:29 UTC


in reply to parsing html

Out of curiosity, did you try to use HTML::Element?

I searched CPAN and came across HTML::Parser; I would start there if I were doing this myself. Its documentation includes examples.
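Since HTML::Parser came up: it does not build a tree. Instead, it calls handlers as it meets start tags, end tags and text. The sketch below is one way (not the only one) to collect the text of every <b> element with its version 3 API. The sub name bold_texts and the sample markup are hypothetical, made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Collect the text inside every <b>...</b> element using HTML::Parser's
# event-driven (api_version 3) interface. bold_texts is a hypothetical name.
sub bold_texts {
    my ($html) = @_;
    my @found;       # finished bold strings
    my $depth = 0;   # how many <b> tags are currently open
    my $buf   = '';  # text gathered inside the current outermost <b>

    my $p = HTML::Parser->new(
        api_version => 3,
        # 'tagname' is passed in lowercase by default
        start_h => [ sub {
            my ($tag) = @_;
            $depth++ if $tag eq 'b';
        }, 'tagname' ],
        end_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq 'b' && $depth > 0;
            if (--$depth == 0) {     # closed the outermost <b>
                push @found, $buf;
                $buf = '';
            }
        }, 'tagname' ],
        # 'dtext' is the text with entities decoded
        text_h => [ sub {
            my ($text) = @_;
            $buf .= $text if $depth > 0;
        }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;    # flush any text the parser is still holding
    return @found;
}

# Hypothetical sample markup, not taken from the real page.
my $sample = '<table><tr><td><b>alpha</b></td><td>1</td></tr>'
           . '<tr><td><b>beta</b></td><td>2</td></tr></table>';
print "$_\n" for bold_texts($sample);
```

Text can arrive in several text events, so the handler appends to a buffer rather than assuming one event per element.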

At any rate, try these modules and post the problems/errors you receive. There's a better chance of receiving the advice you seek that way.

Re^2: parsing html
by paola82 (Sexton) on May 14, 2009 at 15:58 UTC

    Now I'll paste my code (the one I used) and the error message. I didn't post it before because I didn't want to look as stupid as I am... :-(

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    my $url3="http://microrna.sanger.ac.uk/cgi-bin/targets/v5/detail_view.pl?transcript_id=ENST00000226253";
    my $content=get $url3;

    use HTML::TreeBuilder;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($content);
    $tree->delete;

    use HTML::Element;
    my @elements = my $element->find('b',);
    my @anchors = $element->look_down('_tag' => 'b');
    print "@elements\n";

    And now the error:

    Can't call method "find" on an undefined value at test.pl line 17

    I don't know how to select the string between <b> and </b>, because I don't really know HTML... and I don't understand the syntax.
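The error comes from three problems in the script above:

1) parse_file treats its argument as a file name, but $content holds the page itself, so parse_content is the method to call.
2) $tree->delete frees the tree before anything searches it.
3) my $element->find(...) declares a brand-new variable, which is undef, and then calls a method on it. That is the call that produces "Can't call method "find" on an undefined value".

Below is a minimal corrected sketch, assuming LWP::Simple and HTML::TreeBuilder are installed. The sub bold_texts and the offline sample string are hypothetical additions that let the script run without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TreeBuilder;

# Return the text of every <b> element in an HTML string.
# bold_texts is a hypothetical helper name.
sub bold_texts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_content($html);   # a string, so parse_content, not parse_file
    # Search the tree we actually built; in list context look_down
    # returns every matching element.
    my @texts = map { $_->as_text } $tree->look_down(_tag => 'b');
    $tree->delete;                 # free the tree only after we are done with it
    return @texts;
}

# With a URL argument, fetch the page; otherwise fall back to a tiny
# hypothetical sample so the script can be tried offline.
my $html;
if (@ARGV) {
    $html = get($ARGV[0]);
    die "Could not fetch $ARGV[0]\n" unless defined $html;
} else {
    $html = '<p>Gene <b>alpha</b> binds <b>beta</b>.</p>';
}
print "$_\n" for bold_texts($html);
```

Usage: perl bold.pl 'http://...' fetches a page. Run without arguments, the script uses the built-in sample.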

      Nearly! :-)
      #!/usr/bin/perl
      use warnings;
      use strict;
      use HTML::TreeBuilder;

      my $html = do{local $/;<DATA>};

      my $p = HTML::TreeBuilder->new;
      $p->parse_content($html); # parse_content if you have a string

      my @tds = $p->look_down(_tag => q{td}); # get a list of all the td tags

      for my $td (@tds){
          my $bold = $td->look_down(_tag => q{b}); # look for a bold tag
          if ($bold){
              print $bold->as_text, qq{\n}; # if there is one print the text
          }
      }

      $p->delete; # when you've finished with it
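wfsp's loop relies on how look_down behaves in different contexts. In list context it returns every matching element. In scalar context it returns only the first match in tree order, or undef if there is none. That is why my @tds = ... collects all the cells, while my $bold = ... inside the loop picks at most one <b> per cell. Here is a small sketch on a made-up table (hypothetical markup):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample: two cells contain bold text, one does not.
my $html = '<table><tr>'
         . '<td><b>one</b> <b>two</b></td>'
         . '<td>plain</td>'
         . '<td><b>three</b></td>'
         . '</tr></table>';

my $tree = HTML::TreeBuilder->new;
$tree->parse_content($html);

# List context: every <td> in the document.
my @tds = $tree->look_down(_tag => 'td');
print scalar(@tds), " cells\n";

# Scalar context: only the first <b> under each cell, or undef if none.
for my $td (@tds) {
    my $first_bold = $td->look_down(_tag => 'b');
    my $text = defined $first_bold ? $first_bold->as_text : '(no bold)';
    print "$text\n";
}

$tree->delete;
```

Note that "two" is never printed: the scalar-context call stops at the first <b> in its cell.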

        Thanks... I just read it :-) and tried this:

        #!/usr/local/bin/perl
        use warnings;
        use strict;
        use LWP::Simple;
        use HTML::TreeBuilder;

        my @files = (["http://microrna.sanger.ac.uk/cgi-bin/targets/v5/detail_view.pl?transcript_id=ENST00000226253", "a.txt"],);
        for my $duplet (@files) {
            mirror($duplet->[0], $duplet->[1]);
        };

        open DATA, 'a.txt';
        my $html = do{local $/;<DATA>};
        my $p = HTML::TreeBuilder->new;
        $p->parse_content($html); # parse_content if you have a string
        my @tds = $p->look_down(_tag => q{td}); # get a list of all the td tags
        for my $td (@tds){
            my $bold = $td->look_down(_tag => q{b}); # look for a bold tag
            if ($bold){
                print $bold->as_text, qq{\n}; # if there is one print the text
            }
        }
        $p->delete; # when you've finished with it

        So I have two last questions for the monks... for today :-)

        1) Do I have to download the content of the web page to a file and read it through the DATA filehandle? This is the only way I found to make it work.

        2) How can I refine my script so that it prints only the data I need?

        Thank you all, you are essential for the Perl community, and for my bioinformatics work... thanks
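On question 1: the DATA handle isn't needed. Perl uses it for text placed after a __DATA__ (or __END__) line, which is why wfsp's self-contained example read from it. If you fetch the page with LWP::Simple's get, the returned string can go straight to parse_content. If you would rather keep the mirrored copy, parse_file will read it by file name.

On question 2: look_down accepts extra criteria, including a code reference that receives each candidate element and keeps it only if it returns true.

The sketch below combines both answers. The sub name, the regex and the sample markup are hypothetical: the right filter depends on the layout of the page and on which values you actually need.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TreeBuilder;

# Read a saved page straight from disk: parse_file accepts a file name,
# so there is no need to open DATA and slurp the file yourself.
# bold_texts_from_file is a hypothetical helper name.
sub bold_texts_from_file {
    my ($path, $wanted) = @_;   # $wanted: a regex describing the text you need
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($path) or die "Cannot parse $path: $!\n";

    # Extra look_down criterion: a code reference gets each candidate
    # element in $_[0]; only elements for which it returns true are kept.
    my @hits = $tree->look_down(
        _tag => 'b',
        sub { $_[0]->as_text =~ $wanted },
    );
    my @texts = map { $_->as_text } @hits;
    $tree->delete;
    return @texts;
}

# Demo: a throwaway file stands in for the mirrored "a.txt".
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} '<table><tr><td><b>target-1</b></td><td><b>Score</b></td></tr>'
          . '<tr><td><b>target-2</b></td><td>x</td></tr></table>';
close $fh;

print "$_\n" for bold_texts_from_file($path, qr/^target/);
```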

      Besides working through wfsp's solution, definitely look at the Documentation section of the HTML-Tree distribution. It has articles for relative beginners that helped my understanding of OO modules, HTML, tree structures, and parsing.
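To get a feel for the tree structure those articles describe, two HTML::Element methods are useful. dump prints an indented outline of a parsed document. look_up climbs from an element towards the root, the mirror image of look_down. Here is a short sketch on a hypothetical fragment:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical fragment; TreeBuilder adds the <html>/<head>/<body> wrapper itself.
my $tree = HTML::TreeBuilder->new_from_content(
    '<table><tr><td><b>name</b></td><td>value</td></tr></table>'
);

# dump() prints an indented outline of the element tree to STDOUT,
# showing the parent/child relationships that look_down walks.
$tree->dump;

# Every element knows its ancestors, so from a match you can climb
# back up to the row that contains it and read its sibling cells.
my $bold = $tree->look_down(_tag => 'b');
my $row  = $bold->look_up(_tag => 'tr');
print join(' | ', map { $_->as_text } $row->look_down(_tag => 'td')), "\n";

$tree->delete;
```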
