PerlMonks  

Parsing HTML using TreeBuilder

by monsterzero (Monk)
on Oct 28, 2003 at 18:49 UTC (#302757, perlquestion)
monsterzero has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am trying to use HTML::TreeBuilder to parse some HTML data I have retrieved from the web. Specifically, I would like to extract the sunrise/sunset data from the web page. Below is what I have tried. The attribute I am looking for is everything between the pre tags. However, I am afraid I do not understand what is being displayed when I print the result of the all_attr method :(

Can anyone shed some light on this?

Thanks

Ron Hill

use strict;
use warnings;
use HTML::TreeBuilder;

my $data = do { local $/; <DATA> };
my $tree = HTML::TreeBuilder->new_from_content($data);
print $tree->all_attr();

__DATA__
<html>
<head><title>Sun and Moon Data for One Day</title></head>
<body>
<br>
<h4>U.S. Naval Observatory<br>Astronomical Applications Department</h4>
<br>
<h3>Sun and Moon Data for One Day</h3>
<p>The following information is provided for Adelaide Australia (longitude E138.6, latitude S34.9):
</p>
<pre>
Saturday
21 June 2003          Universal Time + 9h

<strong>SUN</strong>
Begin civil twilight      06:25
Sunrise                   06:53
Sun transit               11:47
Sunset                    16:41
End civil twilight        17:10

<strong>MOON</strong>
Moonrise                  22:45 on preceding day
Moon transit              05:24
Moonset                   11:53
Moonrise                  23:43
Moonset                   12:19 on following day
</pre>
<p>Last quarter Moon on 21 June 2003 at 23:45 (Universal Time + 9h).
</p>
<br>
<br>
<br>
</body>
</html>

Edit by tye, replace PRE with P tags

Re: Parsing HTML using TreeBuilder
by Art_XIV (Hermit) on Oct 28, 2003 at 20:27 UTC

    You're focusing on all_attr() which probably isn't going to be of much use to you at this stage.
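For anyone puzzled by the same output: a minimal sketch of what all_attr() hands back, assuming a cut-down document with a hypothetical class="data" attribute on the pre element (not part of the original page). It returns one element's attributes as a flat key/value list, including internal keys such as _tag, _parent and _content, so printing it directly just runs those keys and values together. The text between the tags is not an attribute at all; it lives under _content.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Cut-down stand-in for the real page; class="data" is a made-up attribute.
my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><pre class="data">Sunrise 06:53</pre></body></html>'
);

my $pre = $tree->look_down(_tag => 'pre');

# all_attr() returns key/value pairs for this ONE element,
# internal keys (leading underscore) included.
my %attr = $pre->all_attr();

for my $name (sort keys %attr) {
    my $value = defined $attr{$name} ? $attr{$name} : '(undef)';
    print "$name => $value\n";
}

$tree->delete;    # free the tree when finished
```

The _content and _parent values print as references (ARRAY(0x...), HTML::Element=HASH(0x...)), which is most of what shows up when the list is printed raw.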

    You probably want something like:

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $data = do { local $/; <DATA> };
    my $tree = HTML::TreeBuilder->new_from_content($data);
    my $pre_tag = $tree->look_down("_tag", "pre");
    print $pre_tag->as_text(), "\n";

    Take a look at the docs for HTML::Tree::Scanning for a basic tutorial, and then have a look at the docs for HTML::Element, since it is used a lot by HTML::TreeBuilder. HTML::Element has a lot of methods that will be useful to you.
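Once as_text() has returned the contents of the pre element, the sunrise and sunset times still have to be picked out of that string. A minimal sketch of one way to do it with a regex, using core Perl only: $text is a stand-in for the string returned by $pre_tag->as_text(), and extract_times is a hypothetical helper name, not something provided by HTML::TreeBuilder or HTML::Element.

```perl
use strict;
use warnings;

# Stand-in for the string returned by $pre_tag->as_text()
my $text = <<'END';
Saturday
21 June 2003          Universal Time + 9h

SUN
Begin civil twilight      06:25
Sunrise                   06:53
Sun transit               11:47
Sunset                    16:41
End civil twilight        17:10
END

# Hypothetical helper: for each label, capture the first HH:MM that follows it.
sub extract_times {
    my ($text, @labels) = @_;
    my %time;
    for my $label (@labels) {
        # \b keeps "Sunrise" from matching inside a longer word;
        # \Q...\E quotes any regex metacharacters in the label.
        if ($text =~ /\b\Q$label\E\s+(\d{2}:\d{2})/) {
            $time{$label} = $1;
        }
    }
    return %time;
}

my %time = extract_times($text, 'Sunrise', 'Sunset');
printf "%-8s %s\n", $_, $time{$_} for sort keys %time;
```

Matching on \s+ rather than on a fixed column means the pattern does not depend on exactly how the whitespace inside the pre block comes through.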
