Parsing HTML using TreeBuilder

by monsterzero (Monk)
on Oct 28, 2003 at 18:49 UTC ( #302757=perlquestion )
monsterzero has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am trying to use HTML::TreeBuilder to parse some HTML data I have retrieved from the web. Specifically, I would like to extract the sunrise/sunset data from the web page. Below is what I have tried. The attribute I am looking for is everything between the pre tags. However, I am afraid I do not understand what is being displayed when I print the result of the all_attr method :(

Can anyone shed some light on this?


Ron Hill

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $data = do { local $/; <DATA> };
    my $tree = HTML::TreeBuilder->new_from_content($data);
    print $tree->all_attr();

    __DATA__
    <html>
    <head><title>Sun and Moon Data for One Day</title></head>
    <body>
    <br>
    <h4>U.S. Naval Observatory<br>Astronomical Applications Department</h4>
    <br>
    <h3>Sun and Moon Data for One Day</h3>
    <p>The following information is provided for Adelaide Australia (longitude E138.6, latitude S34.9):
    </p>
    <pre>
            Saturday
            21 June 2003          Universal Time + 9h

                             <strong>SUN</strong>
            Begin civil twilight      06:25
            Sunrise                   06:53
            Sun transit               11:47
            Sunset                    16:41
            End civil twilight        17:10

                             <strong>MOON</strong>
            Moonrise                  22:45 on preceding day
            Moon transit              05:24
            Moonset                   11:53
            Moonrise                  23:43
            Moonset                   12:19 on following day
    </pre>
    <p>Last quarter Moon on 21 June 2003 at 23:45 (Universal Time + 9h).
    </p>
    <br>
    <br>
    <br>
    </body>
    </html>

Edit by tye, replace PRE with P tags

Re: Parsing HTML using TreeBuilder
by Art_XIV (Hermit) on Oct 28, 2003 at 20:27 UTC

    You're focusing on all_attr() which probably isn't going to be of much use to you at this stage.
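For anyone puzzled by the original output: all_attr() returns a flat list of key/value pairs for a single element, including internal keys that start with an underscore, and print joins that list with no separators. A minimal sketch of a more readable dump follows. The one-line HTML string and its class attribute are made up for illustration and are not the poster's page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in document, much smaller than the real page.
my $html = '<html><body><pre class="data">Sunrise 06:53</pre></body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);
my $pre  = $tree->look_down(_tag => 'pre')
    or die "No <pre> element found\n";

# all_attr() gives a key/value list, so it can be assigned to a hash.
my %attr = $pre->all_attr();

for my $name (sort keys %attr) {
    my $value = $attr{$name};
    # Internal entries such as _content and _parent hold references;
    # show their type instead of a raw ARRAY(0x...) style string.
    $value = ref($value) . ' reference' if ref $value;
    $value = '(undef)' unless defined $value;
    print "$name => $value\n";
}

# Free the tree explicitly once it is no longer needed.
$tree->delete;
```

The listing shows the element's own attributes alongside bookkeeping entries, which is why all_attr() is rarely the right tool for getting at the text inside an element.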

    You probably want something like:

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $data = do { local $/; <DATA> };
    my $tree = HTML::TreeBuilder->new_from_content($data);
    my $pre_tag = $tree->look_down("_tag", "pre");
    print $pre_tag->as_text(), "\n";

    Take a look at the docs for HTML::Tree::Scanning for a basic tutorial, and then have a look at the docs for HTML::Element, since it is used a lot by HTML::TreeBuilder. HTML::Element has a lot of methods that will be useful to you.
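Going one step further than the reply, here is a rough sketch of pulling just the sunrise and sunset times out of the pre text. The trimmed HTML and the regular expression are assumptions for illustration; the pattern expects each label to be followed by whitespace and an HH:MM time, as in the sample data from the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Trimmed copy of the SUN block from the question's sample page.
my $html = <<'END_HTML';
<html><body>
<pre>
        <strong>SUN</strong>
        Begin civil twilight      06:25
        Sunrise                   06:53
        Sun transit               11:47
        Sunset                    16:41
        End civil twilight        17:10
</pre>
</body></html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my $pre  = $tree->look_down(_tag => 'pre')
    or die "No <pre> element found\n";

# as_text() returns the element's text with the markup stripped.
my $text = $pre->as_text();

# Collect each label and the time that follows it. The alternation only
# names Sunrise and Sunset, so "Sun transit" is not picked up.
my %sun;
while ($text =~ /\b(Sunrise|Sunset)\s+(\d{2}:\d{2})/g) {
    $sun{$1} = $2;
}

$tree->delete;

print "Sunrise: $sun{Sunrise}\n" if exists $sun{Sunrise};
print "Sunset: $sun{Sunset}\n"  if exists $sun{Sunset};
```

If the layout of the real page differs from the sample (different labels or time formats, for instance), the pattern would need adjusting.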

Node Type: perlquestion [id://302757]
Approved by castaway