PerlMonks  

Parsing HTML using TreeBuilder

by monsterzero (Monk)
on Oct 28, 2003 at 18:49 UTC (#302757)
monsterzero has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am trying to use HTML::TreeBuilder to parse some HTML data I have retrieved from the web. Specifically, I would like to extract the sunrise/sunset data from the web page. Below is what I have tried. The attribute I am looking for is everything between the pre tags. However, I am afraid I do not understand what is being displayed when I print the result of the all_attr method :(

Can anyone shed some light on this?

Thanks

Ron Hill

use strict;
use warnings;
use HTML::TreeBuilder;

my $data = do { local $/; <DATA> };
my $tree = HTML::TreeBuilder->new_from_content($data);
print $tree->all_attr();

__DATA__
<html>
<head><title>Sun and Moon Data for One Day</title></head>
<body>
<br>
<h4>U.S. Naval Observatory<br>Astronomical Applications Department</h4>
<br>
<h3>Sun and Moon Data for One Day</h3>
<p>The following information is provided for Adelaide Australia (longitude E138.6, latitude S34.9):
</p>
<pre>
        Saturday
        21 June 2003          Universal Time + 9h

                         <strong>SUN</strong>
        Begin civil twilight      06:25
        Sunrise                   06:53
        Sun transit               11:47
        Sunset                    16:41
        End civil twilight        17:10

                         <strong>MOON</strong>
        Moonrise                  22:45 on preceding day
        Moon transit              05:24
        Moonset                   11:53
        Moonrise                  23:43
        Moonset                   12:19 on following day
</pre>
<p>Last quarter Moon on 21 June 2003 at 23:45 (Universal Time + 9h).
</p>
<br>
<br>
<br>
</body>
</html>

Edit by tye, replace PRE with P tags

Re: Parsing HTML using TreeBuilder
by Art_XIV (Hermit) on Oct 28, 2003 at 20:27 UTC

    You're focusing on all_attr() which probably isn't going to be of much use to you at this stage.

    You probably want something like:

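    use strict;
    use warnings;
    use HTML::TreeBuilder;

    # Slurp the HTML from the __DATA__ section shown in the question.
    my $data = do { local $/; <DATA> };
    my $tree = HTML::TreeBuilder->new_from_content($data);

    # In scalar context, look_down() returns the first element that matches
    # the criteria (here, the first <pre> element).
    my $pre_tag = $tree->look_down("_tag", "pre");
    print $pre_tag->as_text(), "\n";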
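Once the text of the pre element is in hand, getting just the sunrise and sunset times is a plain regex job. The following is only a sketch: extract_sun_times is a hypothetical helper name, not part of HTML::TreeBuilder, and the sample text is copied from the question's data rather than fetched from a live page. It assumes each label is followed by whitespace and an HH:MM time.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the "Sunrise" and "Sunset" times out of the
# plain text of the <pre> block (for example, the string returned by
# $pre_tag->as_text()).
sub extract_sun_times {
    my ($text) = @_;
    my %times;
    for my $label (qw(Sunrise Sunset)) {
        # Label, some whitespace, then an HH:MM time.
        if ($text =~ /\b$label\s+(\d{1,2}:\d{2})/) {
            $times{$label} = $1;
        }
    }
    return \%times;
}

# Sample text copied from the question's data; in the real script this
# would be $pre_tag->as_text().
my $text = <<'END';
Begin civil twilight      06:25
Sunrise                   06:53
Sun transit               11:47
Sunset                    16:41
END

my $times = extract_sun_times($text);
print "$_: $times->{$_}\n" for sort keys %$times;
```

With the sample text above this prints "Sunrise: 06:53" and "Sunset: 16:41". The "Sun transit" line is skipped because the pattern requires the full label.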

    Take a look at the docs for HTML::Tree::Scanning for a basic tutorial, and then have a look at the docs for HTML::Element, since it is used a lot by HTML::TreeBuilder. HTML::Element has a lot of methods that will be useful to you.
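As for what all_attr() was printing in the original script: per the HTML::Element docs it returns a flat list of attribute names and values for a single element, including internal keys such as _tag, _parent and _content, so printing it directly mostly shows key names and stringified references rather than page text. A small sketch of the difference, using a made-up one-line document (the class="report" attribute is an assumption for illustration, not part of the original page):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Tiny illustrative document; the class attribute is invented for the demo.
my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><pre class="report">Sunrise 06:53</pre></body></html>'
);

# all_attr() returns key/value pairs for ONE element (here the root),
# including internal, underscore-prefixed keys.
my %attr = $tree->all_attr();
print "root keys: ", join(", ", sort keys %attr), "\n";

# all_external_attr() leaves out the internal keys, so only real HTML
# attributes of the element remain.
my $pre = $tree->look_down(_tag => 'pre');
my %external = $pre->all_external_attr();
print "pre $_ = $external{$_}\n" for sort keys %external;

# The element's text content comes from as_text(), not from its attributes.
my $pre_text = $pre->as_text();
print "pre text: $pre_text\n";

$tree->delete;    # free the tree explicitly (needed on older HTML::Tree versions)
```

The point of the sketch is that the text between the pre tags is element content, not an attribute, which is why look_down() plus as_text() is the better fit here.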
