PerlMonks  

HTML::Tree: get value of an element

by Ratazong (Monsignor)
on Feb 19, 2015 at 14:22 UTC ( [id://1117222]=perlquestion )

Ratazong has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks!

I'm playing with HTML parsing (using HTML::Tree), and am stuck on parsing the following snippet:

    <span class="name color1">ImportantText1</span>
    <span class="name color2"><span class="level">1000</span>ImportantText2</span>

I try to parse this with the code below (only part of the script, to show the idea), and in the end I want ImportantText1 or ImportantText2 in the variable $name. However, I end up with the value 1000ImportantText2 in the second case. Somehow I cannot find the part of the documentation that shows how to replace as_trimmed_text() so that the text of the sub-elements is ignored.

Please enlighten me!

Rata

    my @subsub = $body->look_down (_tag => "span");
    foreach my $sss (@subsub) {
      my $c = $sss->attr("class");
      if ($c =~ /name color/) {
        $name = $sss->as_trimmed_text();
      }
    }

Replies are listed 'Best First'.
Re: HTML::Tree: get value of an element
by poj (Abbot) on Feb 19, 2015 at 16:26 UTC
    Try
    #!perl
    use strict;
    use HTML::TreeBuilder;

    my $body = HTML::TreeBuilder->new_from_file(\*DATA);
    my @subsub = $body->look_down (_tag => "span");
    foreach my $sss (@subsub) {
      my $c = $sss->attr("class");
      if ($c =~ /name color/){
        # ignore sub elements
        for ( $sss->descendants() ){
          $_->detach();
        };
        my $name = $sss->as_trimmed_text();
        print $name."\n";
      }
    }
    __DATA__
    <span class="name color1">ImportantText1</span>
    <span class="name color2"><span class="level">1000</span>ImportantText2</span>
    poj

      Thanks for your suggestion! I tested it, and it works fine. :-)

      Nevertheless, I solved the problem another way (as I don't like the idea of destroying my tree): I now additionally get the text of each sub-element, and then remove that text from the text of the top element.
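
The post does not show the code for this, so the following is only a minimal sketch of how the described approach might look, not Rata's actual script. It assumes the sample markup from the question, uses the hypothetical names `$tree` and `@names`, and relies on `content_list` returning child elements as objects and text as plain strings.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup from the question (assumed input).
my $tree = HTML::TreeBuilder->new_from_content(
    '<span class="name color1">ImportantText1</span> '
  . '<span class="name color2"><span class="level">1000</span>ImportantText2</span>'
);

my @names;
for my $span ( $tree->look_down( _tag => 'span', class => qr/name color/ ) ) {
    # Full text, including the text of all sub-elements.
    my $name = $span->as_trimmed_text;

    # Child elements are references; plain text children are not.
    for my $child ( grep { ref } $span->content_list ) {
        my $child_text = $child->as_trimmed_text;
        # Remove the first occurrence of the sub-element's text.
        $name =~ s/\Q$child_text\E//;
    }
    push @names, $name;
}

print "$_\n" for @names;
$tree->delete;    # free the tree when done
```

The tree itself is left untouched. One caveat of this sketch: the substitution removes the first match of the sub-element's text, so it can strip the wrong characters if the same string also occurs earlier in the element's own text.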

      This solution also doesn't make me happy (it is somewhat wasteful) - however, I don't have any problems with runtime or memory usage now, so it doesn't hurt. And the additional amount of electrons moving around heats up the air in the room - which is not too bad in winter ;-)

      Rata
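
A third option, not proposed in the thread, is to collect only the text that sits directly inside the element, so that nothing is detached and nothing has to be stripped afterwards. The sketch below assumes the sample markup from the question; `own_text`, `$tree` and `@own` are hypothetical names. Unlike `as_trimmed_text()`, this helper trims only leading and trailing whitespace.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Return only the text directly inside $elem, skipping child elements.
sub own_text {
    my ($elem) = @_;
    # content_list yields plain strings for text and objects for elements.
    my $text = join '', grep { !ref } $elem->content_list;
    $text =~ s/^\s+|\s+$//g;    # trim both ends
    return $text;
}

# Sample markup from the question (assumed input).
my $tree = HTML::TreeBuilder->new_from_content(
    '<span class="name color1">ImportantText1</span> '
  . '<span class="name color2"><span class="level">1000</span>ImportantText2</span>'
);

my @own = map { own_text($_) }
          $tree->look_down( _tag => 'span', class => qr/name color/ );

print "$_\n" for @own;
$tree->delete;    # free the tree when done
```

Because the nested span is skipped rather than removed, its value (1000 in the sample) stays available in the tree for later lookups.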
