PerlMonks  

Help with HTML::TreeBuilder::XPath

by edimusrex (Monk)
on Jul 08, 2016 at 17:58 UTC ( #1167474=perlquestion )

edimusrex has asked for the wisdom of the Perl Monks concerning the following question:

I am having a slight issue using the HTML::TreeBuilder::XPath module. I have a block of HTML like the one below (the specific contents don't matter; there are thousands of blocks in the same format):
    <div id="filerDiv">
      <div class="mailer">Mailing Address
        <span class="mailerAddress">65 MARKET STREET, SUITE 1207,</span>
        <span class="mailerAddress">CAMANA BAY, P.O. BOX 31110</span>
        <span class="mailerAddress">GRAND CAYMAN E9 KY1-1205</span>
      </div>
      <div class="mailer">Business Address
        <span class="mailerAddress">65 MARKET STREET, SUITE 1207,</span>
        <span class="mailerAddress">CAMANA BAY, P.O. BOX 31110</span>
        <span class="mailerAddress">GRAND CAYMAN E9 KY1-1205</span>
        <span class="mailerAddress">345 943 4573</span>
      </div>
      <div class="companyInfo">
        <span class="companyName">GREENLIGHT CAPITAL RE, LTD. (Filer)
          <acronym title="Central Index Key">CIK</acronym>: <a href="/cgi-bin/browse-edgar?CIK=0001385613&amp;action=getcompany">0001385613 (see all company filings)</a></span>
        <p class="identInfo"><acronym title="Internal Revenue Service Number">IRS No.</acronym>: <strong>000000000</strong><br />Type: <strong>10-Q</strong> | Act: <strong>34</strong> | File No.: <a href="/cgi-bin/browse-edgar?filenum=001-33493&amp;action=getcompany"><strong>001-33493</strong></a> | Film No.: <strong>161612131</strong><br /><acronym title="Standard Industrial Code">SIC</acronym>: <b><a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=6331&amp;owner=include">6331</a></b> Fire, Marine &amp; Casualty Insurance<br />Assistant Director 1</p>
      </div>
    </div>
For the second div with the class "mailer", I need to get the text from each span inside the block. I have been experimenting with it for a while, but I only ever get all the text joined into one line. I would like to store each span's text as a separate array element, so in this instance, where there are 4 spans, the array should have 4 elements. Here is a snippet of the code I am using to parse the file:
    my $root = HTML::TreeBuilder::XPath->new;
    $root->parse($content);
    my @Baddress = $root->findvalue('//div[@id="filerDiv"]/div[@class="mailer"][2]/span/text()');
Any kind of help would be greatly appreciated.
Update

I figured it out. I was using $root->findvalue instead of $root->findvalues, so everything came back as a single string and the array only ever held one element. Thanks for reading.

Replies are listed 'Best First'.
Re: Help with HTML::TreeBuilder::XPath
by poj (Abbot) on Jul 08, 2016 at 19:10 UTC
    my @Baddress = $root->findvalues( .. )
    # add s                        ^
    poj
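
For anyone landing here later, the difference discussed above can be seen in a small self-contained script. This is only a sketch: it assumes HTML::TreeBuilder::XPath is installed, uses a trimmed copy of the HTML from the question (the companyInfo div is left out), and the variable names $xpath and $joined are made up for the illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Trimmed copy of the HTML from the question (companyInfo div omitted).
my $content = <<'HTML';
<div id="filerDiv">
  <div class="mailer">Mailing Address
    <span class="mailerAddress">65 MARKET STREET, SUITE 1207,</span>
    <span class="mailerAddress">CAMANA BAY, P.O. BOX 31110</span>
    <span class="mailerAddress">GRAND CAYMAN E9 KY1-1205</span>
  </div>
  <div class="mailer">Business Address
    <span class="mailerAddress">65 MARKET STREET, SUITE 1207,</span>
    <span class="mailerAddress">CAMANA BAY, P.O. BOX 31110</span>
    <span class="mailerAddress">GRAND CAYMAN E9 KY1-1205</span>
    <span class="mailerAddress">345 943 4573</span>
  </div>
</div>
HTML

my $root = HTML::TreeBuilder::XPath->new;
$root->parse($content);
$root->eof;    # tell the parser the document is complete

# Same XPath as in the question: span text of the second "mailer" div.
my $xpath = '//div[@id="filerDiv"]/div[@class="mailer"][2]/span/text()';

# findvalue: one string for the whole node set.
my $joined = $root->findvalue($xpath);

# findvalues: one list element per matched node.
my @Baddress = $root->findvalues($xpath);

print "findvalue  gave: $joined\n";
print "findvalues gave ", scalar(@Baddress), " elements:\n";
print "  $_\n" for @Baddress;

$root->delete;    # free the tree when done
```

Assigning the result of findvalue to an array still "works", which is what makes the bug easy to miss: the array simply ends up with one element holding all the text.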
