PerlMonks  

Parse HTML using HTML::TreeBuilder

by oldwarrior32 (Sexton)
on Oct 16, 2012 at 16:23 UTC (#999365=perlquestion)
oldwarrior32 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I need some advice about parsing HTML code using HTML::TreeBuilder.

I have some HTML code, and I need some info from inside a table tag. There are, let's say, 20 table tags, but the info I need is in table 15.

How do I know that the required info is in table 15? Well, I search for it in Notepad with Ctrl+F and then count the table tags from the beginning.

As you can see, the process is very tedious.

HTML::TreeBuilder inherits a method from HTML::Element (dump) that dumps the parsed tree. The dumped HTML code looks like this:

    <td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.3
      <select id="searchField9" name="searchField9" onchange="getUtilityListValues(this, &quot;PhoneFindListForm&quot;, updateUtilityList)" size="1"> @0.1.9.0.0.0.0.4.1.9.3.0
        <option selected value="device.name"> @0.1.9.0.0.0.0.4.1.9.3.0.0
          "Device Name"
        <option value="device.description"> @0.1.9.0.0.0.0.4.1.9.3.0.1
          "Description"
        <option value="numplan.dnorpattern"> @0.1.9.0.0.0.0.4.1.9.3.0.2
          "Directory Number"
        <option value="callingsearchspace.name"> @0.1.9.0.0.0.0.4.1.9.3.0.3
          "Calling Search Space"
        <option value="devicepool.name"> @0.1.9.0.0.0.0.4.1.9.3.0.4
          "Device Pool"
        <option value="TypeProduct.name"> @0.1.9.0.0.0.0.4.1.9.3.0.5
          "Device Type"
        <option value="pickupgroup.name"> @0.1.9.0.0.0.0.4.1.9.3.0.6
          "Call Pickup Group"
        <option value="TypeCertificateStatus.name"> @0.1.9.0.0.0.0.4.1.9.3.0.7
          "LSC Status"
        <option value="device.authenticationString"> @0.1.9.0.0.0.0.4.1.9.3.0.8
          "Authentication String"
        <option value="TypeDeviceProtocol.name"> @0.1.9.0.0.0.0.4.1.9.3.0.9
          "Device Protocol"
        <option value="securityprofile.name"> @0.1.9.0.0.0.0.4.1.9.3.0.10
          "Security Profile"
        <option value="commondeviceconfig.name"> @0.1.9.0.0.0.0.4.1.9.3.0.11
          "Common Device Configuration"
    <td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.4
      <select id="searchLimit9" name="searchLimit9" size="1"> @0.1.9.0.0.0.0.4.1.9.4.0
        <option selected value="beginsWith"> @0.1.9.0.0.0.0.4.1.9.4.0.0
          "begins with"
        <option value="contains"> @0.1.9.0.0.0.0.4.1.9.4.0.1
          "contains"
        <option value="endsWith"> @0.1.9.0.0.0.0.4.1.9.4.0.2
          "ends with"
        <option value="isExactly"> @0.1.9.0.0.0.0.4.1.9.4.0.3
          "is exactly"
        <option value="isEmpty"> @0.1.9.0.0.0.0.4.1.9.4.0.4
          "is empty"
        <option value="isNotEmpty"> @0.1.9.0.0.0.0.4.1.9.4.0.5
          "is not empty"
    <td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.5
      <input id="searchString9" name="searchString9" onkeypress="javascript:onEnterKey(event)" type="text" value="" /> @0.1.9.0.0.0.0.4.1.9.5.0
    <td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.6
    <td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.7
    <td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.8
    <td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.9
    <td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.10

You can see that each element line ends with something like "@0.1.9.0.0.0.0.4.1.9.10". This tells you the position of that node in the tree.

So the question is: do you know a way to tell the module, "hey, I want to work from @0.1.9.0.0.0.0.4.1.9.10", or another way to make the process I described simpler?

Thanks very much!

Re: Parse HTML using HTML::TreeBuilder
by daxim (Chaplain) on Oct 16, 2012 at 16:35 UTC
    Get a better/more declarative tool! Web::Query
        use Web::Query qw();
        my $w = Web::Query->new_from_html($html);
        my $the_td_element_you_want = $w->find('td:contains("@0.1.9.0.0.0.0.4.1.9.10")');
    From there, you can traverse to your actual, unmentioned destination in the DOM tree.
    edit: I misunderstood the HTML, thanks to Anonymonk below for pointing this out. oldwarrior32, please post your full HTML document, not a debug dump.
      That is funny, since the content doesn't include @ -- that is a debugging dump -- use htmltreexpather.pl
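The "@0.1.9..." strings in the dump are node addresses, and HTML::Element's address method works in both directions: with no argument it reports a node's address, and with an address argument it returns the node at that position. The following is only a minimal sketch, assuming a small stand-in document (the real page was never posted in this thread); the variable names are illustrative, and look_down is used to find the cell by its class attribute.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in document; the original HTML was not posted.
my $html = <<'HTML';
<html><head><title>demo</title></head><body>
<table><tr><td>first table</td></tr></table>
<table><tr><td class="cuesTableFilterAreaTd">wanted cell</td></tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Find an element by tag and attribute, then ask it for its own address.
my $cell = $tree->look_down(_tag => 'td', class => 'cuesTableFilterAreaTd');
my $addr = $cell->address;

# The other direction: an address starting with "0" is resolved from the root.
my $same = $tree->address($addr);
my $text = $same->as_text;

print "$addr => $text\n";

$tree->delete;    # release the tree once finished with it
```

An address changes whenever the page structure changes, so matching on an attribute such as class or id is probably the more durable way to reach the node; the address form is mostly handy while exploring a dump.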
Re: Parse HTML using HTML::TreeBuilder
by Anonymous Monk on Oct 16, 2012 at 16:51 UTC
Re: Parse HTML using HTML::TreeBuilder
by oldwarrior32 (Sexton) on Oct 17, 2012 at 16:29 UTC

    Thanks for your feedback.

    Currently I'm trying to improve what I'm doing with the module HTML::TreeBuilder::XPath.
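For anyone following the same route, here is a rough sketch of how HTML::TreeBuilder::XPath can select nodes by attribute instead of by counting tables. The HTML is a hypothetical reduction based on the ids visible in the dump above, and the table id "filters" is invented purely for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in markup; only the select id and option values
# are taken from the dump, the rest is assumed.
my $html = <<'HTML';
<html><body>
<table id="other"><tr><td>unrelated</td></tr></table>
<table id="filters"><tr><td class="cuesTableFilterAreaTd">
<select id="searchLimit9" name="searchLimit9" size="1">
<option selected value="beginsWith">begins with</option>
<option value="contains">contains</option>
</select>
</td></tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Pick the options of one specific select by its id.
my @labels = map { $_->as_text }
    $tree->findnodes('//select[@id="searchLimit9"]/option');

# Pick the table that contains that select, wherever it sits in the page.
my ($table) = $tree->findnodes('//table[.//select[@id="searchLimit9"]]');
my $table_id = $table->attr('id');

print "table: $table_id\n";
print "option: $_\n" for @labels;

$tree->delete;
```

Because the XPath expression names the select by id, it keeps working if tables are added or removed earlier in the page, which is the weak point of counting tags by hand.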
