
Parse HTML using HTML::TreeBuilder

by oldwarrior32 (Sexton)
on Oct 16, 2012 at 16:23 UTC (#999365)
oldwarrior32 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I need some advice about parsing HTML code using HTML::TreeBuilder.

I have some HTML code, and I need some info inside a table tag. There are, let's say, 20 table tags, but the required info is in table 15.

How do I know that the required info is in table 15? Well, I search for the info in Notepad with Ctrl+F and then count the table tags from the beginning.

As you can see the process is very tedious.

HTML::TreeBuilder inherits a method from HTML::Element that dumps the parsed tree. The dump looks like this:

<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.3
  <select id="searchField9" name="searchField9" onchange="getUtilityListValues(this, &quot;PhoneFindListForm&quot;, updateUtilityList)" size="1"> @0.1.9.0.0.0.0.4.1.9.3.0
    <option selected value="device.name"> @0.1.9.0.0.0.0.4.1.9.3.0.0
      "Device Name"
    <option value="device.description"> @0.1.9.0.0.0.0.4.1.9.3.0.1
      "Description"
    <option value="numplan.dnorpattern"> @0.1.9.0.0.0.0.4.1.9.3.0.2
      "Directory Number"
    <option value="callingsearchspace.name"> @0.1.9.0.0.0.0.4.1.9.3.0.3
      "Calling Search Space"
    <option value="devicepool.name"> @0.1.9.0.0.0.0.4.1.9.3.0.4
      "Device Pool"
    <option value="TypeProduct.name"> @0.1.9.0.0.0.0.4.1.9.3.0.5
      "Device Type"
    <option value="pickupgroup.name"> @0.1.9.0.0.0.0.4.1.9.3.0.6
      "Call Pickup Group"
    <option value="TypeCertificateStatus.name"> @0.1.9.0.0.0.0.4.1.9.3.0.7
      "LSC Status"
    <option value="device.authenticationString"> @0.1.9.0.0.0.0.4.1.9.3.0.8
      "Authentication String"
    <option value="TypeDeviceProtocol.name"> @0.1.9.0.0.0.0.4.1.9.3.0.9
      "Device Protocol"
    <option value="securityprofile.name"> @0.1.9.0.0.0.0.4.1.9.3.0.10
      "Security Profile"
    <option value="commondeviceconfig.name"> @0.1.9.0.0.0.0.4.1.9.3.0.11
      "Common Device Configuration"
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.4
  <select id="searchLimit9" name="searchLimit9" size="1"> @0.1.9.0.0.0.0.4.1.9.4.0
    <option selected value="beginsWith"> @0.1.9.0.0.0.0.4.1.9.4.0.0
      "begins with"
    <option value="contains"> @0.1.9.0.0.0.0.4.1.9.4.0.1
      "contains"
    <option value="endsWith"> @0.1.9.0.0.0.0.4.1.9.4.0.2
      "ends with"
    <option value="isExactly"> @0.1.9.0.0.0.0.4.1.9.4.0.3
      "is exactly"
    <option value="isEmpty"> @0.1.9.0.0.0.0.4.1.9.4.0.4
      "is empty"
    <option value="isNotEmpty"> @0.1.9.0.0.0.0.4.1.9.4.0.5
      "is not empty"
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.5
  <input id="searchString9" name="searchString9" onkeypress="javascript:onEnterKey(event)" type="text" value="" /> @0.1.9.0.0.0.0.4.1.9.5.0
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.6
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.7
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.8
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.9
<td class="cuesTableFilterAreaTd"> @0.1.9.0.0.0.0.4.1.9.10

You can see that each element line ends with something like "@0.1.9.0.0.0.0.4.1.9.10". This tells you the position of that element in the tree.

So the question is: do you know a way to tell the module, "hey, I want to work from @0.1.9.0.0.0.0.4.1.9.10"? Or is there another way to make the process I described simpler?

Thanks very much!
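For readers landing here with the same question, a minimal sketch of one possible approach follows. It relies on two documented HTML::Element methods, look_down and address. The sample HTML, the search string "Device Name", and the variable names are assumptions made up for illustration; they are not the poster's real page. The idea is to let the module find the table by its content instead of counting tags by hand, then read or reuse the dotted path that the dump prints after "@".

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the real page: three tables,
# with the wanted text in the last one.
my $html = <<'HTML';
<html><body>
<table><tr><td>first</td></tr></table>
<table><tr><td>second</td></tr></table>
<table><tr><td class="cuesTableFilterAreaTd">Device Name</td></tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Instead of counting <table> tags, ask for the first table
# whose text contains a known string (scalar context = first match).
my $table = $tree->look_down(
    _tag => 'table',
    sub { $_[0]->as_text =~ /Device Name/ },
);
die "no matching table\n" unless $table;

# address() with no argument returns the dotted path of this element,
# the same kind of value that dump() shows after the "@".
my $addr = $table->address;
print "table found at \@$addr\n";

# address($path) goes the other way: a path beginning with "0" is
# resolved from the root and the element at that position is returned.
my $same = $tree->address($addr);
my $text = $same->as_text;
print "$text\n";

# Free the tree explicitly once we are done with it.
$tree->delete;
```

Note that an address is only stable as long as the page structure does not change, so looking the element up by content or by an attribute such as id is usually the more robust choice.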

Re: Parse HTML using HTML::TreeBuilder
by daxim (Chaplain) on Oct 16, 2012 at 16:35 UTC
    Get a better/more declarative tool! Web::Query
    use Web::Query qw();
    my $w = Web::Query->new_from_html($html);
    my $the_td_element_you_want = $w->find('td:contains("@0.1.9.0.0.0.0.4.1.9.10")');
    From there, you can traverse to your actual, unmentioned destination in the DOM tree.
    edit: I misunderstood the HTML, thanks to Anonymonk below for pointing this out. oldwarrior32, please post your full HTML document, not a debug dump.
      That is funny, since the content doesn't include @ -- that is a debugging dump -- use htmltreexpather.pl
Re: Parse HTML using HTML::TreeBuilder
by Anonymous Monk on Oct 16, 2012 at 16:51 UTC
Re: Parse HTML using HTML::TreeBuilder
by oldwarrior32 (Sexton) on Oct 17, 2012 at 16:29 UTC

    Thanks for your feedback.

    Currently I'm trying to improve what I'm doing with the module HTML::TreeBuilder::XPath.
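As a rough illustration of that direction, here is a small sketch using HTML::TreeBuilder::XPath and its findnodes method. The sample document is invented for the example (only the id "searchField9" is borrowed from the dump in the question), and the XPath expressions are assumptions about what the real page might need.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample: two tables, the second holds the <select> of interest.
my $html = <<'HTML';
<html><body>
<table><tr><td>first</td></tr></table>
<table><tr><td><select id="searchField9"><option value="device.name">Device Name</option></select></td></tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Select by position. This path only fits the flat layout of the sample,
# where both tables are direct children of <body>.
my ($by_position) = $tree->findnodes('/html/body/table[2]');

# Select by content: the table that contains the known <select>.
# This keeps working if tables are added or removed before it.
my ($by_content) = $tree->findnodes('//table[.//select[@id="searchField9"]]');
die "table not found\n" unless $by_content;

# Run a relative query from the table that was found.
my @labels = map { $_->as_text } $by_content->findnodes('.//option');
print "$_\n" for @labels;
```

Selecting by an id or by nearby text avoids the manual counting described in the question, at the cost of depending on that id or text staying the same.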
