Parse HTML using HTML::TreeBuilder

by oldwarrior32 (Sexton)
on Oct 16, 2012 at 16:23 UTC (#999365)
oldwarrior32 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I need some advice about parsing HTML code using HTML::TreeBuilder.

I have some HTML code, and I need some info that sits inside a table tag. There are, let's say, 20 table tags, but the info I need is in table 15.

How do I know that the required info is in table 15? Well, I just search for it in Notepad with Ctrl+F and then count the table tags from the beginning.

As you can see the process is very tedious.

HTML::TreeBuilder inherits a method from HTML::Element that dumps the parsed HTML tree. The dumped HTML code looks like this:

<td class="cuesTableFilterAreaTd"> @
  <select id="searchField9" name="searchField9" onchange="getUtilityListValues(this, &quot;PhoneFindListForm&quot;, updateUtilityList)" size="1"> @
    <option selected value=""> @.
      "Device Name"
    <option value="device.description"> @.
      "Description"
    <option value="numplan.dnorpattern"> @
      "Directory Number"
    <option value=""> @0.1.
      "Calling Search Space"
    <option value=""> @
      "Device Pool"
    <option value=""> @.
      "Device Type"
    <option value=""> @.
      "Call Pickup Group"
    <option value=""> @0.
      "LSC Status"
    <option value="device.authenticationString"> @
      "Authentication String"
    <option value=""> @0.1.
      "Device Protocol"
    <option value=""> @.
      "Security Profile"
    <option value=""> @0.1.
      "Common Device Configuration"
<td class="cuesTableFilterAreaTd"> @
  <select id="searchLimit9" name="searchLimit9" size="1"> @
    <option selected value="beginsWith"> @
      "begins with"
    <option value="contains"> @.4.0.1
      "contains"
    <option value="endsWith"> @.4.0.2
      "ends with"
    <option value="isExactly"> @
      "is exactly"
    <option value="isEmpty"> @4.0.4
      "is empty"
    <option value="isNotEmpty"> @.
      "is not empty"
<td class="cuesTableFilterAreaTd"> @
  <input id="searchString9" name="searchString9" onkeypress="javascript:onEnterKey(event)" type="text" value="" /> @0.1.
<td class="cuesTableFilterAreaTd"> @
<td class="cuesTableFilterAreaTd"> @
<td class="cuesTableFilterAreaTd"> @
<td class="cuesTableFilterAreaTd"> @
<td class="cuesTableFilterAreaTd"> @

You can see that each element line ends with this: "@" or something. This tells you the position of that element in the tree.

So the question is: do you know a way to tell the module "hey, I want to work from @", or another way to make the process I described simpler?

Thanks very much!
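For readers landing on this question: the value after "@" in the dump is the element's address, and HTML::Element can both report it and look a node up by it. Below is a minimal sketch, assuming HTML::TreeBuilder is installed from CPAN. The two-table document and the search text "needle" are hypothetical stand-ins, since the real page was not posted. It finds a cell by its text with look_down, reads its address, fetches the same node back by that address, and works out which table it sits in, so no manual counting is needed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Scalar::Util qw(refaddr);

# Hypothetical stand-in for the real page, which was not posted.
my $html = <<'HTML';
<html><body>
<table><tr><td>first table</td></tr></table>
<table><tr><td class="cuesTableFilterAreaTd">needle</td></tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Find the first <td> whose text is exactly the string we search for.
my $cell = $tree->look_down(
    _tag => 'td',
    sub { $_[0]->as_text eq 'needle' },
);
die "no matching cell found\n" unless $cell;
my $cell_text = $cell->as_text;

# address() with no argument returns the position string shown after
# "@" in the dump; with an argument it returns the node at that address.
my $addr          = $cell->address;
my $same          = $tree->address($addr);
my $round_trip_ok = refaddr($same) == refaddr($cell) ? 1 : 0;

# Walk up to the enclosing table and compute its 1-based position
# among all tables in the document.
my $table  = $cell->look_up(_tag => 'table');
my @tables = $tree->look_down(_tag => 'table');
my ($index) = grep { refaddr($tables[$_]) == refaddr($table) } 0 .. $#tables;
my $table_number = $index + 1;

print "cell text:    $cell_text\n";
print "cell address: $addr\n";
print "table number: $table_number of ", scalar(@tables), "\n";

$tree->delete;    # free the tree once the plain values are extracted
```

Addresses change whenever the page layout changes, so searching by text, id, or class with look_down is usually more robust than hard-coding an address copied from a dump.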

Re: Parse HTML using HTML::TreeBuilder
by daxim (Chaplain) on Oct 16, 2012 at 16:35 UTC
    Get a better/more declarative tool! Web::Query
    use Web::Query qw();
    my $w = Web::Query->new_from_html($html);
    my $the_td_element_you_want = $w->find('td:contains("@")');
    From there, you can traverse to your actual, unmentioned destination in the DOM tree.
    edit: I misunderstood the HTML, thanks to Anonymonk below for pointing this out. oldwarrior32, please post your full HTML document, not a debug dump.
      That is funny, since the content doesn't include @ -- that is debugging dump -- use
Re: Parse HTML using HTML::TreeBuilder
by Anonymous Monk on Oct 16, 2012 at 16:51 UTC
Re: Parse HTML using HTML::TreeBuilder
by oldwarrior32 (Sexton) on Oct 17, 2012 at 16:29 UTC

    Thanks for your feedback.

    Currently I'm trying to improve what I'm doing with the module HTML::TreeBuilder::XPath.
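
    A minimal sketch of that approach, assuming HTML::TreeBuilder::XPath is installed from CPAN. The HTML fragment is a hypothetical reconstruction based on the ids and values visible in the dump above, not the real page. It selects the options of one select element by id instead of counting tables.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical fragment modeled on the dump in the question.
my $html = <<'HTML';
<html><body>
<table><tr>
<td class="cuesTableFilterAreaTd">
<select id="searchLimit9" name="searchLimit9" size="1">
<option selected value="beginsWith">begins with</option>
<option value="contains">contains</option>
</select>
</td>
</tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Address the element by id; no table counting or dump addresses needed.
my @options = $tree->findnodes('//select[@id="searchLimit9"]/option');

# Pull plain values out of the nodes before the tree is freed.
my @values = map { $_->attr('value') } @options;
my @labels = map { $_->as_text }       @options;

print "$values[$_] => $labels[$_]\n" for 0 .. $#options;

$tree->delete;
```

    The same findnodes call accepts any XPath expression, so a class or attribute test can replace the id test when the target element has no id.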

Node Type: perlquestion [id://999365]
Approved by Corion