http://www.perlmonks.org?node_id=68989


in reply to using the headers method of HTML::TableExtract to find an image

Forgive me, but I'm not sure I've understood your question. If I haven't, try rephrasing it, with examples if possible.

Now, as I understand it, you're trying to pick out a table that has an <img ..> tag inside a <th ..> tag? I've never tried this myself, but it's quite possible that the headers match only looks at text nodes - that is, the tag is markup, not content, even if it has attributes. That would make sense, because <img ..> is an empty element - in XHTML it would be written <img ../>, which makes it plain that it contains no text nodes.

Probably the best way is to write your own parser with HTML::Parser, or (better) to extend HTML::TableExtract so that it can use 'nodes' (the tags :) and their attributes in the evaluation. Or, if you're dealing with XHTML, you could use XML::XPath (which parses with XML::Parser under the hood) and write a query that finds your answer directly! (Check out XPath if you haven't before - you can search a parsed XML tree for tags based on their name, their text content, their attributes, their lineage, etc. - sooper :) That would probably be the preferred way, but it only works on well-formed XML, and I suspect you're parsing someone else's web pages, so I guess it's probably not an option.
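To make the HTML::Parser idea a bit more concrete, here is a rough, untested-against-your-pages sketch. It just walks the tags and remembers which tables had an <img ..> somewhere inside a <th ..>. The sub name find_tables_with_header_image and the sample HTML are made up for illustration; only the HTML::Parser calls (new with api_version 3, start_h/end_h, parse, eof) are the module's own.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not part of any module): returns the zero-based
# indexes, in document order, of tables with an <img> inside a <th>.
sub find_tables_with_header_image {
    my ($html) = @_;
    my $seen = 0;    # number of <table> start tags seen so far
    my @stack;       # one record per currently open table (handles nesting)
    my %hit;         # indexes of tables that matched

    my $start = sub {
        my ($tag) = @_;
        if ($tag eq 'table') {
            push @stack, { index => $seen++, in_th => 0 };
            return;
        }
        return unless @stack;    # ignore anything outside a table
        if    ($tag eq 'th') { $stack[-1]{in_th} = 1 }
        elsif ($tag eq 'td' or $tag eq 'tr') {
            # a new cell or row implicitly closes an unclosed <th>
            $stack[-1]{in_th} = 0;
        }
        elsif ($tag eq 'img' and $stack[-1]{in_th}) {
            $hit{ $stack[-1]{index} } = 1;
        }
    };

    my $end = sub {
        my ($tag) = @_;
        return unless @stack;
        if    ($tag eq 'table') { pop @stack }
        elsif ($tag eq 'th')    { $stack[-1]{in_th} = 0 }
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ $start, 'tagname' ],   # tagname is lowercased
        end_h       => [ $end,   'tagname' ],
    );
    $p->parse($html);
    $p->eof;
    return sort { $a <=> $b } keys %hit;
}

# Made-up sample: only the second table has an image in its header cell.
my $html = <<'HTML';
<table><tr><th>Name</th></tr><tr><td><img src="a.png"></td></tr></table>
<table><tr><th><img src="logo.png" alt="Logo"></th></tr><tr><td>x</td></tr></table>
HTML

my @found = find_tables_with_header_image($html);
print "tables with a header image: @found\n";
# prints "tables with a header image: 1"
```

If you need a particular image rather than any image, ask for 'tagname, attr' in start_h and test $attr->{src} in the handler. And if the pages did turn out to be well-formed XHTML, the XPath for the same test would be something like //table[.//th//img], namespace wrinkles aside.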

Have I made any sense??
