PerlMonks  

Re: using the headers method of HTML::TableExtract to find an image

by kal (Hermit)
on Apr 02, 2001 at 17:19 UTC (#68989)


in reply to using the headers method of HTML::TableExtract to find an image

Forgive me, but I'm not sure I've understood your question. If I haven't, try rephrasing it, with examples if possible.

As I understand it, you're trying to pick out a table that has an <img ..> tag inside a <th ..> header cell? I've never tried this myself, but it's quite possible that the headers are matched only against text nodes; that is, the tag counts as markup, not content, even if it has attributes. That would make sense, because <img ..> is an empty element: in XHTML it would be written <img ../>, which makes it plain that it contains no text nodes.
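For reference, here is a minimal sketch of how header matching is normally used, with headers given as text. The sample HTML and the header names 'Item' and 'Price' are made up for illustration; only the new/parse/tables/rows calls come from the module's documented interface. There is nothing comparable to pass in for a header cell that holds only an image.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample page: one table whose header cells contain plain text.
my $html = <<'HTML';
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.00</td></tr>
</table>
HTML

# Headers are supplied as strings and compared with the text of the cells.
my $te = HTML::TableExtract->new( headers => [ 'Item', 'Price' ] );
$te->parse($html);

# Collect the data rows of every table whose header row matched.
my @rows;
for my $ts ( $te->tables ) {
    push @rows, $_ for $ts->rows;
}

# Print each row, treating empty (undef) cells as empty strings.
print join( ' | ', map { defined $_ ? $_ : '' } @$_ ), "\n" for @rows;
```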

Probably the best way is to write your own parser with HTML::Parser or, better, to extend HTML::TableExtract so that 'nodes' (the tags :) and their attributes can be used in the evaluation. Or, if you're dealing with XHTML, you could parse it with XML::Parser and then use XML::XPath to build a query that finds your answer directly! (Check out XPath if you haven't before: you can search parsed XML trees for tags based on their name, text content, attributes, lineage, etc. Sooper :) That's probably the preferred way, but I suspect you're parsing someone else's web pages, which are unlikely to be well-formed XHTML, so it may not be an option.
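To give an idea of the HTML::Parser route, here is a rough sketch that records which tables have an <img ..> inside a <th ..>. The sample HTML, the handler names (on_start, on_end) and the record layout are all invented for this example; it is a starting point under those assumptions, not a drop-in replacement for HTML::TableExtract.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample page: only the second table has an image in a header cell.
my $html = <<'HTML';
<table><tr><th>Name</th></tr><tr><td>plain</td></tr></table>
<table><tr><th><img src="logo.gif" alt="Logo"></th></tr><tr><td>wanted</td></tr></table>
HTML

my @open;        # one record per currently open <table>, innermost last
my @found;       # records of tables that had an <img> inside a <th>
my $count = 0;   # running table number, in document order

sub on_start {
    my ( $tag, $attr ) = @_;
    if ( $tag eq 'table' ) {
        push @open, { index => ++$count, in_th => 0, imgs => [] };
    }
    elsif ( !@open ) {
        return;    # ignore everything outside tables
    }
    elsif ( $tag eq 'th' ) {
        $open[-1]{in_th} = 1;
    }
    elsif ( $tag eq 'td' or $tag eq 'tr' ) {
        $open[-1]{in_th} = 0;    # copes with a missing </th>
    }
    elsif ( $tag eq 'img' and $open[-1]{in_th} ) {
        my $src = defined $attr->{src} ? $attr->{src} : '';
        push @{ $open[-1]{imgs} }, $src;
    }
}

sub on_end {
    my ($tag) = @_;
    return unless @open;
    if ( $tag eq 'th' ) {
        $open[-1]{in_th} = 0;
    }
    elsif ( $tag eq 'table' ) {
        my $table = pop @open;
        push @found, $table if @{ $table->{imgs} };
    }
}

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&on_start, 'tagname, attr' ],
    end_h       => [ \&on_end,   'tagname' ],
);
$p->parse($html);
$p->eof;

print "table $_->{index}: @{ $_->{imgs} }\n" for @found;
```

If the pages do turn out to be well-formed XHTML, the same question could be put to XML::XPath with an expression along the lines of //table[.//th/img], though I haven't tested that against real pages.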

Have I made any sense??
