PerlMonks  

Re: Parsing HTML files

by chrestomanci (Priest)
on Nov 18, 2010 at 09:22 UTC


in reply to Parsing HTML files

Adding to Your Mother's suggestion to use HTML::TreeBuilder:

I have found it useful in the past to use a GUI HTML tree inspector such as Firebug, or the Inspect Element tool in Google Chrome.

Such a tool quickly shows you where the element you are interested in sits within the HTML structure, and which div and other useful tags are above it in the HTML tree.
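If you would rather see that chain of enclosing tags from a script than from a browser, something along these lines might do. This is only a rough sketch, and it is in Python using the standard library's html.parser rather than HTML::TreeBuilder; the sample markup, the `price` id, and the names `AncestorFinder` and `ancestor_path` are all made up for illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AncestorFinder(HTMLParser):
    """Records the chain of open tags leading to the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.stack = []   # (tag, label) pairs for the currently open elements
        self.path = None  # labels from the root down to the target, once found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        label = tag
        if attrs.get("id"):
            label += "#" + attrs["id"]
        elif classes:
            label += "." + classes[0]
        if self.path is None and attrs.get("id") == self.target_id:
            self.path = [lbl for _, lbl in self.stack] + [label]
        if tag not in VOID_TAGS:
            self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this tag name;
        # stray closing tags with no matching opener are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def ancestor_path(html, target_id):
    """Return the list of labels from the root to the target, or None."""
    finder = AncestorFinder(target_id)
    finder.feed(html)
    finder.close()
    return finder.path


# Hypothetical page fragment, standing in for whatever you are scraping.
SAMPLE = """
<html><body>
  <div id="page">
    <div class="content main">
      <img src="logo.png">
      <table><tr><td><span id="price">42</span></td></tr></table>
    </div>
  </div>
</body></html>
"""

print(" > ".join(ancestor_path(SAMPLE, "price")))
# prints: html > body > div#page > div.content > table > tr > td > span#price
```

The point is only that the div ids and classes in that chain are usually the handles you want to search on. Note that a browser inspector shows the tree after the browser has repaired the markup (for instance adding tbody to tables), so what a parser sees may differ slightly.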

Contrary to what Tux said, I have not found changing structure to be much of a problem, because the HTML of most big websites these days is generated from CMS databases by computer programs, so the structure tends to be very consistent. Occasionally a site will have a major redesign, but the rest of the time the sites are very stable. I guess the situation is different if you are dealing with hand-created HTML on small websites.

