http://www.perlmonks.org?node_id=872162


in reply to Parsing HTML files

Adding to the suggestion from Your Mother to use HTML::TreeBuilder:

I have found it useful in the past to use a GUI HTML tree inspector such as Firebug, or the Inspect Element tool in Google Chrome.

Such a tool quickly shows you where the element you are interested in sits within the HTML structure, and which div and other useful tags sit above it in the HTML tree.
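Once the inspector has shown you the path, you can mirror it with HTML::TreeBuilder's look_down: first find the enclosing div, then search only inside it. The sketch below assumes HTML::TreeBuilder is installed from CPAN. The markup, the div id "product", the span class "price" and the find_price sub are all hypothetical stand-ins for whatever the inspector shows on your real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical example: the inspector showed that the value we want is in
# <span class="price"> inside <div id="product">.
sub find_price {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Narrow down to the enclosing div first, following the path the
    # inspector displayed, so a matching span elsewhere is ignored.
    my $div = $tree->look_down(_tag => 'div', id => 'product');
    my $price;
    if ($div) {
        my $span = $div->look_down(_tag => 'span', class => 'price');
        $price = $span->as_trimmed_text if $span;
    }

    # HTML::Element trees contain circular references, so free it explicitly.
    $tree->delete;
    return $price;
}

# Hypothetical sample page: the sidebar also has a "price" span, which
# scoping the search to div#product skips.
my $sample = <<'HTML';
<html><body>
  <div id="sidebar"><span class="price">0.00</span></div>
  <div id="product">
    <h1>Widget</h1>
    <span class="price"> 19.99 </span>
  </div>
</body></html>
HTML

print find_price($sample), "\n";
```

Anchoring on the enclosing div rather than searching the whole page for the span is what keeps the scraper from picking up the look-alike element in the sidebar.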

Contrary to what Tux said, I have not found changing structure to be much of a problem, because the HTML of most big websites these days is generated from CMS databases by computer programs, so the structure tends to be very consistent. Occasionally a site will have a major redesign, but the rest of the time these sites are very stable. I guess the situation is different if you are dealing with hand-created HTML on small websites.