Re: Parsing HTML files

by chrestomanci (Priest)
on Nov 18, 2010 at 09:22 UTC (node #872162)

in reply to Parsing HTML files

Adding to the suggestion from Your Mother to use HTML::TreeBuilder:

I have found it useful in the past to use a GUI HTML tree inspector such as Firebug, or the Inspect Element tool in Google Chrome.

Such a tool quickly shows where the element you are interested in sits within the HTML structure, including the div and other useful tags above it in the HTML tree.
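As an illustration of turning what the inspector shows into HTML::TreeBuilder calls, here is a minimal sketch. It assumes the inspector revealed a `span` with class `price` inside a `div` with id `content`; that markup, the ids and the class names are all hypothetical. The point is to search inside the enclosing div first, so an element with the same class elsewhere on the page is ignored.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical page: the ids and classes stand in for whatever the
# inspector shows on the real site.
my $html = <<'HTML';
<html><body>
  <div id="sidebar"><span class="price">9.99</span></div>
  <div id="content">
    <div class="product">
      <h2 class="title">Widget</h2>
      <span class="price">19.99</span>
    </div>
  </div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Locate the container div first (scalar context returns the first match).
my $content = $tree->look_down(_tag => 'div', id => 'content')
    or die "container div not found: the page layout may have changed\n";

# Search only below that div, so the sidebar's span.price is skipped.
my $price = $content->look_down(_tag => 'span', class => 'price')
    or die "price element not found\n";

my $price_text = $price->as_trimmed_text;
print "$price_text\n";

# Free the tree explicitly (needed on older HTML::Element versions).
$tree->delete;
```

With this sample markup the script prints `19.99`, not the sidebar's `9.99`. Each `or die` also fails loudly if the expected structure disappears, for example after a site redesign.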

Contrary to what Tux said, I have not found changing structure to be much of a problem. The HTML of most big websites these days is generated from CMS databases by programs, so the structure tends to be very consistent. Occasionally a site will have a major redesign, but the rest of the time such sites are very stable. I guess the situation is different if you are dealing with hand-written HTML on small websites.
