PerlMonks  

Re: Parsing HTML

by TheoPetersen (Priest)
on Mar 03, 2001 at 17:40 UTC


in reply to Parsing HTML

When faced with this kind of task, a lot of Perl coders:
  • see that HTML doesn't look that hard, and decide to parse it by hand;
  • find out that HTML is deceptive (or that the person or process writing the file writes lousy HTML) and decide to use a tool;
  • discover HTML::Parser, read the docs and say "that's too hard!";
  • go back to parsing it by hand and come up with something that works as long as nothing ever changes.
At least, that's how my co-workers and I did it once :)

So I'd suggest looking at HTML::Parser or one of its relatives. I used HTML::TreeBuilder to parse some quite large and unreliable HTML files and found that it worked great. The tricky bit with HTML::Parser is learning to code in the callback style it requires, but you can get lots of help on that here once you've started.
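
To give a feel for the two styles, here is a minimal sketch, not the code we actually used. The sample markup and the sub names links_via_parser and links_via_tree are made up for illustration, and it assumes HTML::Parser (version 3 API) and HTML::TreeBuilder are installed from CPAN. Both subs pull the href values out of a slightly sloppy page: the first with an HTML::Parser callback, the second by building a tree and searching it.

```perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::TreeBuilder ();

# Hypothetical sample input: unclosed <p> tags, an upper-case tag,
# and an unquoted attribute value.
my $html = <<'HTML';
<html><body>
<p>See <a href="/docs">the docs</a>
<p>or <A HREF=/faq>the FAQ</A>
</body></html>
HTML

# Callback style: HTML::Parser calls the handler once per start tag.
# The argspec 'tagname, attr' asks for the lower-cased tag name and a
# hash ref of attributes.
sub links_via_parser {
    my ($content) = @_;
    my @links;
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                push @links, $attr->{href}
                    if $tag eq 'a' && defined $attr->{href};
            },
            'tagname, attr',
        ],
    );
    $parser->parse($content);
    $parser->eof;    # tell the parser there is no more input
    return @links;
}

# Tree style: HTML::TreeBuilder parses the whole document first, then
# look_down() finds every <a> element that has an href attribute.
sub links_via_tree {
    my ($content) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($content);
    my @links = map { $_->attr('href') }
        $tree->look_down(_tag => 'a', sub { defined $_[0]->attr('href') });
    $tree->delete;   # free the tree explicitly
    return @links;
}

print "parser: $_\n" for links_via_parser($html);
print "tree:   $_\n" for links_via_tree($html);
```

The callback version never holds the whole document in memory, which matters for very large files. The tree version is usually easier to read when you need to look at an element's children or surrounding context.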
