PerlMonks  

Re: Parsing HTML files to recover data...

by blue_cowdawg (Monsignor)
on Nov 21, 2006 at 19:22 UTC (#585334)


in reply to Parsing HTML files to recover data...

      My question to you all is: can this be done with a fancy regex, or is there a module on CPAN that I missed that already does this? If so, could you kindly point me in the right direction?

Take a look at this CUFP I posted a while back for some insight on how to parse HTML and extract data from it. In it I use HTML::TableContentParser and LWP::UserAgent to pull in HTML, extract data from tables, and trigger alarms based on that data.
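For a rough idea of what that looks like, here is a minimal sketch and not the code from the CUFP itself. The sample table, the host names, and the threshold are all made up for illustration, and the HTML is an inline string so the example runs without a network; in practice it would more likely come from an LWP::UserAgent fetch.

```perl
use strict;
use warnings;
use HTML::TableContentParser;

# Hypothetical status page. In real use this string might come from
# LWP::UserAgent, e.g. $ua->get($url)->decoded_content.
my $html = <<'HTML';
<table>
<tr><td>web01</td><td>0.42</td></tr>
<tr><td>web02</td><td>7.90</td></tr>
</table>
HTML

my $threshold = 5;    # assumed alarm level, purely illustrative

# parse() returns a reference to an array of tables; each table has
# {rows}, each row has {cells}, and each cell keeps its content in {data}.
my $tables = HTML::TableContentParser->new->parse($html);

my @alarms;
for my $table (@$tables) {
    for my $row (@{ $table->{rows} || [] }) {
        my ($host, $load) = map { $_->{data} } @{ $row->{cells} || [] };

        # Skip rows whose second cell is missing or not a plain number.
        my ($value) = defined $load
            ? $load =~ /^\s*(\d+(?:\.\d+)?)\s*$/
            : ();
        next unless defined $value;

        $host =~ s/^\s+|\s+$//g;    # trim stray whitespace
        push @alarms, $host if $value > $threshold;
    }
}

print "ALARM: $_\n" for @alarms;
```

The "alarm" here is only a print statement; mail, paging, or logging would slot in at the same place.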

Similarly, you could use HTML::TokeParser to do much the same sort of thing with the <blockquote>...</blockquote> markup you showed above.
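Something along these lines might get you started. It is only a sketch: the sample HTML is invented, since I am guessing at what your files actually contain, so adjust the tag handling to suit.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up sample document; substitute the contents of your own files.
my $html = <<'HTML';
<p>Intro text</p>
<blockquote>First recovered entry</blockquote>
<p>Filler</p>
<blockquote>Second <b>recovered</b> entry</blockquote>
HTML

# Passing a scalar reference makes the parser read from the string
# rather than treating the argument as a file name.
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

my @entries;
while ($parser->get_tag('blockquote')) {
    # get_trimmed_text gathers the text up to the named end tag,
    # passing over nested tags such as <b> and collapsing whitespace.
    push @entries, $parser->get_trimmed_text('/blockquote');
}

print "$_\n" for @entries;
```

To read straight from a file, pass the file name to new() instead of a scalar reference.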

If you have time to do some reading, take a look at the books Web, Graphics & Perl/Tk and Perl & LWP, both published by O'Reilly. The latter is my favorite on the subjects at hand.


Peter L. Berghold -- Unix Professional
Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg


Re^2: Parsing HTML files to recover data...
by UrbanHick (Sexton) on Nov 22, 2006 at 01:34 UTC

    Thank you, blue_cowdawg! These links look extremely promising. Getting my hands on the O'Reilly books might prove a bit difficult, but HTML::TokeParser looks very interesting indeed.

    -UH
