PerlMonks  

Re: Parsing HTML files to recover data...

by blue_cowdawg (Monsignor)
on Nov 21, 2006 at 19:22 UTC (#585334=note)


in reply to Parsing HTML files to recover data...

      My question to you all is: can this be done with a fancy regex, or is there a module on CPAN that I missed that does this already? If so, could you kindly point me in the right direction?

Take a look at this CUFP I posted a while back for some insight into how to parse HTML and extract data from it. In it I use HTML::TableContentParser and LWP::UserAgent to pull in HTML, extract data from tables, and trigger alarms based on that data.
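
The CUFP itself isn't reproduced here, so what follows is only a minimal sketch of the same idea, not that code. It assumes a simple two-column table of (name, value) rows; the sample HTML, the `find_alarms` sub and the threshold of 90 are all hypothetical. An inline string stands in for the page LWP::UserAgent would normally fetch, so the sketch runs offline.

```perl
use strict;
use warnings;
use HTML::TableContentParser;

# Return the names of table rows whose second cell exceeds $threshold.
# The (name, value) two-column layout is a hypothetical example.
sub find_alarms {
    my ($html, $threshold) = @_;
    # parse() gives an arrayref of tables; "|| []" guards an empty result
    my $tables = HTML::TableContentParser->new->parse($html) || [];
    my @alarms;
    for my $table (@$tables) {
        for my $row (@{ $table->{rows} || [] }) {
            # Each cell is a hashref whose text lives under {data};
            # trim it so stray whitespace can't break the comparison.
            my @cells = map {
                my $d = defined $_->{data} ? $_->{data} : '';
                $d =~ s/^\s+|\s+$//g;
                $d;
            } @{ $row->{cells} || [] };
            next unless @cells == 2;
            my ($name, $value) = @cells;
            push @alarms, $name if $value > $threshold;
        }
    }
    return @alarms;
}

# Inline sample standing in for a page fetched with LWP::UserAgent,
# e.g. LWP::UserAgent->new->get($url)->decoded_content
my $html = <<'HTML';
<table>
  <tr><td>disk1</td><td>91</td></tr>
  <tr><td>disk2</td><td>42</td></tr>
</table>
HTML

print "ALARM: $_\n" for find_alarms($html, 90);
```

Swapping the inline string for a real `get` and the `print` for whatever alarm mechanism you use is left to taste.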

Similarly, you could use HTML::TokeParser to do much the same sort of thing with the <blockquote>...</blockquote> HTML in your example above.
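
The HTML from the original question isn't shown in this reply, so here is a minimal sketch against a made-up sample; the `blockquote_texts` sub and the sample markup are hypothetical. It uses HTML::TokeParser's `get_tag` to skip ahead to each opening <blockquote> and `get_trimmed_text('/blockquote')` to collect the text up to the closing tag, with whitespace collapsed and inner tags such as <b> dropped.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Collect the trimmed text of every <blockquote>...</blockquote> in $html.
sub blockquote_texts {
    my ($html) = @_;
    # Passing a scalar ref tells HTML::TokeParser to parse the string itself
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot create parser: $!";
    my @texts;
    while ($p->get_tag('blockquote')) {
        # Reads text up to (not including) the next </blockquote>,
        # collapsing runs of whitespace and stripping leading/trailing space
        push @texts, $p->get_trimmed_text('/blockquote');
    }
    return @texts;
}

my $sample = <<'HTML';
<p>Intro</p>
<blockquote>First   quoted <b>item</b></blockquote>
<blockquote>Second item</blockquote>
HTML

print "$_\n" for blockquote_texts($sample);
```

Nested blockquotes would need a depth counter; this sketch simply stops at the first closing tag it meets.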

If you have time to do some reading, take a look at Web, Graphics & Perl/Tk or Perl & LWP, both published by O'Reilly. The latter is my favorite on the subjects at hand.


Peter L. Berghold -- Unix Professional
Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg


Re^2: Parsing HTML files to recover data...
by UrbanHick (Sexton) on Nov 22, 2006 at 01:34 UTC

    Thank you, blue_cowdawg! These links look extremely promising. Getting my hands on the O'Reilly books might prove a bit difficult, but HTML::TokeParser looks very interesting indeed.

    -UH
