PerlMonks  

Re: Parsing HTML files to recover data...

by Anonymous Monk
on Nov 22, 2006 at 06:10 UTC (#585445)


in reply to Parsing HTML files to recover data...

I have had great success scraping data out of HTML files using XML::LibXML. It parses the HTML into a DOM tree and lets you run XPath searches for the data. While this may be overkill in terms of both the learning curve and CPU cycles, the code required to coax the data out of the files will be pretty simple. You may also end up with code that is easily changed to solve similar problems.
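
To give an idea of how little code this takes, here is a minimal sketch. The HTML snippet, the table id "prices", and the class names "item" and "price" are all made up for illustration, since I have not seen your files; substitute XPath expressions that match your own markup. It assumes a reasonably recent XML::LibXML that provides load_html.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; a real script would more likely pass
# location => 'yourfile.html' to load_html instead of a string.
my $html = <<'HTML';
<html><body>
<table id="prices">
<tr><td class="item">Widget</td><td class="price">9.99</td></tr>
<tr><td class="item">Gadget</td><td class="price">24.50</td></tr>
</table>
</body></html>
HTML

# recover => 2 asks the parser to carry on quietly past malformed
# markup, which real-world HTML is usually full of.
my $dom = XML::LibXML->load_html(
    string  => $html,
    recover => 2,
);

# One XPath query selects the rows; relative queries pull out the cells.
my %price_for;
for my $row ( $dom->findnodes('//table[@id="prices"]//tr') ) {
    my $item  = $row->findvalue('./td[@class="item"]');
    my $price = $row->findvalue('./td[@class="price"]');
    $price_for{$item} = $price;
}

print "$_: $price_for{$_}\n" for sort keys %price_for;
```

If the layout of the files changes, or you need to attack a similar set of files, usually only the XPath strings have to change.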

