PerlMonks  

Re^3: Web Scraping on CGI Scripts?

by tospo (Hermit)
on Oct 11, 2011 at 08:29 UTC (#930762)


in reply to Re^2: Web Scraping on CGI Scripts?
in thread Web Scraping on CGI Scripts?

That page - apart from being marked up in a rather old-fashioned way - isn't too bad at all. If you look at the page source, you can easily see a table structure that you can use to parse it.
You will want to use a module like WWW::Mechanize to interact with the website. This module lets your script interact with web content the way a user would in a browser: it can "click" on links to get to the text files. Use the table structure of the "browse" page to iterate over all the molecules, each time following the link through to the text data files.
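To give you an idea of the shape of such a script, here is a minimal sketch. It is not tailored to your page: the sub name fetch_text_files and the assumption that the data links end in ".txt" are mine, so adjust them to whatever the real "browse" page looks like. It also takes a shortcut and simply collects every matching link on the page with find_all_links rather than walking the table row by row, which is usually enough when the data links are easy to tell apart from the others.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Fetch a "browse" page and follow every link whose URL matches $regex.
# Returns a hash ref mapping absolute link URL => content of that page.
# NOTE: the sub name and the link pattern are illustrative assumptions.
sub fetch_text_files {
    my ( $url, $regex ) = @_;

    # autocheck => 1 makes failed requests die instead of failing silently
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);

    my %data;

    # find_all_links returns WWW::Mechanize::Link objects for the current page
    for my $link ( $mech->find_all_links( url_regex => $regex ) ) {
        my $abs = $link->url_abs->as_string;    # resolve relative hrefs
        $mech->get($abs);
        $data{$abs} = $mech->content;
    }
    return \%data;
}

# Only run when called as a script, e.g.:  perl scrape.pl <browse-page-url>
unless (caller) {
    if ( my $url = shift @ARGV ) {
        # Assumed pattern: data files end in ".txt"
        my $files = fetch_text_files( $url, qr/\.txt$/i );
        printf "%s: %d characters\n", $_, length $files->{$_}
            for sort keys %$files;
    }
    else {
        print STDERR "usage: $0 <browse-page-url>\n";
    }
}
```

If the links you need can only be identified by their position in the table, you would have to parse the rows instead, but I would try the simple link filter first.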
Have a go with a simple example first. There are a few here. If you are getting stuck, post the script you have so far and what's happening so we can help you along. Good luck!

