PerlMonks
Re: fetching and storing data from web.

by GrandFather (Cardinal)
on Jan 27, 2012 at 07:21 UTC (#950272)


in reply to fetching and storing data from web.

For someone new to Perl, and particularly someone new to programming, you are tackling a fairly ambitious project. The first place you should visit is the Tutorials section, to gain a little general Perl and programming knowledge. When you understand what a module is and how to use it, come back and take a look at LWP::Simple, HTML::TableExtract (given that it looks like you want to extract data from a table) and maybe Excel::Writer::XLSX or one of the other Excel modules to write the data out as an Excel file.
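To make the suggestion concrete, here is a minimal sketch of that fetch-extract-write pipeline, assuming the three modules are installed and using a hypothetical URL and output filename:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple qw(get);
use HTML::TableExtract;
use Excel::Writer::XLSX;

# Hypothetical URL - substitute the page you actually want to scrape
my $url  = 'http://example.com/data.htm';
my $html = get($url) or die "Couldn't fetch $url\n";

# Parse every table on the page; pass headers => [...] or depth/count
# to HTML::TableExtract->new() to narrow down to the table you want
my $te = HTML::TableExtract->new();
$te->parse($html);

my $workbook  = Excel::Writer::XLSX->new('data.xlsx');
my $worksheet = $workbook->add_worksheet();

# Copy each extracted table row into a spreadsheet row
my $row = 0;
for my $ts ($te->tables) {
    for my $table_row ($ts->rows) {
        my $col = 0;
        $worksheet->write($row, $col++, $_) for @$table_row;
        ++$row;
    }
}

$workbook->close();
```

The same loop works unchanged if you later swap the Excel output for database inserts.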

True laziness is hard work


Re^2: fetching and storing data from web.
by nicolethomson (Initiate) on Jan 27, 2012 at 07:50 UTC

    Thanks Dear GrandFather

    Yes, I am going through the tutorials.

    I am going through HTML::TableExtract as well.

    Initially, using wget from a bash script, I am downloading those .htm files and storing them in a local folder,

    but how do I feed them into MySQL or another SQL database?

    Nic

      LWP::Simple does essentially the same job as wget but gets the data straight into Perl.

      DBI does database stuff, but needs a driver to work with, so you need to pick the DBD module that matches the database you are using. Use DBD::mysql for MySQL, but if you have a choice I'd start with SQLite (DBD::SQLite) because it is standalone and requires no setup. MySQL can be hard to get going on some systems.
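      A minimal sketch of the DBI/DBD::SQLite approach, with a hypothetical table name and sample rows standing in for data extracted from the HTML:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# SQLite needs no server - the database is just a file on disk
my $dbh = DBI->connect( 'dbi:SQLite:dbname=scrape.db', '', '',
                        { RaiseError => 1, AutoCommit => 1 } );

$dbh->do(<<'SQL');
CREATE TABLE IF NOT EXISTS pages (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    name  TEXT,
    value TEXT
)
SQL

# Placeholders (?) let DBI handle all the quoting for you
my $sth = $dbh->prepare('INSERT INTO pages (name, value) VALUES (?, ?)');

# Hypothetical rows - in practice these come from HTML::TableExtract
my @rows = ( [ 'alpha', '1' ], [ 'beta', '2' ] );
$sth->execute(@$_) for @rows;

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM pages');
print "Stored $count rows\n";

$dbh->disconnect;
```

      To move to MySQL later, only the connect line changes (to a 'dbi:mysql:...' DSN plus credentials); the prepare/execute code stays the same.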

      True laziness is hard work
      Wget is also available for Win32, so I have experience with it there too. I have to tell you that when the fetch is complicated (https + authentication + cookies + form posting + many redirections) it seems more stable and businesslike than the LWP module. For me it was much harder to get things right with LWP than with wget. In one particular case I could not get it right at all: the login succeeded (I know because the cookies generated were correct), but somewhere in the chain the redirection was lost and I never received the welcome page. The same sequence worked with wget.

      When it comes to a simple GET, the LWP module has proved perfect for me (it almost never freezes, and the timeout works all right).
        In cases like this, where you essentially need to simulate a browser visiting the website, you can also use WWW::Mechanize, which does exactly that. You can even control a real web browser from Perl to get exactly the interaction you would have when visiting the site yourself (WWW::Mechanize::Firefox).
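        A minimal sketch of the login-then-fetch sequence with WWW::Mechanize; the site URL and form field names here are hypothetical and must be adjusted to match the real login page:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Mechanize keeps a cookie jar and follows redirects automatically,
# which is exactly what the https + cookies + redirection case needs
my $mech = WWW::Mechanize->new( autocheck => 1, timeout => 30 );

# Hypothetical login page and field names
$mech->get('https://example.com/login');
$mech->submit_form(
    with_fields => {
        username => 'nic',
        password => 'secret',
    },
);

# Once the redirect chain settles, fetch the protected page
$mech->get('https://example.com/members/report.htm');
print $mech->content;
```

        Because autocheck is on, any failed request dies immediately instead of silently returning an error page, which makes broken redirect chains much easier to diagnose.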
