fetching and storing data from web.

by nicolethomson (Initiate)
on Jan 27, 2012 at 06:53 UTC
nicolethomson has asked for the wisdom of the Perl Monks concerning the following question:

Fetching and Storing Information from Websites

Good day everyone

I am pretty new to this forum, to Perl, and to coding in general. I started learning after seeing my colleagues do this on Windows boxes. Fortunately I am in love with Linux and have installed Linux Mint on my computer. Now I am trying to write a small program to fetch data from a website and store it in a database. The data on the web is in .htm files; there are around 35 .htm files with different headings, and the data format is as follows.

A Week 1
-----------------------------------------------------------------------------------------------
DAY-1               DAY-2               DAY-3               DAY-4               DAY-5
28/01               29/01               30/01               31/01               01/02
------------------- ------------------- ------------------- ------------------- ---------------
000                 000                 000                 000                 000
030                 030                 030                 030                 030
018                 019                 019                 019                 020
002                 002                 001                 002                 002
093                 096                 093                 093                 090
053                 052                 056                 053                 053
007                 007                 007                 007                 007
140                 130                 110                 080                 080

At present I am copying this information manually into an Excel file, but I want to do it with a script run as a cron job. One friend suggested using some scripts, and another suggested that Perl is the best bet for this. Can someone guide me?

Re: fetching and storing data from web.
by GrandFather (Cardinal) on Jan 27, 2012 at 07:21 UTC

    For someone new to Perl, and particularly for someone new to programming, you are tackling a fairly ambitious project. The first place you should visit is the Tutorials section, to gain a little general Perl and programming knowledge. When you understand what a module is and how to use one, come back and take a look at LWP::Simple, HTML::TableExtract (given it looks like you want to extract data from a table) and maybe Excel::Writer::XLSX or one of the other Excel modules to write the data out as an Excel file.

    True laziness is hard work

      Thanks, dear GrandFather.

      Yes, I am going through the tutorials, and through HTML::TableExtract as well.

      Initially I am downloading those .htm files with wget from a bash script and storing them in a local folder.

      But how do I feed the data to MySQL or another SQL database?

      Nic

        LWP::Simple does essentially the same job as wget but gets the data straight into Perl.
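
As a rough illustration of that point, here is a minimal sketch using LWP::Simple's `get` and `getstore`. The helper names `fetch_page` and `save_page` and the URL and file name in the usage comment are made up for the example; the real addresses of the .htm files are not given in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # exports get, getstore and is_success by default

# Fetch a page and return its content as a string; die if the fetch fails.
sub fetch_page {
    my ($url) = @_;
    my $content = get($url);
    die "Could not fetch $url\n" unless defined $content;
    return $content;
}

# Save a page to a local file, roughly what "wget -O file url" does.
sub save_page {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    die "Download of $url failed with status $status\n"
        unless is_success($status);
    return $file;
}

# Hypothetical usage: perl fetch.pl http://example.com/week1.htm week1.htm
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    save_page($url, $file);
    print "Saved $url to $file\n";
}
```

`fetch_page` is the variant that "gets the data straight into Perl"; `save_page` keeps a local copy, which can be handy while the parsing code is still being written.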

        DBI does database stuff, but it needs a driver to work with, so you need to pick the DBD module that matches the database you are using. Use DBD::mysql for MySQL, but if you have a choice I'd start with SQLite (DBD::SQLite) because it is standalone and requires no setup. MySQL can be hard to get going on some systems.
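
A minimal sketch of what that could look like with DBI and DBD::SQLite, assuming both modules are installed. The table name `readings`, its columns, and the helper names are invented for the example, and the database lives in memory (`:memory:`) so nothing is written to disk; pass a file name instead to keep the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Open (or create) an SQLite database and make sure the table exists.
# The table layout is an assumption made for this example.
sub open_db {
    my ($file) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$file", "", "",
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS readings (
            day   TEXT    NOT NULL,
            value INTEGER NOT NULL
        )
    });
    return $dbh;
}

# Insert [day, value] pairs using placeholders, so DBI handles the quoting.
sub store_readings {
    my ($dbh, @rows) = @_;
    my $sth = $dbh->prepare("INSERT INTO readings (day, value) VALUES (?, ?)");
    $sth->execute(@$_) for @rows;
    return scalar @rows;
}

my $dbh = open_db(":memory:");
store_readings($dbh, [ "28/01", 18 ], [ "29/01", 19 ], [ "30/01", 19 ]);

my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM readings");
print "Stored $count rows\n";
```

Switching to MySQL later should mostly be a matter of changing the connect string to a `dbi:mysql:...` one and supplying a user name and password; the `prepare`/`execute` calls stay the same.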

        True laziness is hard work
        Wget is also available for Win32, so I have experience with it. I have to tell you that when the fetch is complicated (https + authentication + cookies + form posting + many redirections) it seems more stable and businesslike than the LWP module. For me it was much more difficult to get it right with the LWP module than with wget. In one particular case I could not get it right: the login was a success (I know because the cookies generated were correct), but somewhere in the chain the redirection was lost and I failed to receive the welcome login page. The same worked with wget.

        When it comes to a simple 'GET', the LWP module proved to be perfect for me (it almost never freezes, and the timeout works all right).
Re: fetching and storing data from web.
by tospo (Hermit) on Jan 27, 2012 at 09:25 UTC
    I agree with the previous posts that this is an ambitious project for a beginner but don't let that stop you.
    Maybe start by trying to read some data from one of your already downloaded web pages with Perl, without using any additional modules, just to get a grip on the language.
    For example, just read up on how to read a file and how to use pattern matching (regular expressions) to fetch certain data from a file according to textual context. There are plenty of examples for that which you can use as a starting point. You would then write the results to a simple text file. Then maybe try to modify that so that your output is a proper CSV file that can already be opened in Excel. This can be done simply by printing your data with commas in between and quoting text. No need for an external module in most cases (although there are modules like Text::CSV that help you with the more complex cases).
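
A small sketch of that first step, using only core Perl. It assumes the interesting lines look like the sample in the original question (a row of dd/mm dates followed by rows of three-digit values) and that any HTML markup around them can be ignored; the sub name `report_to_csv` is made up. The sample is read from a string here so the script runs on its own; in practice you would `open` one of the downloaded .htm files instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn the lines of one report into CSV lines: a header row of quoted
# dates, then one row per line of three-digit values. Other lines
# (title, dashes, DAY-n labels) are skipped.
sub report_to_csv {
    my ($fh) = @_;
    my @csv;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ m{^\s*\d\d/\d\d(?:\s+\d\d/\d\d)*\s*$}) {
            # Dates are text, so quote them.
            push @csv, join ",", map { qq("$_") } split ' ', $line;
        }
        elsif ($line =~ /^\s*\d{3}(?:\s+\d{3})*\s*$/) {
            # Values are kept exactly as they appear in the report.
            push @csv, join ",", split ' ', $line;
        }
    }
    return @csv;
}

my $sample = <<'END';
A Week 1
-----------------------------
DAY-1 DAY-2 DAY-3 DAY-4 DAY-5
28/01 29/01 30/01 31/01 01/02
----- ----- ----- ----- -----
000 000 000 000 000
018 019 019 019 020
END

# Reading from a string works like reading from a file.
open my $in, '<', \$sample or die "Cannot read sample: $!";
my @csv = report_to_csv($in);
close $in;

print "$_\n" for @csv;
```

Redirecting the output to a file (`perl parse.pl > week1.csv`) gives something Excel can open directly.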
    Once you can do that, try to fetch the data directly from the web with LWP::Simple instead of reading from a file. First write a script that uses LWP::Simple just to download the whole page and print everything to a local file. Then try to combine that with your parser and you are almost done.
    If you really want the data in a proper database you should learn basic SQL (database query language) and the Perl way of interacting with a database (DBI or DBIx::Class; too much to get into in detail here), but be prepared that this is not going to be done in one day.
    Keep going and good luck!!

      Thanks everyone

      I am partially doing things with awk/sed and bash commands; of course Google helped:

        sed -n -e 's/^[ ]*//g'  -e  's/\([0-9a-zA-Z\.]*\)  */\1 /g' -e 10p -e 15p -e 23p nic.htm > nic.txt

      Then I tried

        perl -ne 'print;' *.txt > all.csv

      but the result was not satisfactory, so I tried

        for file in *.txt; do   cat "$file";   echo; done > newfile.csv

      Now the .csv or .txt file gives me the result in a readable format, and from this text file I need to send the data to the database. This is the result when I cat the file (each DISTRICT line is its own paragraph):

        DISTRICT : ZUNHEBOTO STATE : ABC 02/02 03/02 04/02 05/02 06/02 speed 004 002 004 004 004...
        DISTRICT : YUNHEBOOT STATE : EFG 02/02 03/02 04/02 05/02 06/02 speed 004 002 004 004 004

      In the db I have created a table with the fields STATE, District, date, speed. How do I import the data into the db? I ran

        perl -MCPAN -e shell

      and installed HTML::Parser. For MySQL, what else do I need to install here?
        what exactly are you trying to achieve with
        perl -ne 'print;' *.txt > all.csv
        or
        for file in *.txt; do cat "$file"; echo; done > newfile.csv
        ??? Those are a bit pointless, just concatenating files into another file, same as just doing
        cat *.txt > newfile.txt
        These do not magically create CSV format for you; the output is just the same text as in the input files. I have no idea what you mean by "how to import it to db perl -MCPAN -e shell"???
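
For the import question, one possible first step is to split each DISTRICT line into one record per date, matching the STATE, District, date, speed fields mentioned above. The sketch below uses only core Perl; the sub name `parse_line` is made up, and it assumes every line follows exactly the layout of the two sample lines (same number of dates and speed values).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one "DISTRICT : ... STATE : ... dates speed values" line into
# one [state, district, date, speed] record per date.
# Returns an empty list if the line does not have the expected layout.
sub parse_line {
    my ($line) = @_;
    my ($district, $state, $dates, $speeds) = $line =~ m{
        DISTRICT \s* : \s* (.+?) \s+
        STATE    \s* : \s* (.+?) \s+
        ( (?: \d\d/\d\d \s+ )+ )
        speed \s+
        ( \d+ (?: \s+ \d+ )* )
    }x or return;

    my @dates  = split ' ', $dates;
    my @speeds = split ' ', $speeds;
    return if @dates != @speeds;

    # "+ 0" turns "004" into the number 4.
    return map { [ $state, $district, $dates[$_], $speeds[$_] + 0 ] }
           0 .. $#dates;
}

my @records = map { parse_line($_) } (
    "DISTRICT : ZUNHEBOTO STATE : ABC 02/02 03/02 04/02 05/02 06/02 speed 004 002 004 004 004",
    "DISTRICT : YUNHEBOOT STATE : EFG 02/02 03/02 04/02 05/02 06/02 speed 004 002 004 004 004",
);

print join(",", @$_), "\n" for @records;
```

Each record then lines up with the four table fields, so it could be handed to a DBI statement prepared with four placeholders, for example `$sth->execute(@$record)`. For MySQL that would need DBI plus the DBD::mysql driver mentioned earlier in the thread; HTML::Parser is not involved in the database step.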
