http://www.perlmonks.org?node_id=673315

padawan_linuxero has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks!
I have a question for all of you. Here at my workplace I need to extract a lot of data from several (about 7) tables of a FoxPro application running under DOS and put it into MySQL tables, so I wrote a little script to pull the data out. The script works and does what it is supposed to do, but the operation takes a long time. How can I make it faster? This is the code:
use DBI;

# $datafilename is set earlier in the full script.

unlink('c:/proyecto/r501.csv');

#---- This part connects to my 2 databases, one CSV and one MySQL -----
my $dbhX1 = DBI->connect('dbi:XBase(RaiseError=1):');
my $dbhC1 = DBI->connect('dbi:CSV(RaiseError=1):');

my $select1 = $dbhX1->prepare("SELECT * FROM r501");
print "\nEjecutar SELECT de r501\n";
$select1->execute();
$dbhC1->do("CREATE TABLE r501.csv AS IMPORT(?)", {}, $select1);

print "\nconectar a la base de datos en MySQL\n";
my $mysql_dbh1 = DBI->connect(
    "DBI:mysql:database=$datafilename;host=localhost",
    "root", "xyz123", { 'RaiseError' => 1 }
);

#---- Here I empty the table in MySQL ------
print "Limpiando la tabla\n";
$mysql_dbh1->do("TRUNCATE TABLE r501");

#---- Here I load all the data -------
my $sql1 = "LOAD DATA LOCAL INFILE 'c:/proyecto/r501.csv'
            INTO TABLE r501
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\n'";
my $sth1 = $mysql_dbh1->prepare($sql1);
print "ejecutando la carga de la tabla\n";
$sth1->execute();
Is there a way I can make it faster? Doing all the tables takes up to an hour to finish. Thanks!

Replies are listed 'Best First'.
Re: how can I make this faster
by igelkott (Priest) on Mar 10, 2008 at 20:51 UTC
    Why write to an intermediate file? I would try running this directly from one DB to the other.

    Is the truncate and complete dump really necessary? Maybe only a limited subset of the records needs to be transferred (eg, most recent). Of course, this depends on the nature of your data, but a good unique key should help.

    Unless your input data is quite well behaved, it seems a bit dangerous to rely on this basic interpretation of CSV in the data import. For example, do you need to support quoted strings with commas?
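
    If an incremental transfer turns out to be feasible, here is a minimal sketch of the idea, reusing the handles from the original script. It assumes a hypothetical, monotonically increasing numeric key column (called folio here purely for illustration) that exists in both tables, and it assumes DBD::XBase's limited SQL dialect accepts this kind of WHERE clause:

        # Find the newest key already loaded into MySQL, then pull only newer rows.
        my ($last) = $mysql_dbh1->selectrow_array('SELECT MAX(folio) FROM r501');
        $last = 0 unless defined $last;                # empty table: copy everything

        my $select1 = $dbhX1->prepare('SELECT * FROM r501 WHERE folio > ?');
        $select1->execute($last);
        # ... export/load the (much smaller) result set as before ...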

      Why write to an intermediate file? I would try running this directly from one DB to the other.

      That may or may not be a good idea, but it's worth testing to find out. The reason is that without the intermediate file, you have to use DBI and an ordinary prepared INSERT statement to load the data. With it, you can use the DB's native bulk import facility.
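
      For reference, a minimal sketch of the "no intermediate file" variant, using an ordinary prepared INSERT inside a single transaction. Variable names are taken from the original script; treat it as an illustration rather than a drop-in replacement:

        my $src = $dbhX1->prepare('SELECT * FROM r501');
        $src->execute;

        # One placeholder per source column.
        my $ins = $mysql_dbh1->prepare(
            'INSERT INTO r501 VALUES (' . join(',', ('?') x $src->{NUM_OF_FIELDS}) . ')'
        );

        $mysql_dbh1->{AutoCommit} = 0;     # one big transaction instead of per-row commits
        while (my $row = $src->fetchrow_arrayref) {
            $ins->execute(@$row);
        }
        $mysql_dbh1->commit;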

      That said, it may be possible to gain speed by breaking the data into chunks and using a pipeline approach, i.e. generate the CSV file for the next batch while the bulk loader is still handling the previous one. This assumes MySQL's bulk loader has an append capability; I'm not at all familiar with it. You can also try creating the new table without indexes and adding them only later.
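
      On the index point, one low-effort experiment is to switch index maintenance off around the bulk load (this mainly helps with MyISAM tables and non-unique indexes, so measure before and after):

        $mysql_dbh1->do('ALTER TABLE r501 DISABLE KEYS');    # defer index maintenance
        $mysql_dbh1->do(q{LOAD DATA LOCAL INFILE 'c:/proyecto/r501.csv'
                          INTO TABLE r501
                          FIELDS TERMINATED BY ','
                          LINES TERMINATED BY '\n'});
        $mysql_dbh1->do('ALTER TABLE r501 ENABLE KEYS');     # rebuild indexes in one pass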

      Is the truncate and complete dump really necessary? Maybe only a limited subset of the records needs to be transferred (eg, most recent). Of course, this depends on the nature of your data, but a good unique key should help.

      Agreed. I've often found substantial improvements in this sort of application by making people think through this issue instead of blindly loading all the data all the time.

      Unless your input data is quite well behaved, it seems a bit dangerous to rely on this basic interpretation of CSV in the data import. For example, do you need to support quoted strings with commas?

      Again, I don't know MySQL's bulk loader, but as long as it can read the same sort of CSV that DBD::CSV spits out, this should be fine.

      Another thought for the OP is that you might want to benchmark the different CSV modules to find the fastest way to generate the output. IOW, using fetchall_arrayref with an appropriate $max_rows and writing the data with Text::CSV_XS may be faster than DBD::CSV. Then again, it may not. Finally, if you know your data cannot contain "funny" characters like quotes, commas, or newlines, you can skip these modules altogether and just use print.
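
      A rough, untested sketch of that last idea, batching rows with fetchall_arrayref and writing them with Text::CSV_XS (the batch size of 10_000 is arbitrary):

        use Text::CSV_XS;

        my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n" });
        open my $fh, '>', 'c:/proyecto/r501.csv' or die "open: $!";

        $select1->execute();
        while (my $rows = $select1->fetchall_arrayref(undef, 10_000)) {
            last unless @$rows;                  # no rows left
            $csv->print($fh, $_) for @$rows;     # one quoted CSV line per row
        }
        close $fh or die "close: $!";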

Re: how can I make this faster
by CountZero (Bishop) on Mar 10, 2008 at 22:37 UTC
    Before your question can be answered, first some measurements must be made: Where exactly is your script "slow"?

    Is it in reading the XBase data or in writing it to the csv-file? Dumping the data into MySQL should be rather fast unless of course the csv file is so big you are getting memory problems.

    If we are speaking of really huge amounts of data, it is probably better to load the data record by record and send them straight from the XBase database to the MySQL database. Shifting around the data through intermediary files will make your script IO-bound and as everything has to go back and forth several times on the same harddisk, you create a bottleneck in the IO.
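
    A quick way to get those measurements, assuming the variable layout of the original script, is to time each phase separately with Time::HiRes:

        use Time::HiRes qw(time);

        my $t0 = time;
        # ... XBase SELECT and CSV export go here ...
        printf "export to CSV: %.1f s\n", time - $t0;

        $t0 = time;
        # ... TRUNCATE and LOAD DATA go here ...
        printf "load into MySQL: %.1f s\n", time - $t0;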

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James