http://www.perlmonks.org?node_id=673375


in reply to Re: how can I make this faster
in thread how can I make this faster

Why write to an intermediate file? I would try running this directly from one DB to the other.

That may or may not be a good idea, but it's worth testing to find out. The reason is that without the intermediate file, you have to use DBI and an ordinary prepared INSERT statement to load the data. With it, you can use the DB's native bulk import facility.

That said, it may be possible to gain speed by breaking the data into chunks and using a pipeline approach, ie. generate the CSV file for the next batch while the bulk loader is still handling the previous one. This is assuming MySQL's bulk loader has an append capability; I'm not at all familiar with it. You can also try creating the new table without indexes and adding them only later.

Is the truncate and complete dump really necessary? Maybe only a limited subset of the records need to be transfered (eg, most recent). Of course, this depends on the nature of your data but a good unique key should help.

Agreed. I've often found substantial improvements in this sort of application by making people think through this issue instead of blindly loading all the data all the time.

Unless your input data is quite well behaved, it seems a bit dangerous to rely on this basic interpretation of CSV in the data import. For example, do you need to support quoted strings with commas?

Again, I don't know MySQL's bulk loader, but as long as it can read the same sort of CSV that DBD::CSV spits out, this should be fine.

Another thought for the OP is that you might want to benchmark the different CSV modules in terms of finding the fastest way to generate the output. IOW, using fetchrow_arrayref with an appropriate maxrows and writing the data using Text::CSV_XS may be faster than DBD::CSV. Then again, it may not. Finally, if you know your data cannot contain "funny" characters like quotes, commas, or newlines, you can skip these modules altogether and use print.