Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^2: how can I make this faster

by Errto (Vicar)
on Mar 10, 2008 at 22:42 UTC ( #673375=note: print w/ replies, xml ) Need Help??


in reply to Re: how can I make this faster
in thread how can I make this faster

Why write to an intermediate file? I would try running this directly from one DB to the other.

That may or may not be a good idea, but it's worth testing to find out. The reason is that without the intermediate file, you have to use DBI and an ordinary prepared INSERT statement to load the data. With it, you can use the DB's native bulk import facility.

That said, it may be possible to gain speed by breaking the data into chunks and using a pipeline approach, ie. generate the CSV file for the next batch while the bulk loader is still handling the previous one. This is assuming MySQL's bulk loader has an append capability; I'm not at all familiar with it. You can also try creating the new table without indexes and adding them only later.

Is the truncate and complete dump really necessary? Maybe only a limited subset of the records need to be transfered (eg, most recent). Of course, this depends on the nature of your data but a good unique key should help.

Agreed. I've often found substantial improvements in this sort of application by making people think through this issue instead of blindly loading all the data all the time.

Unless your input data is quite well behaved, it seems a bit dangerous to rely on this basic interpretation of CSV in the data import. For example, do you need to support quoted strings with commas?

Again, I don't know MySQL's bulk loader, but as long as it can read the same sort of CSV that DBD::CSV spits out, this should be fine.

Another thought for the OP is that you might want to benchmark the different CSV modules in terms of finding the fastest way to generate the output. IOW, using fetchrow_arrayref with an appropriate maxrows and writing the data using Text::CSV_XS may be faster than DBD::CSV. Then again, it may not. Finally, if you know your data cannot contain "funny" characters like quotes, commas, or newlines, you can skip these modules altogether and use print.


Comment on Re^2: how can I make this faster
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://673375]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (11)
As of 2014-08-27 20:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (252 votes), past polls