Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Why write to an intermediate file? I would try running this directly from one DB to the other.

That may or may not be a good idea, but it's worth testing to find out. The reason is that without the intermediate file, you have to use DBI and an ordinary prepared INSERT statement to load the data. With it, you can use the DB's native bulk import facility.

That said, it may be possible to gain speed by breaking the data into chunks and using a pipeline approach, ie. generate the CSV file for the next batch while the bulk loader is still handling the previous one. This is assuming MySQL's bulk loader has an append capability; I'm not at all familiar with it. You can also try creating the new table without indexes and adding them only later.

Is the truncate and complete dump really necessary? Maybe only a limited subset of the records need to be transfered (eg, most recent). Of course, this depends on the nature of your data but a good unique key should help.

Agreed. I've often found substantial improvements in this sort of application by making people think through this issue instead of blindly loading all the data all the time.

Unless your input data is quite well behaved, it seems a bit dangerous to rely on this basic interpretation of CSV in the data import. For example, do you need to support quoted strings with commas?

Again, I don't know MySQL's bulk loader, but as long as it can read the same sort of CSV that DBD::CSV spits out, this should be fine.

Another thought for the OP is that you might want to benchmark the different CSV modules in terms of finding the fastest way to generate the output. IOW, using fetchrow_arrayref with an appropriate maxrows and writing the data using Text::CSV_XS may be faster than DBD::CSV. Then again, it may not. Finally, if you know your data cannot contain "funny" characters like quotes, commas, or newlines, you can skip these modules altogether and use print.

In reply to Re^2: how can I make this faster by Errto
in thread how can I make this faster by padawan_linuxero

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others meditating upon the Monastery: (5)
    As of 2019-06-27 08:10 GMT
    Find Nodes?
      Voting Booth?
      Is there a future for codeless software?

      Results (111 votes). Check out past polls.