Speeding up Postgres INSERTs

by punkish (Priest)
on Jun 18, 2010 at 05:17 UTC
punkish has asked for the wisdom of the Perl Monks concerning the following question:

I have an array of arrays that I am INSERTing into a Pg table (pseudo-code ahead) —

my $sth = $dbh->prepare("INSERT INTO table (a, b, c) VALUES (?, ?, ?)");
for my $row (@table) {
    $sth->execute($row->[0], $row->[1], $row->[2]);
}
$dbh->commit;

I'd like to speed up the INSERTs by possibly using the COPY table FROM STDIN WITH DELIMITER AS ',' command. Is there a way I can do that? Is there some other way I can speed up the INSERTs?

Update: I should have added, right now the INSERTs are rather slow; several seconds for every 100K INSERTs.

Update2: I finally got a result back from one set of INSERTs. I am getting ~3300 insertions per second. I would consider that fairly slow.

--

when small people start casting long shadows, it is time to go to bed

Re: Speeding up Postgres INSERTs
by Khen1950fx (Canon) on Jun 18, 2010 at 06:13 UTC
    As I understand it, COPY can read CSV files, and the comma delimiter is the default when you use CSV mode. Have you tried that?
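    For what it's worth, a minimal sketch of driving such a COPY from DBI, assuming the CSV file sits on the database server, the connecting role is allowed to run server-side COPY, and that 'mytable' and the path are placeholders:

    # Hypothetical example: 'mytable' and '/tmp/data.csv' stand in for the real names.
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'pass',
                           { RaiseError => 1, AutoCommit => 0 });

    # In CSV mode the comma is already the default delimiter.
    $dbh->do(q{COPY mytable (a, b, c) FROM '/tmp/data.csv' WITH CSV});
    $dbh->commit;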
Re: Speeding up Postgres INSERTs
by Krambambuli (Deacon) on Jun 18, 2010 at 06:49 UTC
    Do you have any indexes defined for these columns? Have you maybe tried/thought about disabling indexes during the INSERTs?
Re: Speeding up Postgres INSERTs
by jau (Hermit) on Jun 18, 2010 at 07:25 UTC

    In case you missed it: The PostgreSQL documentation has a section with performance tips for populating a database.

Re: Speeding up Postgres INSERTs
by JavaFan (Canon) on Jun 18, 2010 at 09:02 UTC
    For most databases (I don't know Pg, so it may be different), it's much faster to have a few large
    INSERT INTO table (a, b, c) VALUES (z, y, x), (w, v, u), (t, s, r), ...
    queries than to do a round trip for each row inserted.
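    A rough sketch of how that might look with DBI placeholders, assuming the same $dbh and @table array-of-arrays as in the original post; 'mytable' is a placeholder name and the batch size of 1000 is arbitrary:

    # Build one INSERT with many (?, ?, ?) groups per batch instead of one per row.
    use List::Util qw(min);

    my $batch = 1000;
    for (my $i = 0; $i < @table; $i += $batch) {
        my @rows = @table[$i .. min($i + $batch - 1, $#table)];
        my $sql  = 'INSERT INTO mytable (a, b, c) VALUES '
                 . join(', ', ('(?, ?, ?)') x @rows);
        $dbh->do($sql, undef, map { @{$_}[0 .. 2] } @rows);
    }
    $dbh->commit;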

    And sometimes, it's faster to first drop all the indices, and recreate them after the inserts are done. But that requires some benchmarking for each case.
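    A hedged sketch of that approach, again with placeholder names; the index definition would have to match the actual schema, and the primary key is usually left in place:

    # Drop a secondary index before the bulk load, rebuild it afterwards.
    $dbh->do('DROP INDEX IF EXISTS mytable_b_idx');

    # ... run the bulk INSERTs / COPY here ...

    $dbh->do('CREATE INDEX mytable_b_idx ON mytable (b)');
    $dbh->commit;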

    several seconds for every 100K INSERTs
    That doesn't sound that slow to me. Perhaps you need to hire a Pg specialist to tweak your database.
      I finally got a result back from one set of INSERTs. I am getting ~3300 insertions per second. I would consider that fairly slow.
      --

      when small people start casting long shadows, it is time to go to bed
        3300 insertions/second comes to 30 seconds for 100K. I find 30 fairly large for "several". But is this number for 3300 separate queries, or a single query with 3300 rows? Of course, there are many other factors to consider, none of them mentioned in the original post, nor in any way Perl-related (a quick check for a couple of them is sketched after this list):
        1. How many indices are there? What kind of indices? Do they allow for new rows written at the end of the table, or must the table be rewritten?
        2. Are there any constraints? Foreign keys? Unique indices? Triggers? Auditing?
        3. How many rows are there already?
        4. How many other processes are using the table while you're doing inserts?
        5. What's the hardware the database is running on?
        6. How much memory does said hardware have?
        7. How "hot" is the table?
        8. Is the table being replicated?
        9. What's the physical layout of where the data is written to? What's the physical layout of where your logs are written to?
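        For the first and third questions, a quick way to look is straight from DBI, assuming the placeholder table name 'mytable':

        # List the table's indexes and count its existing rows.
        my $idx = $dbh->selectall_arrayref(
            'SELECT indexname, indexdef FROM pg_indexes WHERE tablename = ?',
            undef, 'mytable');
        print "$_->[0]: $_->[1]\n" for @$idx;

        my ($count) = $dbh->selectrow_array('SELECT count(*) FROM mytable');
        print "rows already in the table: $count\n";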
Re: Speeding up Postgres INSERTs
by happy.barney (Pilgrim) on Jun 18, 2010 at 10:51 UTC
    Do you have AutoCommit turned off?
    Snippet that uses COPY:

    my $table = 'table';
    my @cols  = qw( a b c );
    $" = ", ";
    $dbh->do("COPY $table (@cols) FROM STDIN");
    while (...) {
        $dbh->pg_putline(join("\t", @values) . "\n");
    }
    $dbh->pg_endcopy;
    $dbh->commit;
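    As an aside, if I remember correctly newer DBD::Pg releases also expose the same COPY interface under the names pg_putcopydata and pg_putcopyend; a minimal sketch, assuming the $dbh and @table from the original post and a placeholder table name:

    # Same COPY FROM STDIN idea with the newer method names.
    $dbh->do('COPY mytable (a, b, c) FROM STDIN');
    for my $row (@table) {
        $dbh->pg_putcopydata(join("\t", @$row) . "\n");
    }
    $dbh->pg_putcopyend;
    $dbh->commit;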

      Ahhh... that was what I was looking for. Many thanks.

      In answer to other queries --

      • AutoCommit is off (hence, the explicit COMMIT every 100_000 rows).
      • I am not reading nor wanting to read from a CSV file, but directly from an array in memory.
      • I don't have any INDEXes other than a primary key.
      Update: Using the COPY command, I am now getting around 15,000 INSERTs per second. Sooooo much better.
      --

      when small people start casting long shadows, it is time to go to bed
