Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^2: Posgres batch read with DBI? - ( PostgreSQL )

by jszinger (Scribe)
on Jan 27, 2021 at 22:43 UTC ( #11127547=note: print w/replies, xml ) Need Help??


in reply to Re: Posgres batch read with DBI? - ( PostgreSQL )
in thread Posgres batch read with DBI?

With my DBA hat on, I encourage running the program as ONE transaction with auto-commit off. That way, if there is an error after 249k rows, Postgres will rollback the entire transaction and your data will be unchanged and unharmed. Also other clients will not see the in-process transaction and will get a consistent view of the data.

Break up the transaction only if testing and benchmarks demonstrate a real problem. “Premature optimization is the root of all evil.”

  • Comment on Re^2: Posgres batch read with DBI? - ( PostgreSQL )

Replies are listed 'Best First'.
Re^3: Posgres batch read with DBI? - ( PostgreSQL )
by Marshall (Canon) on Jan 30, 2021 at 05:37 UTC
    I agree with you.

    The OP wrote: "Greetings. I have about 250K records in a postgres db I need to process. I'm wondering if there's a way to read this in batches of 1000 records or something, to cut down on db transaction overhead.
    The way to "cut down on transaction overhead" is simply to run fewer write transactions - that doesn't have much to do with how often you read the data.

    The DB can perform many, many read operations per second.
    It will be able to perform much fewer write transactions per second than read operations.

    The DB whichever one you have, will try to maintain a self consistent state on the hard drive. See Wiki - ACID properties.

    There is a lot of overhead involved in a "transaction". The DB will have to write to the hard disk multiple times. A commit of a million row update doesn't necessarily take all that much longer than a commit of 100 rows. The commit operation, in and of itself is "very expensive".

    The Perl DBI provides:

    $rc = $dbh->begin_work or die $dbh->errstr; and $rc = $dbh->commit or die $dbh->errstr;
    Wrapping those 2 statements around any DB operations involving many "writes" will have a huge effect upon performance.
    There is no problem whatsoever with a 1 million line "commit".
    250 commits for 250K records will slow things down considerably - perhaps even an order of magnitude!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11127547]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2021-04-17 02:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?