Re: Working with large number of small constantly updated records

by erix (Prior)
on Apr 27, 2009 at 21:51 UTC [id://760445]


in reply to Working with large number of small constantly updated records

I'm wondering: what version of Postgres? What is the vacuum regime like? Is autovacuum running? Is it keeping up with the updates? How large are the table(s)/partitions?

What is the hardware like? How much memory is there? How much of it is available to Postgres?
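
A quick way to answer most of these is to ask the server directly from Perl. A minimal sketch using DBI/DBD::Pg (the DSN and credentials are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Placeholder connection details -- substitute your own.
    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'pass',
                           { RaiseError => 1 });

    # Server version, and whether autovacuum is enabled.
    my ($version)    = $dbh->selectrow_array('SELECT version()');
    my ($autovacuum) = $dbh->selectrow_array('SHOW autovacuum');
    print "$version\nautovacuum = $autovacuum\n";

    # Per table: when was it last vacuumed, and how big is it?
    my $sth = $dbh->prepare(q{
        SELECT relname, last_vacuum, last_autovacuum,
               pg_size_pretty(pg_total_relation_size(relid)) AS total_size
        FROM pg_stat_user_tables
        ORDER BY relname
    });
    $sth->execute;
    while (my $row = $sth->fetchrow_hashref) {
        printf "%-30s vacuum=%s autovacuum=%s size=%s\n",
            $row->{relname},
            $row->{last_vacuum}     || 'never',
            $row->{last_autovacuum} || 'never',
            $row->{total_size};
    }
    $dbh->disconnect;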

If it turns out that the lack of performance is indeed due to the database, you may be better off posting to pgsql-performance@postgresql.org (make sure you include pertinent details: postgresql.conf settings, basic hardware data, load characteristics, etc.).

Btw, vacuuming is necessary, but be careful with VACUUM FULL: it's slow and needs an exclusive table lock (plain VACUUM doesn't block reads and writes).
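
One gotcha if you vacuum from Perl code: VACUUM cannot run inside a transaction block, so the handle has to be in AutoCommit mode. A minimal sketch (the table name is a placeholder):

    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'pass',
                           { RaiseError => 1, AutoCommit => 1 });

    # Plain VACUUM ANALYZE reclaims dead rows and refreshes planner
    # statistics without the exclusive lock that VACUUM FULL takes.
    $dbh->do('VACUUM ANALYZE my_queue');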

Replies are listed 'Best First'.
Re^2: Working with large number of small constantly updated records
by techcode (Hermit) on Apr 28, 2009 at 09:17 UTC

    The version I'm working with is 8.3.5 - no autovacuum, though I run it manually, basically before each (batch of) tests, and I'm now playing with vacuuming from within the Perl code. My test data is small - I test with 10,000 to 100,000 records, so the tables stay small.

    The real system gets ~5 new records per second, plus rows in the queue tables for each of those (and it's the queue tables that are being updated/deleted from); the idea is that they stay "small", preferably as close to 0 records as possible. It's a relatively recent 8.x version running on a dedicated server, a dual Xeon something with 2 (or perhaps 4) GB of RAM. At most, the data kept there covers the last 3 days. The queue tables are vacuumed (manually) every couple of hours, since each of them gets ~500K records added and deleted per day. The main data table is partitioned per day (so ~500K records added, and occasionally updated, per partition, 3 partitions at a time), and each partition is deleted after 3 days - so no vacuum runs on those.
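
    For reference, per-day partitioning on 8.3 is usually done with table inheritance; dropping a whole day's partition is nearly instant and leaves no dead rows behind, unlike DELETE. A sketch with made-up table and column names (main_data, created):

        use strict;
        use warnings;
        use DBI;
        use POSIX qw(strftime);

        my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'pass',
                               { RaiseError => 1, AutoCommit => 1 });

        # Create today's partition as a child of the (hypothetical)
        # main_data parent table, with a CHECK constraint so 8.3's
        # constraint exclusion can skip partitions when scanning.
        my $today    = strftime('%Y-%m-%d', localtime);
        my $tomorrow = strftime('%Y-%m-%d', localtime(time + 86400));
        my $suffix   = strftime('%Y%m%d',   localtime);
        $dbh->do(qq{
            CREATE TABLE main_data_$suffix (
                CHECK ( created >= DATE '$today'
                    AND created <  DATE '$tomorrow' )
            ) INHERITS (main_data)
        });

        # Retire the partition that is now 3 days old: DROP is instant
        # and leaves nothing for VACUUM to clean up.
        my $old = strftime('%Y%m%d', localtime(time - 3 * 86400));
        $dbh->do("DROP TABLE IF EXISTS main_data_$old");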

    I know that VACUUM FULL locks the whole table - but the fact that there are only two processes working with any given queue table (the previous-state process, and the one processing that queue/state) makes that acceptable.

    IMHO the problem is that databases aren't primarily meant to be queues where in one day you get 5 million records added and deleted (my estimate across 10 queue tables). But at the same time you need transactions to be sure where each and every record is in the processing. OTOH, these servers have uptimes of up to 3 years - so it might be OK to do everything in RAM and only occasionally update the DB. Another alternative I'm looking at is DBM::Deep, since its sales pitch is quite impressive:

    A unique flat-file database module, written in pure perl. True multi-level hash/array support (unlike MLDBM, which is faked), hybrid OO / tie() interface, cross-platform FTPable files, ACID transactions, and is quite fast. Can handle millions of keys and unlimited levels without significant slow-down.
    In this system everything is looked up by its key/ID (anyone mention hashes?) or consumed in FIFO sequence (arrays?), with only some reports needing real SQL. Of course, using DBM::Deep would give the PHP guys working on the (web) administration interface quite a headache :) So I'm guessing I'll try to just shift those queues to DBM::Deep and see what happens.
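
    For what it's worth, a FIFO queue in DBM::Deep really is just a tied array. A minimal sketch (the file name and record layout are made up):

        use strict;
        use warnings;
        use DBM::Deep;

        # A queue backed by a single flat file; TYPE_ARRAY gives the
        # object array semantics instead of the default hash.
        my $queue = DBM::Deep->new(
            file => 'queue.db',
            type => DBM::Deep->TYPE_ARRAY,
        );

        # Producer: push a record onto the tail.
        push @$queue, { id => 42, state => 'new' };

        # Consumer: take the oldest record off the head.
        my $job = shift @$queue;
        print "processing $job->{id}\n" if $job;

    One caveat: shift on a DBM::Deep array has to renumber the remaining elements, so if the queues ever do grow, keeping head/tail counters in a hash may be cheaper than true array shifts.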


      If autovacuum is off on an 8.3 system, someone disabled it. Why was that done? It may be causing problems, especially as you say that running vacuum 'manually' fixes the problem for a particular process. What happens if you turn it back on?

      Also, isn't there an index missing for some query, or something similarly simple? Has ANALYZE been run regularly?

      (Of course, this only applies when you are actually sure the db is the bottleneck...)
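
      One way to check that is to run the suspect queries under EXPLAIN ANALYZE and look at the actual times and row counts; a seq scan over a supposedly small queue table that takes ages is the classic sign of bloat or a missing index. A sketch (query and connection details are placeholders):

          use DBI;

          my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'pass',
                                 { RaiseError => 1 });

          # EXPLAIN ANALYZE actually executes the query and reports real
          # timings and row counts, one text line per row of output.
          my $plan = $dbh->selectcol_arrayref(
              q{EXPLAIN ANALYZE SELECT * FROM my_queue WHERE state = 'pending'}
          );
          print "$_\n" for @$plan;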
