Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re^2: Working with large number of small constantly updated records

by techcode (Hermit)
on Apr 28, 2009 at 09:17 UTC ( #760551=note: print w/ replies, xml ) Need Help??


in reply to Re: Working with large number of small constantly updated records
in thread Working with large number of small constantly updated records

Version that I'm working on is 8.3.5 - no autocacuum, though I run it manually basically before (batch of) tests and now playing with vacuuming them from within Perl code. My test data is small - I test with 10.000 to 100.000 records so sizes are small.

Real system has ~5 new records per second + all the queue tables for each of those (and it's the queue tables that are being updated/deleted from) and the idea is that they are "small" or preferably as close to 0 records as possible. It's relatively recent 8.x version running on a dedicated server double Xeon something with 2 (or perhaps 4) GB of RAM. At most data that is kept there is for last 3 days - those queue tables are vacuumed every couple of hours (manually) since each of them gets ~ 500K records added and deleted each day. Main data table being partitioned per day (so ~ 500K records that are added and from time to time updated - per partition - 3 partitions) each partition deleted after 3 days - so no vacuum is running on them.

I know that full vacuum is locking the whole table - the fact that there are only two processes working with any given queue table (previous state process, and the one processing that queue/state) make it acceptable.

IMHO the problem is that DB's aren't primary meant to be queues where in one day you get 5 million records added/deleted (estimate for 10 queue tables). But at the same time you need transactions to be sure where each and every record is in the processing. OTOH all servers are having up-time of up to 3 years - so it might be OK to do everything in RAM and occasionally just update the DB. Another alternative I'm looking at is the DBM::Deep since it sales pitch is quite impressive:

A unique flat-file database module, written in pure perl. True multi-level hash/array support (unlike MLDBM, which is faked), hybrid OO / tie() interface, cross-platform FTPable files, ACID transactions, and is quite fast. Can handle millions of keys and unlimited levels without significant slow-down.
In this system everything is called by it's Key/ID (anyone mentioned hashes?), or in FIFO sequence (arrays?). With only some reports needing real SQL. Of course using DBM::Deep would give the PHP guys that are working on administration interface (web) quite a headache :) So I'm guessing I'll try to just shift those queues to DBD::Deep and see what happens.

Have you tried freelancing/outsourcing? Check out Scriptlance - I work there since 2003. For more info about Scriptlance and freelancing in general check out my home node.


Comment on Re^2: Working with large number of small constantly updated records
Re^3: Working with large number of small constantly updated records
by erix (Vicar) on Apr 28, 2009 at 23:48 UTC

    If autovacuum is off on an 8.3 system, someone disabled it. Why was that done? It may be causing problems; especially as you say that running vacuum 'manually' fixes the problem for a particular process. What happens if you put it back on?

    Also, isn't there an index missing from some query, or something similarly simple? Has ANALYZE been run regularly?

    (Of course, this only applies when you are actually sure the db is the bottleneck.. )

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://760551]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (10)
As of 2014-09-01 14:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite cookbook is:










    Results (13 votes), past polls