PerlMonks  

Re^2: Working with large number of small constantly updated records

by techcode (Hermit)
on Apr 28, 2009 at 08:37 UTC (#760542)


in reply to Re: Working with large number of small constantly updated records
in thread Working with large number of small constantly updated records

I still haven't done any real profiling, other than running the Perl code without the database and then with it, and timing both.
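For what it's worth, rough timing of that kind can be taken with the core Time::HiRes module. This is only a sketch: `rate_per_second` is a name made up for illustration, and the hash lookup is a placeholder workload, not the real matching code.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run $code $count times and return the approximate calls-per-second rate.
# (Hypothetical helper, not part of the system described in this post.)
sub rate_per_second {
    my ($count, $code) = @_;
    my $start = time;
    $code->() for 1 .. $count;
    my $elapsed = time - $start;
    return $count / ($elapsed || 1e-9);    # guard against a zero interval
}

# Placeholder workload: random lookups in a small in-memory hash.
my %lookup = map { $_ => 1 } 1 .. 1000;
my $rate = rate_per_second(100_000, sub {
    my $hit = exists $lookup{ 1 + int rand 1000 };
});
printf "%.0f lookups/second\n", $rate;
```

Swapping the placeholder sub for one that also talks to the database would give the "with DB" number to compare against.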

One part of the system works like this: get record(s) from the DB, then find the longest possible match in a trie (a HoHoH... ref). I also played around with an ordinary hashref: build the candidate key char by char in a loop and look each one up in one large, one-"dimensional" hashref. The trie's performance degrades with depth, so the flat hashref is faster for deeper searches, but the two meet at around the length where most of the records we are matching against are. Once a match is found, the result is written back to the DB.

The old version of that part of the system is implemented as a Pg stored procedure. It loops, adding one char at a time, and on each iteration tries to find the key with SELECT * FROM table WHERE whatever = ?. That is similar to the Perl code I tested alongside the trie, but it queries the whole table on every iteration (OK, it's indexed and all, but still ...), and either way we are trying to free the DB from all but the necessary work.
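To make the two in-Perl lookups concrete, here is a minimal sketch of both. The function names, the use of the empty-string key to mark "a record ends here", and the sample data are all assumptions made for illustration; the real code and records may look quite different.

```perl
use strict;
use warnings;

# Build a trie of nested hashrefs (HoHoH...). The '' key in a node holds the
# value of the record that ends at that node (an assumed convention; it cannot
# clash with a child, since children are keyed by single characters).
sub trie_build {
    my (%records) = @_;
    my %trie;
    for my $key (keys %records) {
        my $node = \%trie;
        for my $char (split //, $key) {
            $node = $node->{$char} ||= {};
        }
        $node->{''} = $records{$key};
    }
    return \%trie;
}

# Walk the trie one char at a time, remembering the deepest node that holds
# a record. Stops as soon as no child exists for the next char.
sub trie_longest {
    my ($trie, $string) = @_;
    my ($node, $len, $best_len, $best_val) = ($trie, 0, 0, undef);
    for my $char (split //, $string) {
        $node = $node->{$char} or last;
        $len++;
        ($best_len, $best_val) = ($len, $node->{''}) if exists $node->{''};
    }
    return $best_len ? (substr($string, 0, $best_len), $best_val) : ();
}

# Flat variant: grow the candidate key char by char and probe one big hash.
sub flat_longest {
    my ($flat, $string) = @_;
    my ($prefix, $best) = ('', undef);
    for my $char (split //, $string) {
        $prefix .= $char;
        $best = $prefix if exists $flat->{$prefix};
    }
    return defined $best ? ($best, $flat->{$best}) : ();
}

# Hypothetical sample data, only to show the calls.
my %records = ('1' => 'US', '44' => 'UK', '4420' => 'UK London');
my $trie = trie_build(%records);
my ($p, $v) = trie_longest($trie, '442071234567');
print "$p => $v\n";    # prints "4420 => UK London"
```

Note that the flat version as written always walks the whole input string; capping the loop at the length of the longest stored key would be an obvious tweak.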

And without the DB (just a for(1..1000000){ looking it up }) we are talking about something in the range of 50K (or was it 500K) matches per second on ordinary desktop hardware (AMD dual core, 1 GB RAM, single-channel DDR2, SATA2 HD). With the DB involved it's around 500 per second. I tried 500, 1000 and 2000 records per transaction: fetching that many with ... LIMIT X, starting a transaction, doing the work in a loop with updates/deletes, and committing at the end. I'm considering doing fewer, maybe 200, but building one large SQL string ("UPDATE ... rec1; DELETE ... rec1_queue; UPDATE ... rec2; DELETE ... rec2_queue; ...") and running it under one execute/transaction.
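The batch-per-transaction loop described above might look roughly like this with DBI. It is a sketch under assumptions: `process_batch`, the table names `records` and `records_queue`, and the columns `matched` and `id` are invented for illustration, and the handle is assumed to be connected with RaiseError on and AutoCommit on (so that begin_work is allowed and failed statements die).

```perl
use strict;
use warnings;

# Apply one batch of updates/deletes inside a single transaction.
# $dbh   : a connected DBI handle (RaiseError => 1, AutoCommit => 1 assumed)
# $batch : arrayref of { id => ..., match => ... } (hypothetical row shape)
sub process_batch {
    my ($dbh, $batch) = @_;

    # Prepare once, execute many times; table/column names are made up.
    my $update = $dbh->prepare('UPDATE records SET matched = ? WHERE id = ?');
    my $delete = $dbh->prepare('DELETE FROM records_queue WHERE id = ?');

    $dbh->begin_work;
    my $ok = eval {
        for my $row (@$batch) {
            $update->execute($row->{match}, $row->{id});
            $delete->execute($row->{id});
        }
        $dbh->commit;
        1;
    };
    unless ($ok) {
        my $err = $@ || 'unknown error';
        eval { $dbh->rollback };    # best effort; keep the original error
        die "batch failed, rolled back: $err";
    }
    return scalar @$batch;
}

# Usage, assuming DBI and DBD::Pg are installed and the tables exist:
#   use DBI;
#   my $dbh = DBI->connect('dbi:Pg:dbname=test', $user, $pass,
#                          { RaiseError => 1, AutoCommit => 1 });
#   process_batch($dbh, \@rows);
```

Preparing the two statements once and reusing them keeps the per-row cost down to an execute each, which is a different trade-off from gluing everything into one big SQL string.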

Both PostgreSQL and the Perl code were running on the same machine for those tests. The real DB is of course on a separate DB server.


Have you tried freelancing/outsourcing? Check out Scriptlance - I have worked there since 2003. For more info about Scriptlance and freelancing in general, check out my home node.

