|Perl: the Markov chain saw|
(somewhat OT) How to talk to "though" boss?by techcode (Hermit)
|on Nov 10, 2009 at 23:21 UTC||Need Help??|
techcode has asked for the
wisdom of the Perl Monks concerning the following question:
There is a ton of articles and discussion on this subject - I realize that ("30 new ways to make your boss listen to you" articles LOL). But since most of the smart people are fellow mongers - I'm asking for your help here. I also figure that with Perl being less picked solution with manager types - there is probably more people with hands on experience on how to make your boss think it was his idea :) And I'm asking for the technical guidance as well ... if how I think that system should be optimized is correct ...
The boss is long time in IT, made a small fortune, but his stubbornness is (beside making us the employees jobs tougher than they should be) costing him money (well he is earning less than he and we could). We end up doing all the things twice - first we do it his way (even though we say there is some better way), and then do it again using our initially proposed way ... But by then, there is a lot of other things breaking down, and you don't have enough time to work on it for real - ans just end up making it work (tm)
It culminated recently when whole system broke down under load - the night before (well that morning) I moonshined (freelanced beside full time job) for an well known client of mine until ~5 AM when I went to bed - updating some old projects. At 8 AM I was woken up by one of the 24/7 support staff "Nothing is working, I can't open the console" (console = web admin interface written in PHP that's overloading the system more than the system is overloading itself :)
Sure enough, load on DB server was in twenties. CPU is like a total of 400% (quad core Xeon and all that GB's of RAM didn't help). And then I realized that for that first 8 hours of the day - we had more new data than total on a better of days. It took me and main system admin (my "supervisor") a few hours of manually stopping everything, vacuuming the DB (PostgreSQL - 8.2 - at least the whole thing lead to supervisor finally updating it to 8.4 :) ) to fix the mess. We hit the new record - load of 42 in the process.
Anyway - company is in SMS business, small bits of data (140 Bytes = 160 chars due to 7bit encoding in GSM), store and forward. Core system is in Perl and is using Net::SMPP for both servers and clients (towards providers/operators). Database is PostgreSQL, and we are using it as queue (a lot of them) as well. Each message has about 5 states, and there is a queue (table constantly being written to, read, and deleted) for each of those. But maybe even worse is that for each of those states, we are updating one extra large table (8 indexes on it, almost 30 fields, many are defined as TEXT type (so no fixed length reserved). So for each message we have 5x row versions and hence need to vacuum and reindex quite often. And we are talking about only having 500K new records (rows) per day, which should laughable amount with say 1 KB of data worth for each.
Boss realized that we should start using cache of some sort (though he still thinks that you can easily say: select * from cache; ) so I'm starting to use Memcache to help out the poor database.
I need a way to convince him that we need to redesign the DB (and my supervisor who is great Linux admin, who is in love with Erlang - yet continually doesn't get it that event based still boils down to the "receiver" process asking at some point - "is there something new for me" - even if thats "is there a signal/semaphore for me that marks there is new data for me").
Move the queues into RAM (perhaps implement queues using Memcache), and move the 3 (INT type) of fields that are the only fields being updated in the before mentioned large table - into separate table. And hence not make a copy of the whole row (with 3 or 4 TEXT fields) for each update, just the three int fiels (2 being updated + id to reference the "all/master details" table).And start using Memcache for all the reads since basically everything for the operation of the system is searched by INT id (you read ID from the queue, and then read data associated with that ID, do some work, update the big table and place ID in next queue). And we have too many stupid reads that should and could be avoided even if that means having duplicated fields here and there. The other thing that I need to convince the boss (supervisor got that - guess he liked that he could simply use svn export instead of manually copying files to production servers) is that we need to start using Subversion or similar. There are 5 of us working on code spread among 5 servers, and we are stepping on each other toes by overwriting someones changes, or trying to revert to previous version (do you have a backup of that code - um at best we have previous version~, not the 4th version in the past).
O and console written in PHP - well it loads all (most) data in memory, munges it there (instead of in DB/SQL) and every time someone hits refresh while working with it - we see a couple of seconds to minutes (or an hour) of 100% CPU usage on DB and Web servers. It does that all the time (instead of say calculating it once every 5 minutes and serving it from that cache).
It would be great if while I'm at all of it - I get a raise :D I totally deserve it since I haven't got one since I started working there 9 - 10 months ago - and I proved myself by implementing a new service from scratch that totally rocks and it even cleans up itself (I've set up a cron to vacuum and reindex it) - though would benefit by using queue in RAM instead of DB ;) And I even freelanced for the company before that - so they knew what they are getting.I'm thinking about dedicating a weekend on writing, testing and benchmarking code and DB optimizations, and fitting it all in a nice presentation to show (projector and all) - so I can't get one of these from the boss or supervisor:
What do you think about approach I'm planning - perhaps you have some better strategies? And what about the technical aspects I thought off?
Sorry for the long post and text intensive post ...