Re^2: Recent slowness and outage (IPC cache)

My first guess is that no, it wouldn't help. It would greatly reduce memory usage on the system. If that then led to making the cache size a lot bigger, then that would help. It would have other benefits as well.

But it would also thwart my plan for greatly reducing the impact of the race condition. Right now, you get a node via the cache, then you make updates to it, then you request that the updates be saved to the database (via the cache). Between "get node" and "save node", there is a pretty high likelihood that someone else also started making changes to the node. And we are stuck with optimistic locking and ignoring failures (because of the nature of the site), so the best we can do is have each process only update fields that it changed. So we have to track which fields this process changed. And there are two ways to do that: 1) note each change, i.e. use tie, i.e. slow things down too much; 2) save a copy of the original and compare (my plan).
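
Option 2 can be sketched roughly as follows. This is only an illustration in Python, not the Everything code: the dict standing in for the database and the names `get_node`, `save_node`, and `changed_fields` are all assumptions made up for the example.

```python
import copy

def changed_fields(original, current):
    """Return only the fields whose values differ from the saved original."""
    return {key: value for key, value in current.items()
            if key not in original or original[key] != value}

def get_node(db, node_id):
    """Hand out a working copy plus a pristine copy kept for later comparison."""
    node = dict(db[node_id])
    return node, copy.deepcopy(node)

def save_node(db, node_id, original, current):
    """Write back only what this process changed (like UPDATE ... SET of a few columns)."""
    delta = changed_fields(original, current)
    db[node_id].update(delta)
    return delta

# Hypothetical user node; the field names are invented for the example.
db = {42: {"title": "tye", "xp": 100, "votes_left": 10}}

# Two processes fetch the same node before either one saves.
node_x, orig_x = get_node(db, 42)
node_y, orig_y = get_node(db, 42)

node_x["xp"] += 1            # process X changes one field
node_y["votes_left"] -= 1    # process Y changes a different field

delta_x = save_node(db, 42, orig_x, node_x)   # {'xp': 101}
delta_y = save_node(db, 42, orig_y, node_y)   # {'votes_left': 9}
result = db[42]              # both changes survive
```

This only narrows the race. If both processes change the same field, the last save still wins.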

So how do we save a copy of the original? In a perfect world, there would be a "I plan to change this node" version of the "get node" function and you wouldn't be allowed to save changes to nodes that you didn't get via that function. Getting from the PM world to that world would require huge numbers of changes all over the place. So I'm not shooting for that.

So I need to have the "get node" function save a copy of the node separate from the one that gets handed to everyone and that might get changed. Well, the node cache is the perfect place for this. However, if the node cache is shared between processes, then the copy saved in the node cache is no longer going to be "what this process originally got", so it isn't useful for making that comparison to reduce the race condition.

So I don't want an inter-process node cache. Not until we have some other way to track changes to fields in nodes. That is a long way off.

What I'd love is an inter-process nodelet and miscellaneous data (such as chatter in a few forms, approval status, etc.) cache. Nodelets are currently cached and only updated if they are more than N seconds old (where N is configured per nodelet). But each process has a separate cache, so most nodelets are updated P times every N seconds (where P is the number of httpd/mod_perl processes in use). So a shared cache reduces the load for many of the nodelets by about a factor of P (dividing by even 16 is a very nice thing to do to system load).
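
The factor-of-P arithmetic can be checked with a toy simulation. This is a sketch in Python under loud assumptions: the `NodeletCache` class, the nodelet name, and the traffic pattern (every process serves one request per second) are invented, and the "shared" cache is just one object reused by all simulated processes, so it shows the counting only, not real inter-process sharing or locking.

```python
class NodeletCache:
    """Regenerate an entry only when it is more than max_age seconds old."""
    def __init__(self, max_age):
        self.max_age = max_age
        self.entries = {}          # name -> (generated_at, content)
        self.regenerations = 0

    def get(self, name, now, generate):
        entry = self.entries.get(name)
        if entry is None or now - entry[0] > self.max_age:
            self.regenerations += 1
            entry = (now, generate())
            self.entries[name] = entry
        return entry[1]

def simulate(processes, max_age, seconds, shared):
    """Count nodelet regenerations when each process serves one request per second."""
    shared_cache = NodeletCache(max_age)
    caches = [shared_cache if shared else NodeletCache(max_age)
              for _ in range(processes)]
    for now in range(seconds):
        for cache in caches:
            cache.get("some nodelet", now, lambda: "<rendered nodelet>")
    # Count each distinct cache object once.
    distinct = {id(cache): cache for cache in caches}.values()
    return sum(cache.regenerations for cache in distinct)

per_process_total = simulate(16, 30, 300, shared=False)   # 160
shared_total = simulate(16, 30, 300, shared=True)         # 10
```

With P = 16 and N = 30, the separate caches regenerate the nodelet 16 times as often as the single shared one.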

Negotiating updates in such a cache gets rather complicated and varies by what you want to cache, and this all gets more complicated because you want to see updates from other computers also.

- tye

Re: Re^2: Recent slowness and outage (IPC cache) by perrin (Chancellor) on Jan 17, 2003 at 05:01 UTC
Do multiple updates to the same node really happen that often? I thought people were not usually able to edit the same node.

Maybe I'm misunderstanding, but I don't think the node cache is the right place to implement your idea about saving the original version. Don't you want to save the version that this user originally received, rather than this process? There's no guarantee that a user will even be on the same machine when their form submission is processed. Seems like you would have to do it based on session and keep it in the database.

Having a separate cache for nodelets (or anything else) is no problem. You can have as many caches as you want. Finding a solution that works across all the machines in the cluster is harder but possible. It would require something fancier like messaging daemons or some database trickery. Just making the cache multi-process is enough of a challenge for now.
Re^4: Recent slowness and outage (IPC cache) by tye (Sage) on Jan 17, 2003 at 05:31 UTC
You vote for my node and you change my XP, your XP, and your vote count. I vote for your node and I change your XP, my XP, and my vote count. Yeah, user nodes get updated simultaneously all the time. Other nodes as well.

I'm not talking about users editing the text of the same node at the same time. The few situations that support that implement more complex locking. The problem is not that the user made decisions about how to update a node based on old data they saw. Please read the "Last checked flag not updating?" thread, which covers this problem in more detail.

Yes, you are misunderstanding. (: The changes to the node happen during the course of rendering a page. What I need to keep track of is the state of the node between the "get node" and the "save node".

Ugh. I just realized that we have another race to deal with. You could have this happen:

* process X does "get node P"
* X changes a field in node P
* X calls a routine...
* process Y finishes an update and does "save node P"
* Y's node cache saves the changes and increments the version number of node P
* the routine X called calls "get node P"
* X's node cache notes that P's version number is old and rereads the node into the cache, clobbering the previous changes X had made
* X finishes updating and does "save node P" and X's changes are lost.

So "get node P" also needs to be made smart enough to not reget the node. My first thought was to compare the two versions of the node and not reget if the node is changed. But I think a better idea is to only reget a node once per page load (more efficient, more robust). Well, I'll chew on that some more...

- tye
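
The sequence above, and the "only reget once per page load" idea, can be played out in a small simulation. This is a sketch in Python, not the real node cache: `Database`, `ProcessCache`, `start_page`, and the field names are hypothetical, and "save node" writes the whole node at once.

```python
import copy

class Database:
    def __init__(self):
        self.nodes = {}                    # node_id -> (version, fields)

class ProcessCache:
    """Per-process node cache that rereads a node when its version is stale."""
    def __init__(self, db, reget_once_per_page=False):
        self.db = db
        self.cache = {}                    # node_id -> (version, node dict)
        self.reget_once_per_page = reget_once_per_page
        self.checked_this_page = set()

    def start_page(self):
        self.checked_this_page.clear()

    def get_node(self, node_id):
        cached = self.cache.get(node_id)
        if (cached is not None and self.reget_once_per_page
                and node_id in self.checked_this_page):
            return cached[1]               # already checked during this page load
        version, fields = self.db.nodes[node_id]
        if cached is None or cached[0] != version:
            cached = (version, copy.deepcopy(fields))   # reread; local edits are lost
            self.cache[node_id] = cached
        self.checked_this_page.add(node_id)
        return cached[1]

    def save_node(self, node_id):
        _, node = self.cache[node_id]
        new_version = self.db.nodes[node_id][0] + 1
        self.db.nodes[node_id] = (new_version, copy.deepcopy(node))
        self.cache[node_id] = (new_version, node)

def run(reget_once_per_page):
    db = Database()
    db.nodes["P"] = (1, {"xp": 100, "votes_left": 10})
    x = ProcessCache(db, reget_once_per_page)
    y = ProcessCache(db)
    x.start_page()
    y.start_page()
    node_x = x.get_node("P")
    node_y = y.get_node("P")
    node_x["xp"] += 1                      # X changes a field in node P
    node_y["votes_left"] -= 1              # Y changes a different field
    y.save_node("P")                       # Y saves; the version is incremented
    x.get_node("P")                        # the routine X called gets P again
    x.save_node("P")
    return db.nodes["P"][1]

lost = run(reget_once_per_page=False)      # {'xp': 100, 'votes_left': 9}
kept = run(reget_once_per_page=True)       # {'xp': 101, 'votes_left': 10}
```

With the guard, X's change survives, but because the whole node is written, Y's change is now the one overwritten. That is the original race, which the save-only-changed-fields plan is meant to reduce.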
Re: Re^4: Recent slowness and outage (IPC cache) by perrin (Chancellor) on Jan 17, 2003 at 16:16 UTC
Double ugh. We're getting tangled up by limitations of MySQL when this was designed (lack of transactions and row-level locking) and limitations of the Everything code (writing the entire node at once). I don't know if it's possible to truly fix this stuff without fundamental changes. Maybe I should just focus on making a shared nodelet cache.

Anyway, as I understand it you want to get the node, keep a local copy, do the updates (which currently all modify the version of the node in the cache), and then compare that to the original. Unlike the current in-memory node cache, a shared one based on Cache::Mmap would not update the node in the cache until it is explicitly saved back to the cache, i.e. in-memory updates do not modify the cache. I'm not sure if that helps any or not. Depends on what the update code does.
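
The difference between the two kinds of cache can be shown with a small sketch. It is written in Python and does not use or imitate the Cache::Mmap API; `InMemoryCache` and `SerializingCache` are hypothetical classes, and the second one assumes the shared cache stores a serialized copy and hands back a fresh copy on every read.

```python
import pickle

class InMemoryCache:
    """Hands out the cached object itself, so edits change the cache at once."""
    def __init__(self):
        self.store = {}

    def write(self, key, node):
        self.store[key] = node

    def read(self, key):
        return self.store[key]

class SerializingCache:
    """Stores serialized bytes, so every read returns an independent copy."""
    def __init__(self):
        self.store = {}

    def write(self, key, node):
        self.store[key] = pickle.dumps(node)

    def read(self, key):
        return pickle.loads(self.store[key])

def edit_without_saving(cache):
    cache.write("P", {"xp": 100})
    node = cache.read("P")
    node["xp"] += 1                  # in-memory update, never written back
    return cache.read("P")["xp"]     # what the cache holds afterwards

in_memory_view = edit_without_saving(InMemoryCache())        # 101
serialized_view = edit_without_saving(SerializingCache())    # 100
```

In the second case the cached copy stays untouched until someone writes to the cache again, which may be any process sharing it.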
Re^6: Recent slowness and outage (IPC cache) by tye (Sage) on Jan 17, 2003 at 17:13 UTC
Re: Re^6: Recent slowness and outage (IPC cache) by perrin (Chancellor) on Jan 17, 2003 at 17:32 UTC
Some notes below your chosen depth have not been shown here

