http://www.perlmonks.org?node_id=979140


in reply to A distributed design challenge

The budget of a campaign is stored centrally on Redis, right? Otherwise winning a bid and hitting the budget limit on one server would still not hinder other servers bidding for more

So why isn't the bidding amount subtracted from the budget when a server bids instead of when it wins. Naturally this means it has to be added again when the bid is lost, so there is slightly more overhead.

No overspending is possible in this case (substracting the budget must be an atomic operation naturally)

But I might be totally off. I don't understand your calculation that if 36 servers bid $99 with a total budget of $100 and all win, that they are $35 over limit and not 36*$99 -$100.

Replies are listed 'Best First'.
Re^2: A distributed design challenge (scale)
by tye (Sage) on Jun 29, 2012 at 20:25 UTC

    I agree (except for the minor confusion with some numbers in the last paragraph).

    Bids probably need to be subtracted from a budget in Redis. You might have a separate "budget" that includes bids while keeping the budget that only includes payments for winning bids.

    You might also hedge and say that each bid only has a (say) 60% chance of winning so bidding $50 subtracts only $30 from the "budget less outstanding bids", realizing that this risks going over the budget if you win more than an average of 60% of bids (weighted by bid size).

    There are several complications of this approach:

    1. Each bid must talk to the single Redis master
      • This prevents horizontal scaling of Redis capacity
      • This causes an outage in bidding when the master fails (and it may take a while to promote one of the slaves to be the new master)
    2. You probably need to deal with a bidding host dieing and then failing to remove a bid in a timely manner, leaving the available budget unnecessarily low for too long

    Also, the example Ovid gives seems way too optimistic and thus to greatly understate the risk. If the budget is $100 and $50 of it has been spent and there are no outstanding bids, then 36 systems could each bid $50 and end up spending $1800 and exceeding the budget by 1750%, not a measly 35%. So that makes me suspect that the explanation was incomplete.

    But winning bids already have to access the Redis master, so unless winning bids are a small fraction of regular bids, there isn't much benefit to horizontal scaling with Redis slaves. But I could certainly see the timing being less critical for recording winning bids (and thus an outage on the master would prevent bidding but not be as serious of a problem for the recording of winning bids).

    I can see a lot of approaches for dealing with better scaling and higher availability while still subtracting each bid from a central balance. But I don't think a simple Redis set-up would be the solution. I'd probably want multiple masters so you might want to look at membase or riak, for example (though that might not really address scaling).

    You could also shard your requests so that bids against budget $X only go to (say) 3 of the available servers and then each server can bid upto 1/3 of the available budget (using a server-local central bid amount tracker, perhaps a local Redis instance). Then your 100s of budgets spread load across your dozens of servers but each budget only spreads over a smaller subset. That obviously can get complicated to maintain that mapping in a way where load is sufficiently balanced.

    You can also just roll your own somewhat trivial multi-master infrastructure where each master is responsible for broadcasting updates to other masters. You can even use "broadcast" or "multicast" packets for notifying all masters "at once", if you seriously trust the capacity and reliability of your network. This also brings to mind pubsub.

    - tye        

      I had considered subtracting bid amounts from the budget, but rejected it for several reasons, all of which are problematic. First, if 36 boxes bid simultaneously, we could overshoot our budget unless we have boxes "lock" the bid amount, but that changes us from parallel to serial bidding and increases response time to the point where bids that should have been one would be lost.

      Serial bidding might be appropriate as we get close to the budget, but that raises another issue: the box that receives the notification that we lost is usually not the box that made the corresponding bid. Because we don't receive the bid amount in the loss notification, we'd have to search all 36 boxes to find out who bid what (or tremendously complicate our Redis setup by trying to store huge amounts of bid information in Redis and we're expecting our volume of data to increase by a couple of orders of magnitude).

      As for your suggestion of me being overly optimistic, that might be the case, but the bid amounts are (fortunately) small fractions of the budget, so it won't be catastrophic until we scale up.

      Riak is not an option because the response times on Riak are slow enough that we'll lose bids we should have one (though we're probably going to put Redis in front of Riak). I don't know about membase.

      We're probably going to go with a strategy of allocating separate budgets to separate boxes. It's more complicated, but it seems much safer. We lose a lot of bidding capacity for individual campaigns, but for many campaigns, I think it's doable.

        First, if 36 boxes bid simultaneously, we could overshoot our budget unless we have boxes "lock" the bid amount, but that changes us from parallel to serial bidding and increases response time to the point where bids that should have been one would be lost.

        I don't see where you need locking. Naturally the check "if ($budget>0-$overshootallowance) { $budget-=$bid }" must be an atomic operation and this is a minimal lock, but this is not a lock over the duration of a bid. The case that only one machine can bid can only occur if the total budget is already so low as to allow exactly one last bid

        UPDATE: To clarify, if the subtract operation is the most time-consuming part in a single bid operation, then having a central budget is practically serializing the bidding per campaign. I'm not sure whether you meant that or a lock over a complete bid-and-wait-for-success-transaction

        You either have the ability/funds to allow *all* your boxes to overshoot or you don't. In the first case you don't have a problem. In the second case, whatever scheme you are using, you obviously have to lose some bids as the budget gets low.

        the box that receives the notification that we lost is usually not the box that made the corresponding bid

        Ok, this is a real problem for the central budget idea. But since you want to go with the separate budget idea, note that you have a smaller but similar problem: Since the successful bid arrives at a different machine, that machine has to deduce the amount from its own budget (if you don't want to search all 36 boxes). But what happens if that machine has no budget left? Either it overshoots or it searches the other boxes for budget left over (which takes enough time for the machine with the budget left over to bid although it should not). In the worst case all successful bid notifications go to the same machine with budget==0 and all the other machines can still place further bids until they are notified by that one machine.

        You have to be careful that the negotiation traffic between the machines doesn't throttle your network when lots of machines have no budget left and are repeatedly contacting other machines to get another bid out.

Re^2: A distributed design challenge
by Anonymous Monk on Jun 29, 2012 at 16:05 UTC
    I don't understand your calculation that if 36 servers bid $99 with a total budget of $100 and all win, that they are $35 over limit and not 36*$99 -$100.
    I think you misparsed the original statement. He was saying that there is $1 total left in the budget at that time and all 36 servers are trying to spend $1 at the same time.

      Ok, this makes sense. They would bid only $1 because their budget is limited to $1, instead of making the bid they calculated as sensible.

Re^2: A distributed design challenge
by Ovid (Cardinal) on Jul 02, 2012 at 18:34 UTC

    We considered subtracting the bidding amount from the budget when the server bids, but there are some technical reasons which make it more difficult to restore that amount if the server loses. Overspending is still possible when you have 36 boxes bidding at the exact same moment (unless one of them takes a lock on Redis, but that changes our parallel bidding to serial and kills our response times, meaning we lose bids we would want to win).