I think you hit the nail on the head with this point: (emphasis mine)
you have a class that mostly just deals with the bits that need to be under a specific mutex. So the code to be run under the mutex is kept very small and cohesive by being its own class that just concentrates on doing the locking right.
The very simplest atomic mechanisms are the ones I prefer: ones that make no attempt whatsoever to do the right thing for me. If I need to protect a block such that more than one body of code can be in it at once, I know how to build that. If I need to allow a single actor on the stage to hold more than one claim to it at a time, I know how to build that, too. But I am also going to build other rules and error detection into that same mechanism, so that if the software ever does something that I did not intend, and certainly did not think it was capable of doing, the software itself will tell me that it has failed. In my designs, I want to build those things ... and I can count on the fingers of one hand the number of times in more than thirty years that I have ever had the need to do so.
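To make that concrete, here is a rough sketch in Python of the kind of thing I mean. It is only an illustration: the names CheckedLock, LockMisuseError and max_depth are made up for this post, and the one primitive it leans on is the plain, non-reentrant threading.Lock. It lets a single owner hold more than one claim, and it complains loudly when the lock is used in a way the design never intended.

```python
import threading


class LockMisuseError(RuntimeError):
    """Raised when the lock is used in a way the design never intended."""


class CheckedLock:
    """Re-entrant lock built on a plain threading.Lock, with misuse checks.

    Hypothetical example class; max_depth is an arbitrary sanity limit.
    """

    def __init__(self, max_depth=8):
        self._lock = threading.Lock()  # the simple primitive; not re-entrant
        self._owner = None             # ident of the thread holding the lock
        self._depth = 0                # number of claims held by the owner
        self._max_depth = max_depth

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:
            # Only the owning thread can ever see its own ident here,
            # so this branch needs no further locking.
            if self._depth >= self._max_depth:
                raise LockMisuseError("nesting deeper than the design allows")
            self._depth += 1
            return
        self._lock.acquire()           # block until the current holder lets go
        self._owner = me
        self._depth = 1

    def release(self):
        if self._owner != threading.get_ident():
            raise LockMisuseError("release() by a thread that does not hold the lock")
        self._depth -= 1
        if self._depth == 0:
            self._owner = None
            self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False                   # never swallow exceptions
```

Used as "with lock:", nested claims by the same thread simply count up and down, while a stray release() or runaway nesting raises LockMisuseError instead of failing silently. The whole mechanism lives in one small class, which is the point of the quoted passage.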