http://www.perlmonks.org?node_id=951551


in reply to Re^2: Recursive locks:killer application. Do they have one? (mu)
in thread Recursive locks:killer application. Do they have one?

What I dislike is the overhead of current implementation with its need to count and no timeout.

I find it hard to imagine how "need to count" can have more than the most trivial of impacts on the efficiency of a mutex.

Ah, Java switching locking schemes explains why this is such a political football, then.

I agree with many of your points above.

IME, the whole idea of needing to retain a lock long enough to call out to another method just seems wrong to me.

True. But the methods I'm talking about were tiny bits of utility code. Think something like "length" or "isReserved".

The only reason the locking got that complicated for these things came from being careful to lock only when locking was required. So a huge fraction of invocations of some methods never needed to lock at all. When the "window" where the lock was held was moved to the smallest possible scopes, those scopes would fairly often move down inside some internal utility method. This was C++, so there was more call for tiny utility methods than there would be when writing in Perl.

Something like a "move" operation wouldn't have to lock unless either the source or destination was "shared". And a "clear" operation would boil down to a bunch of "move" operations with no outer lock while a "shutdown" operation would lock and then "clear".

But these days I don't program by writing a class and then trying to insert the locks where required so that the class becomes "thread safe". I design the system to not need locks except in a minimal number of key places. It is closer to "multiple processes" coding than to "multiple threads" coding.

So, instead of some object that might be shared between threads, I'd have a mechanism for transferring responsibilities between threads that would transfer simple data and end up with either two similar, separate objects or one object being destroyed and another created.
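A rough sketch of that kind of hand-off, assuming a one-slot mailbox built on C11 atomics. The names (job_msg_t, mailbox_t, mailbox_send, mailbox_receive) are hypothetical; the point is only that plain data is copied across, and the single compare-and-swap on the slot is the one place where threads meet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plain-data message: no pointers back into the sender's object. */
typedef struct {
    int id;
    char name[16];
} job_msg_t;

/* One-slot mailbox; a NULL slot means "empty". */
typedef struct {
    _Atomic(job_msg_t *) slot;
} mailbox_t;

/* Sender: copy the simple data into a fresh message and publish it.
 * Returns 1 on success, 0 if the slot was occupied or allocation failed. */
int mailbox_send(mailbox_t *mb, int id, const char *name) {
    job_msg_t *msg = malloc(sizeof *msg);
    if (msg == NULL) {
        return 0;
    }
    msg->id = id;
    strncpy(msg->name, name, sizeof msg->name - 1);
    msg->name[sizeof msg->name - 1] = '\0';

    job_msg_t *expected = NULL;
    if (!atomic_compare_exchange_strong(&mb->slot, &expected, msg)) {
        free(msg);          /* slot already full; nothing was handed over */
        return 0;
    }
    return 1;               /* the sender no longer touches msg */
}

/* Receiver: take whatever is in the slot and become its sole owner.
 * Returns NULL when the mailbox is empty; the caller must free() the result. */
job_msg_t *mailbox_receive(mailbox_t *mb) {
    return atomic_exchange(&mb->slot, NULL);
}
```

The receiver would then build its own object from the message, so you end up with two separate objects (or the sender destroys its copy), and nothing that both threads can mutate at once.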

So, when I try to put my "multiple threads" programming hat back on, I would want re-entrant mutexes (and requiring an explicit unlock sounds like a really horrid idea). But, stepping back, I'd rather just not go back to that way of thinking and instead do a design that can be implemented with "multiple processes" even if the expected implementation is "multiple threads", and that makes "re-entrant or not" mostly a moot question.

- tye        

Re^4: Recursive locks:killer application. Do they have one? (mu)
by BrowserUk (Patriarch) on Feb 02, 2012 at 23:55 UTC
    I find it hard to imagine how "need to count" can have more than the most trivial of impacts on the efficiency of a mutex.

    See for yourself. It's not just time but also space efficiency.

    Here is perl's current implementation of recursive locking:

    typedef struct {
        perl_mutex          mutex;
        PerlInterpreter    *owner;
        I32                 locks;
        perl_cond           cond;
    } recursive_lock_t;

    void
    recursive_lock_acquire(pTHX_ recursive_lock_t *lock, char *file, int line)
    {
        assert(aTHX);
        MUTEX_LOCK(&lock->mutex);
        if (lock->owner == aTHX) {
            lock->locks++;
        } else {
            while (lock->owner) {
                COND_WAIT(&lock->cond,&lock->mutex);
            }
            lock->locks = 1;
            lock->owner = aTHX;
        }
        MUTEX_UNLOCK(&lock->mutex);
        SAVEDESTRUCTOR_X(recursive_lock_release,lock);
    }

    And that lot -- a mutex and owner, a locks count and a condition variable -- is built on top of this lot:

    typedef union
    {
      struct
      {
        int __lock;
        unsigned int __futex;
        __extension__ unsigned long long int __total_seq;
        __extension__ unsigned long long int __wakeup_seq;
        __extension__ unsigned long long int __woken_seq;
        void *__mutex;
        unsigned int __nwaiters;
        unsigned int __broadcast_seq;
      } __data;
      char __size[__SIZEOF_PTHREAD_COND_T];
      __extension__ long long int __align;
    } pthread_cond_t;

    And this:

    typedef union
    {
      struct __pthread_mutex_s
      {
        int __lock;
        unsigned int __count;
        int __owner;
    #if __WORDSIZE == 64
        unsigned int __nusers;
    #endif
        /* KIND must stay at this position in the structure to maintain
           binary compatibility. */
        int __kind;
    #if __WORDSIZE == 64
        int __spins;
        __pthread_list_t __list;
    # define __PTHREAD_MUTEX_HAVE_PREV 1
    #else
        unsigned int __nusers;
        __extension__ union
        {
          int __spins;
          __pthread_slist_t __list;
        };
    #endif
      } __data;
      char __size[__SIZEOF_PTHREAD_MUTEX_T];
      long int __align;
    } pthread_mutex_t;

    Which, when you realise that a non-recursive lock can be built atop a single bit, starts to look just a little indulgent.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Yes, counting requires a tiny bit of space for the counter. Obviously. Even if that doubled the space overhead of the mutex, it would be quite the rare situation when that would matter to me.

      The overhead of the counting indeed looks pretty trivial to me. We are only talking about the following:

      ...
      I32 locks;
      ...
      lock->locks++;
      ...
      lock->locks = 1;
      ...
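As a sketch of how little the counting itself adds, here is a hypothetical recursive lock layered on a plain non-recursive one, using C11 atomics. The names (rlock_t, rlock_acquire, rlock_release) and the caller-supplied nonzero thread id are assumptions for illustration; unlike perl's version it spins instead of waiting on a condition variable, so it says nothing about the cost of blocking.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical recursive lock: a plain lock plus an owner id and a depth. */
typedef struct {
    atomic_flag busy;   /* the underlying non-recursive lock */
    atomic_int owner;   /* id of the holding thread, 0 when free */
    int depth;          /* only ever touched by the current owner */
} rlock_t;

#define RLOCK_INIT { ATOMIC_FLAG_INIT, 0, 0 }

/* self must be a nonzero id that is unique to the calling thread. */
void rlock_acquire(rlock_t *l, int self) {
    if (atomic_load(&l->owner) == self) {
        l->depth++;                 /* re-entry: one compare, one increment */
        return;
    }
    while (atomic_flag_test_and_set(&l->busy)) {
        /* spin until the plain lock is free */
    }
    atomic_store(&l->owner, self);
    l->depth = 1;
}

void rlock_release(rlock_t *l, int self) {
    assert(atomic_load(&l->owner) == self && l->depth > 0);
    if (--l->depth == 0) {          /* outermost release frees the plain lock */
        atomic_store(&l->owner, 0);
        atomic_flag_clear(&l->busy);
    }
}
```

The counting amounts to the owner comparison, the increment and the decrement; everything else is the underlying lock.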

      The other "overhead" you show is so locking can block, so Perl can have a single interface over multiple implementations of blocking locking on multiple operating systems, so Linux can implement a portable blocking locking interface over the current choice of kernel implementation, so Linux can detect deadlock loops, so Linux users can select different types of "wake order" behavior, so somebody can adjust "spin" to reduce context switches in certain scenarios, etc.

      Which, when you realise that a non-recursive lock can be built atop a single bit, starts to look just a little indulgent.

      I don't see how one can implement blocking locking using a single bit. I certainly see some overhead that could be eliminated for the particular environment you are looking at. I suspect such would require more 'ifdef' work and thus create more complexity at other layers (assuming that dropping support for other platforms is not allowed) for the sake of conditionally reducing some run-time complexity in some environments. It is possible that such might even have a noticeable benefit in such environments.

      Getting rid of the counting there isn't going to make much difference. But surely you are instead implementing a new alternative, so not bothering to implement counting seems worth considering. I don't even see any advantage to exposing an API that would interfere with deciding to add support for counting at some later date (which means not bothering to implement it up front is probably wise).

      - tye        

        I don't see how one can implement blocking locking using a single bit.

        Here you go:

        #include <windows.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <process.h>

        typedef struct {
            void *protected;
            int loops;
        } args;

        /* Acquire: retry until bit 0 was clear and is now set, sleeping 1 ms between tries. */
        void lock( void *protected ) {
            while( _interlockedbittestandset64( (__int64*)protected, 0 ) ) {
                Sleep( 1 );
            }
        }

        /* Release: clear bit 0. */
        void unlock( void *protected ) {
            _interlockedbittestandreset64( (__int64*)protected, 0 );
        }

        void worker( void *arg ) {
            args *a = (args*)arg;
            int i = 0;
            for( i=0; i < a->loops; ++i ) {
                lock( a->protected );
                /* Adding 2 leaves bit 0, the lock bit, untouched. */
                *( (int*)a->protected ) += 2;
                unlock( a->protected );
            }
            return;
        }

        int main( int argc, char **argv ) {
            int i = 0, nThreads = 4;
            clock_t start, finish;
            double elapsed;
            uintptr_t threads[32];
            int shared = 0;
            args a = { (void *)&shared, 1000000 };

            if( argc > 1 ) nThreads = atol( argv[1] );
            if( argc > 2 ) a.loops = atol( argv[2] );
            printf( "threads:%d loops:%d\n", nThreads, a.loops );

            start = clock();
            for( i=0; i < nThreads; ++i )
                threads[ i ] = _beginthread( &worker, 0, &a );

            WaitForMultipleObjects( nThreads, (HANDLE*)&threads, 1, INFINITE );
            finish = clock();

            elapsed = (double)(finish - start) / CLOCKS_PER_SEC;
            printf( "count: %d time:%.6f\n", shared, elapsed );
            return 0;
        }

        And a run with 32 threads all contending to add 2 to a shared integer 1 million times each:

        C:\test\lockfree>bitlock 32 1000000
        threads:32 loops:1000000
        count: 64000000 time:1.332000

        And implemented using the simplest primitive possible -- one that will be available in some form on any modern processor.
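For comparison, the same single-bit idea can be sketched with standard C11 atomics rather than the Windows intrinsics. This is only an illustration under assumptions: the names (guarded_t, bit_trylock, bit_lock, bit_unlock, guarded_add) are made up, the payload lives in a separate plain field rather than sharing the word with the lock bit, and waiters busy-spin instead of calling Sleep(1).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCK_BIT 1u

/* Hypothetical guarded value: bit 0 of bits is the lock, value is the payload. */
typedef struct {
    atomic_uint bits;
    long value;         /* plain field, only touched while the lock bit is held */
} guarded_t;

/* One attempt: atomically set bit 0 and report whether it was clear before. */
bool bit_trylock(atomic_uint *word) {
    unsigned old = atomic_fetch_or_explicit(word, LOCK_BIT, memory_order_acquire);
    return (old & LOCK_BIT) == 0;
}

void bit_lock(atomic_uint *word) {
    while (!bit_trylock(word)) {
        /* busy-wait; a real waiter would yield or sleep here */
    }
}

/* Clear bit 0, leaving every other bit of the word as it was. */
void bit_unlock(atomic_uint *word) {
    atomic_fetch_and_explicit(word, ~LOCK_BIT, memory_order_release);
}

void guarded_add(guarded_t *g, long n) {
    bit_lock(&g->bits);
    g->value += n;
    bit_unlock(&g->bits);
}
```

The atomic OR and AND are the generic spelling of "test-and-set a bit" and "reset a bit"; whether the waiter spins, sleeps, or parks in the kernel is a separate choice layered on top.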

        The other "overhead" you show is so locking can block, so Perl can have a single interface over multiple implementations of blocking locking on multiple operating systems, so Linux can implement a portable blocking locking interface over the current choice of kernel implementation, so Linux can detect deadlock loops, so Linux users can select different types of "wake order" behavior, so somebody can adjust "spin" to reduce context switches in certain scenarios, etc

        And therein lies the rub. Perl implements its own recursive locking in terms of pthreads 0.1 primitives. But those "specced" pthreads primitives have long since been superseded on every modern *nix system by vastly more efficient, effective and flexible primitives -- e.g. futexes -- which already have recursive capabilities.

        And then on other platforms -- i.e. Windows -- the pthreads 0.1 primitives are clumsily emulated using the oldest, least effective OS primitives.

        Everyone, everywhere is getting big, slow, clumsy emulations of a defunct standard instead of being able to use the modern, efficient, effective mechanisms that have evolved since the pthreads API was frozen in stone.

        And all those "so Linux users can" and "so somebody can" are pie-in-the-sky what-ifs and maybes that can never happen for perl users anywhere. Typical lowest-common-denominator stuff.



Re^4: Recursive locks:killer application. Do they have one? (mu)
by BrowserUk (Patriarch) on Jul 16, 2013 at 06:13 UTC
    I find it hard to imagine how "need to count" can have more than the most trivial of impacts on the efficiency of a mutex.

    Independent proof. Suck it!

    You are almost as bad as sundialsvc4; unfortunately, getting enough monks around here to recognise it is going to be an awful lot harder.



      Too bad that your link doesn't actually contradict my point.

      The need to hit the kernel is not "the need to count".

      - tye