Thank you very much indeed for pointing out the flaw in my code; I think I _do_ see what you mean by
It's possible to execute $_->down for @s; in one thread before reaching $s[ $id ]->down; in another thread
and, thus, the need for a separate semaphore. Buggy code and flawed logic could cost me in the future, so thanks a lot! Still, it won't happen in my SSCCE, nor in my real code, because of the amount of time required to execute the "payload" (and the limited number of threads?). Here's the result of one (out of about 10?) invocation of _exactly_ your code (the first one). It shows the same issue: "next, 0" is stamped 1.829, earlier than "first, 2" and "first, 1" at 1.834:
first, 0 1.826
first, 2 1.834
first, 1 1.834
first, 3 1.836
next, 1 1.835
next, 0 1.829
next, 2 1.835
next, 3 1.836
However, with clock_gettime(CLOCK_MONOTONIC) it looks like it's _always_ OK (no "next" is stamped earlier than any "first"), and, FWIW, the _same_ time is always reported for all of the "next" lines:
first, 3 1.855
first, 1 1.856
first, 0 1.858
first, 2 1.867
next, 0 1.868
next, 1 1.868
next, 3 1.868
next, 2 1.868
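For reference, the "separate semaphore" fix is essentially the textbook reusable (two-phase) barrier. Below is a minimal sketch of that pattern, not the code from this thread. It assumes a Perl built with ithreads and a platform where Time::HiRes exports clock_gettime and CLOCK_MONOTONIC (e.g. Linux). $N, $ROUNDS, the busy-loop payload, and the helpers barrier_wait and worker are all hypothetical. Each thread stamps the monotonic clock just before entering the barrier, and the final check verifies that no "next" stamp precedes any "first" stamp.

```perl
#!/usr/bin/perl
# Sketch: reusable two-phase barrier plus CLOCK_MONOTONIC timestamps.
# Assumes threaded Perl and Time::HiRes support for CLOCK_MONOTONIC.
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;
use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);
use List::Util qw(max min);

my $N      = 4;    # number of worker threads (hypothetical)
my $ROUNDS = 2;    # round 0 = "first", round 1 = "next"

# Barrier state: $count is only modified while holding $mutex.
my $count :shared = 0;
my $mutex = Thread::Semaphore->new(1);
my $gate1 = Thread::Semaphore->new(0);    # opens when all $N have arrived
my $gate2 = Thread::Semaphore->new(0);    # opens when all $N have left phase 1

sub barrier_wait {
    # Phase 1: the last thread to arrive releases $N tokens on gate1.
    $mutex->down;
    $count++;
    $gate1->up($N) if $count == $N;
    $mutex->up;
    $gate1->down;

    # Phase 2: the last thread through gate1 releases gate2, so a fast
    # thread cannot loop around and take a gate1 token meant for this round.
    $mutex->down;
    $count--;
    $gate2->up($N) if $count == 0;
    $mutex->up;
    $gate2->down;
}

# $stamp[$round * $N + $id] = seconds since $t0 (monotonic clock).
my @stamp :shared;
@stamp = (0) x ($N * $ROUNDS);
my $t0 = clock_gettime(CLOCK_MONOTONIC);

sub worker {
    my ($id) = @_;
    for my $round (0 .. $ROUNDS - 1) {
        my $x = 0;
        $x += $_ for 1 .. 100_000;    # stand-in "payload"
        $stamp[$round * $N + $id] = clock_gettime(CLOCK_MONOTONIC) - $t0;
        barrier_wait();
    }
    return;
}

my @thr = map { threads->create(\&worker, $_) } 0 .. $N - 1;
$_->join for @thr;

my @labels = ('first', 'next');
for my $round (0 .. $ROUNDS - 1) {
    printf "%s, %d %.3f\n", $labels[$round], $_, $stamp[$round * $N + $_]
        for 0 .. $N - 1;
}

# If the barrier holds, every "next" stamp is >= every "first" stamp.
my $max_first = max(@stamp[0 .. $N - 1]);
my $min_next  = min(@stamp[$N .. 2 * $N - 1]);
printf "barrier ordering %s\n", $min_next >= $max_first ? 'OK' : 'VIOLATED';
```

The second gate is what prevents the kind of race described in the quote above: with only one gate, a thread that finishes its payload quickly could pass the barrier and consume a token intended for a slower thread in the same round.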