RE: time lag... Yes, acquiring a lock does introduce a very slight time lag, but it is very, very fast. The OS keeps track of who has what lock in a memory-resident structure. To use this method, you should not acquire the lock until you are actually ready to write. That means capturing the output in a memory variable and then doing lock, write, unlock in quick succession, rather than holding the lock for the entire time that the sub-process is running. It is possible to sequence even very high I/O rates with this method because the time for each write is negligible.
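A minimal sketch of that lock, write, unlock pattern, in Python rather than whatever language your script is in. It assumes a Unix-like system (it uses an advisory `flock`, which only works if every writer takes the same lock), and the function name `run_and_append` and its parameters are made up for illustration:

```python
import fcntl
import subprocess


def run_and_append(cmd, log_path):
    """Run cmd, buffer its output in memory, then append it under a short-lived lock."""
    # Capture everything first; no lock is held while the sub-process runs.
    output = subprocess.run(cmd, capture_output=True, check=False).stdout

    # Append mode, so each write lands at the current end of the shared file.
    with open(log_path, "ab") as log:
        fcntl.flock(log, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            log.write(output)
            log.flush()  # push the data out before releasing the lock
        finally:
            fcntl.flock(log, fcntl.LOCK_UN)
```

Each child would call something like `run_and_append(["some_command"], "combined.log")`. The lock is held only for the write itself, so the children spend almost no time waiting on each other.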
Having each child write to its own individual file and then "cat"-ing the files together when all of the children have finished is another way, and it is the way I'd do it if I were launching these tasks in the background via a shell script.
It looks like you are using the second method (one file per child), which is fine. Either way will work for your application.
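For comparison, the one-file-per-child approach could be sketched roughly like this in Python. No locking is needed because no two children ever share a file. The names `run_children_to_files`, `concatenate`, and the `child_N.out` file naming are assumptions for the example, not anything taken from your code:

```python
import shutil
import subprocess
from pathlib import Path


def run_children_to_files(commands, out_dir):
    """Start every command at once, each writing stdout to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    children = []
    for i, cmd in enumerate(commands):
        part = out_dir / f"child_{i}.out"
        fh = open(part, "wb")
        # The child writes directly to its private file.
        children.append((subprocess.Popen(cmd, stdout=fh), fh, part))

    parts = []
    for proc, fh, part in children:
        proc.wait()  # wait until this child has finished
        fh.close()
        parts.append(part)
    return parts


def concatenate(parts, combined_path):
    """Equivalent of `cat part1 part2 ... > combined`, run after all children exit."""
    with open(combined_path, "wb") as combined:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, combined)
```

The combined file ends up in the order of the `parts` list rather than in the order the children happened to finish, which is usually what you want anyway.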