In my humble opinion, your problem is still that the various threads are competing with one another. Very likely, you do not need threads at all ... very likely, you do not need any of this complexity ... and that just might be what these results are trying to tell you!
I suggest that you would be far better served in one of two ways. Either stick with the one-process approach, which (still) seems to be giving you fairly decent performance; or modify the (non-threaded) program so that it scans the input file only for the records that belong in one particular database (identified, say, by a command-line parameter ...), and then, if you wish, run multiple instances of this program concurrently, say, by using the "&" (run-in-background) operator of the Unix/Linux shell.
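To make the second option concrete, here is a minimal sketch of such a single-purpose, non-threaded program. It is only an illustration under stated assumptions: I am guessing at a record layout of one record per line, with the database name, a tab, and then the payload; the names `records_for` and `main` are made up for this example; and the `print` stands in for whatever your real database insert would be.

```python
import sys
from typing import Iterable, Iterator


def records_for(lines: Iterable[str], target_db: str) -> Iterator[str]:
    """Yield the payload of every record that belongs to target_db.

    ASSUMPTION: each record is one line of the form "<db-name>\t<payload>".
    Lines without a tab are treated as malformed and skipped.
    """
    for line in lines:
        db, sep, payload = line.rstrip("\n").partition("\t")
        if sep and db == target_db:
            yield payload


def main(path: str, target_db: str) -> None:
    # One sequential pass over the input file; no threads, no locking.
    with open(path, encoding="utf-8") as fh:
        for payload in records_for(fh, target_db):
            # Placeholder for the real work (e.g. an INSERT into target_db).
            print(payload)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} INPUT_FILE DB_NAME", file=sys.stderr)
```

Saved as, say, `load_one_db.py` (a hypothetical name), it could be run once per database from the shell: `python load_one_db.py input.txt alpha & python load_one_db.py input.txt beta & wait`. Each instance reads the whole file but handles only its own records, so the instances share nothing and have nothing to lock.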
In the most friendly way possible, I suggest that you’ve built up all this concurrency machinery on the expectation that it would save you time, when it is most clearly not doing so. And if an approach is not panning out, the time comes to just let it go. Personally, in this case, I think that time has come.