Re^4: Threaded Code Not Faster Than Non-Threaded -- Why? by Tommy (Chaplain)
on Jan 05, 2014 at 18:32 UTC
Hm. Unless there is some new or modified documentation kicking around that I've not seen, I think you have misinterpreted the documentation. I am not aware of any docs that suggest the queue architecture you are using.
As mentioned in an above reply...
...I provided the wrong link. The code I wrote is taken directly from the examples directory of the threads CPAN distro by JDHEDDEN; it's called pool_reuse.pl
So I can't take full credit for it. I wouldn't have thought to use a queue for each worker; it didn't make sense to me. I figured it was a safe assumption that the guy who wrote threads knew what he was doing, so I took a leap and used the same (almost line for line) code as the core of my threading setup. After all, his documentation said (I summarize) that if you don't have a reusable thread pool then you are probably going to leak RAM badly, and this code is how you solve that problem.
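For anyone following along, the heart of that pool_reuse.pl pattern is roughly the following. This is a condensed sketch from memory, not the distro's exact code, and the worker-count, item-count, and shared counter are mine for illustration: each worker gets its own private item queue, and a shared "free workers" queue of thread IDs tells the dispatcher which worker to feed next.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# Sketch of the pool_reuse.pl architecture: one item queue per worker,
# plus a shared queue of thread IDs announcing which workers are idle.
my $NUM_WORKERS = 4;
my $processed : shared = 0;            # illustrative stand-in for real results

my %work_queues;                       # tid => that worker's private queue
my $free_q = Thread::Queue->new();     # tids of workers ready for an item

sub worker {
    my ($q, $free) = @_;
    while (1) {
        $free->enqueue( threads->tid() );   # announce availability
        my $item = $q->dequeue();           # block on my private queue
        last unless defined $item;          # undef is the shutdown signal
        { lock($processed); $processed++; } # ... real digest work goes here ...
    }
}

for (1 .. $NUM_WORKERS) {
    my $q   = Thread::Queue->new();
    my $thr = threads->create( \&worker, $q, $free_q );
    $work_queues{ $thr->tid() } = $q;
}

# Dispatcher: wait for a free worker, then hand one item to its queue.
for my $item (1 .. 20) {
    my $tid = $free_q->dequeue();           # blocks until some worker is idle
    $work_queues{$tid}->enqueue($item);
}

# Shut down: one undef per worker, then reap.
$work_queues{$_}->enqueue(undef) for keys %work_queues;
$_->join() for threads->list();
```

The extra bookkeeping is visible right away: every dispatched item costs a dequeue from one queue and an enqueue to another, which is exactly the overhead being questioned in this thread.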
Now all kinds of holes have been shot through it, and this is a good thing for me. I'll never get anywhere in speeding up the code by paying no heed to the problems with it. I'm quite delighted to see how I've erred, so that I can improve it.
Although this may sound like polling, it is quite different in that there is no single event or synchronisation involved. That is, it is not waiting for the queue length to fall to some exact number; but rather just until it falls below some number. Provided that number is sufficient to ensure that no worker is ever blocked waiting for a new item, this ensures that the workers run at their full potential speed without risking memory overruns by overstuffing the queue.
Very interesting. It seems like it will take a bit of experimentation to figure out how long to sleep and how deep to keep the queue.
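The throttled feeder being described might be sketched like this. This is my guess at the shape of it, with a single shared queue replacing the per-worker ones; the high-water mark of 100 and the 1 ms back-off are tuning assumptions that would need exactly the experimentation mentioned above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use Time::HiRes qw(usleep);

my $HIGH_WATER = 100;   # tuning guess: deep enough that workers never starve
my $q    = Thread::Queue->new();
my $done : shared = 0;  # illustrative stand-in for real digest results

sub worker {
    while ( defined( my $item = $q->dequeue() ) ) {
        { lock($done); $done++; }   # ... real digest work goes here ...
    }
}

my @workers = map { threads->create( \&worker ) } 1 .. 4;

for my $item (1 .. 500) {
    # Back off while the queue is "full enough" -- we wait for it to fall
    # below the mark, not to hit some exact length.
    usleep(1_000) while $q->pending() >= $HIGH_WATER;
    $q->enqueue($item);
}

$q->end();              # any blocked dequeue now returns undef
$_->join() for @workers;
```

Note that Thread::Queue's end() (available in recent versions of the module) replaces the one-undef-per-worker shutdown dance: once the queue is ended and drained, every blocked dequeue returns undef and the workers fall out of their loops.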
That does away with the separate, per-worker item queues and the requests (tid) queue.
...And that seems like a lot of overhead, more so every time I read over this. I'm eager to see the effects of (re)moving things around in the code to accomplish this.
Ultimately, the whole thing is IO-constrained...
Yes. Precisely. The number crunching involved in calculating digests is what I really wanted to spread across cores. The IO isn't going to go faster because of the threading, but the digest processing can. If I can eliminate the overhead I've introduced into my own code through inefficient thread management, I'm sure I'll see noticeable improvements. By altering my approach even more, I might be able to cut down on the amount of digesting ever performed. This makes the threading even less valuable in the exercise, but not in the lesson learned.
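One common way to cut down on digesting, for what it's worth: two files can only be duplicates if they are the same size, so grouping candidates by byte size first means only files that share a size ever get digested. This is a hypothetical prefilter of my own construction, not code from the thread (the file names and the files_needing_digest helper are made up for the demo):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical prefilter: bucket files by size; only groups with two or
# more members can contain duplicates, so only those need digesting.
sub files_needing_digest {
    my %by_size;
    push @{ $by_size{ -s $_ } }, $_ for grep { -f $_ } @_;
    return map { @$_ } grep { @$_ > 1 } values %by_size;
}

# Tiny demo: two files of equal size, one odd one out.
my $dir = tempdir( CLEANUP => 1 );
for ( [ 'a.txt', 'xxxx' ], [ 'b.txt', 'yyyy' ], [ 'c.txt', 'z' ] ) {
    open my $fh, '>', "$dir/$_->[0]" or die "open: $!";
    print {$fh} $_->[1];
    close $fh;
}

# Only a.txt and b.txt (both 4 bytes) are worth digesting; c.txt is
# eliminated by its size alone, with no digest ever computed for it.
my @candidates = files_needing_digest( glob "$dir/*" );
```

Every file the size check eliminates is a digest the workers never have to compute, which shrinks the very workload the threading was meant to speed up.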
Thank you, BrowserUk
A mistake can be valuable or costly, depending on how faithfully you pursue correction