PerlMonks  

Re^2: Threaded Code Not Faster Than Non-Threaded -- Why? (meat)

by Tommy (Chaplain)
on Jan 05, 2014 at 05:24 UTC ( #1069358 )


in reply to Re: Threaded Code Not Faster Than Non-Threaded -- Why? (meat)
in thread Threaded Code Not Faster Than Non-Threaded -- Why?

I don't understand this part. I think you have too many queues, and you should probably have two: one for files to process, and one for results of that processing.

It comes straight from here, which came straight from here. If my implementation of that core documentation code is flawed, I really want to see an implementation that isn't. I'm totally serious. I want to learn how to do it right.

Tommy
A mistake can be valuable or costly, depending on how faithfully you pursue correction


Re^3: Threaded Code Not Faster Than Non-Threaded -- Why? (meat)
by Preceptor (Chaplain) on Jan 05, 2014 at 13:01 UTC

    I think you're misreading it - it includes examples of creating queues, but I don't see it implying that you need multiple queues.

    I have an example of a 'queue based' worker thread model: A basic 'worker' threading model
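
    The rough shape of that model: a pool of workers pulls file names off one queue and pushes results onto a second, which also matches the 'two queues' suggestion above. A minimal sketch only (process_file() here is just a stand-in for whatever per-file work you actually do):

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my $work    = Thread::Queue->new();    # file names waiting for work
        my $results = Thread::Queue->new();    # results of that work

        my $WORKERS = 4;
        my @pool    = map {
            threads->create( sub {
                # keep pulling files until we see the undef "stop" marker
                while ( defined( my $file = $work->dequeue() ) ) {
                    $results->enqueue( [ $file, process_file( $file ) ] );
                }
            } );
        } 1 .. $WORKERS;

        $work->enqueue( @ARGV );               # feed in the file names
        $work->enqueue( (undef) x $WORKERS );  # one stop marker per worker
        $_->join() for @pool;

        $results->enqueue( undef );            # end-of-results marker
        while ( defined( my $r = $results->dequeue() ) ) {
            my ( $file, $value ) = @{ $r };
            print "$file => $value\n";
        }

        sub process_file {    # stand-in: real code would hash the file, etc.
            my ( $file ) = @_;
            return -s $file;
        }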

    Personally, I'd be thinking in terms of using File::Find to traverse your filesystem linearly, but have it feed a queue with files that need more detailed inspection.

    The two most expensive operations in this process are filesystem traversal (which is hard to optimise without messing with disks and filesystem layout) and reading the files to calculate their hashes; the reading may well be more 'expensive' than doing the sums.

    My thought would be to ask whether you can do partial hashes, iteratively. If you work through a file one block at a time (the right size varies by filesystem), each step is a single read IO operation that you then hash, and you keep working through the file only while the hashes still match. If the file is a genuine dupe you'll still have to read the whole lot, but if it's not, it'll be discarded much sooner. Something like the sketch below.
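
    A rough sketch of that idea, simplified to just the first block (assumptions: Digest::MD5 for the sums and a guessed 4K block size; a fuller version would keep hashing block by block and re-splitting the buckets until the survivors differ or hit EOF):

        use strict;
        use warnings;
        use Digest::MD5 qw(md5_hex);

        my $BLOCK = 4096;    # guessed block size; tune for your filesystem
        my %bucket;          # first-block digest => files sharing it

        for my $file ( @ARGV ) {
            open my $fh, '<:raw', $file or next;
            read $fh, my $buf, $BLOCK;
            push @{ $bucket{ md5_hex( $buf // '' ) } }, $file;
        }

        # only files sharing a first-block digest can possibly be dupes;
        # everything else is ruled out after a single read per file
        for my $digest ( keys %bucket ) {
            next if @{ $bucket{$digest} } < 2;
            print "possible dupes: @{ $bucket{$digest} }\n";
        }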

      Thanks for the example! I'll check it out.

      I'm sorry, I provided the wrong link. The code I wrote comes from this code, taken directly from the examples directory of the threads CPAN distro by JDHEDDEN. It's called pool_reuse.pl.

      The block-by-block comparison of files which you proposed is actually part of my next approach. I may be able to forgo digesting the file content altogether and get a real speed boost by reading only as many bytes from a file as I need to tell that it's different. Much less IO required; see the sketch below.
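
      A sketch of what I have in mind (chunk size is a guess; the point is bailing out at the first difference, with no digests at all):

          use strict;
          use warnings;

          # return true only if the two files have identical content;
          # bail out at the first chunk that differs
          sub files_identical {
              my ( $file_a, $file_b ) = @_;
              return 0 if -s $file_a != -s $file_b;    # cheap size check first
              open my $fa, '<:raw', $file_a or die "$file_a: $!";
              open my $fb, '<:raw', $file_b or die "$file_b: $!";
              my $CHUNK = 64 * 1024;                   # guessed read size
              while ( read( $fa, my $ba, $CHUNK ) ) {
                  read $fb, my $bb, $CHUNK;
                  return 0 if $ba ne $bb;    # difference found: stop reading
              }
              return 1;                      # every chunk matched
          }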

      Tommy
      A mistake can be valuable or costly, depending on how faithfully you pursue correction

      Second reply: By way of follow-up, I wanted to thank you, Preceptor, for the informative link. Also, I never responded to your comment about File::Find: I didn't use it because, in my own tests, it is slower than File::Util's directory traversal.

      I have forked the reference code for the hackathon and implemented threading consistent with your code example. More to come... I'm benchmarking it right now.

      Tommy
      A mistake can be valuable or costly, depending on how faithfully you pursue correction
