PerlMonks  

Re: to thread or fork or ?

by mbethke (Hermit)
on Oct 19, 2012 at 02:25 UTC


in reply to to thread or fork or ?

Unless the work you do is a lot more expensive than a simple frequency count, or your data set is extremely large and your disks are very fast, you're best off using a single process. Synchronization is a lot of work with threads, so accessing, say, a shared hash can be orders of magnitude slower than accessing a non-shared one. With processes, you'd instead have to serialize the end result and pipe it back to the master process, which is also very expensive.
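For comparison, here is a minimal single-process baseline: one pass over the input, one hash, no synchronization and no serialization. It is sketched in Python rather than Perl, and the regex-based word split and lowercasing are illustrative assumptions, not something the original question specified.

```python
import collections
import io
import re

# Assumption: a "word" is a run of \w characters.
WORD_RE = re.compile(r"\w+")

def count_words(stream):
    """Single-process word frequency count over a text stream."""
    counts = collections.Counter()
    for line in stream:
        # Lowercasing is an optional normalization step.
        counts.update(w.lower() for w in WORD_RE.findall(line))
    return counts

if __name__ == "__main__":
    sample = io.StringIO("the cat and the hat\nThe end\n")
    print(count_words(sample).most_common(3))
```

Everything lives in one local hash, so there is nothing to lock and nothing to ship between workers.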

If you're sure you want this (maybe just as a learning experience), I'd suggest handing each process/thread a fixed-size chunk of the input stream, to minimize shared data. Say, read a couple of megabytes plus the rest of the current line into a single string: the big block reads at maximum speed, and finishing the line ensures you don't split your work in the middle of a word. Then start a thread to process that chunk (split into words, optionally normalize, count) into a local hash. When the worker is done, its hash goes into a queue, and the master thread checks that queue for results from the workers in between reading blocks.
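A rough sketch of that chunked design, written in Python with `threading` and `queue.Queue` (in Perl the analogous pieces would be `threads` and `Thread::Queue`; this is not that API). The names `read_chunks`, `worker`, `drain`, `parallel_count` and `CHUNK_SIZE` are hypothetical. One caveat: in CPython the GIL keeps these threads from counting in parallel, so this shows the structure rather than a speedup; switching to processes would bring back the cost of serializing each result hash.

```python
import collections
import io
import queue
import re
import threading

WORD_RE = re.compile(r"\w+")
CHUNK_SIZE = 2 * 1024 * 1024  # hypothetical: "a couple of megabytes"

def read_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield blocks of about chunk_size characters, extended to the end of
    the current line so no word is split across two chunks."""
    while True:
        block = stream.read(chunk_size)
        if not block:
            return
        if not block.endswith("\n"):
            block += stream.readline()  # finish the partial line
        yield block

def worker(chunk, results):
    """Count words of one chunk into a local Counter, then enqueue it."""
    local = collections.Counter(w.lower() for w in WORD_RE.findall(chunk))
    results.put(local)

def drain(results, total):
    """Merge every result currently waiting in the queue, without blocking."""
    while True:
        try:
            total.update(results.get_nowait())
        except queue.Empty:
            return

def parallel_count(stream, chunk_size=CHUNK_SIZE):
    results = queue.Queue()
    total = collections.Counter()  # only the master thread touches this
    threads = []
    for chunk in read_chunks(stream, chunk_size):
        t = threading.Thread(target=worker, args=(chunk, results))
        t.start()
        threads.append(t)
        drain(results, total)  # check for finished workers between blocks
    for t in threads:
        t.join()
    drain(results, total)  # collect results that arrived after the last block
    return total
```

The only shared object is the queue; each worker's hash is private until it is handed over whole. A real version would also cap the number of workers in flight instead of starting one thread per chunk.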

