Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation

Re: to thread or fork or ?

by mbethke (Hermit)
on Oct 19, 2012 at 02:25 UTC ( #999847=note: print w/ replies, xml ) Need Help??

in reply to to thread or fork or ?

Unless the work you do is A LOT more expensive than a simple frequency count, or your data set extremely large and your disks very fast, you're best off using a single process. Synchronization is a lot of work with threads so accessing say a shared hash can be orders of magnitude slower than a non-shared one; with processes you'd end up having to serialize the end result and somehow pipe it back to the master process---also very expensive.

If you're sure you want this (maybe just as a learning experience), I'd suggest just using fixed-size chunks from the input stream per process/thread, to minimize shared data. Say, read a couple of megabytes plus a line into a single string (so as to read at maximum speed, plus the line so you don't split your work in the middle of a word), then start a thread to process it (split into words, optionally normalize, count) into a local hash that then goes into a queue read by the master thread that checks for results from worker threads in-between blocks.

Comment on Re: to thread or fork or ?

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://999847]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (14)
As of 2014-07-28 15:13 GMT
Find Nodes?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:

    Results (201 votes), past polls