PerlMonks  

Re: to thread or fork or ?

by mbethke (Hermit)
on Oct 19, 2012 at 02:25 UTC (#999847)


in reply to to thread or fork or ?

Unless the work you do is a lot more expensive than a simple frequency count, or your data set is extremely large and your disks are very fast, you're best off using a single process. Synchronization costs a lot with threads, so accessing, say, a shared hash can be orders of magnitude slower than accessing a non-shared one; with processes you'd end up having to serialize the end result and somehow pipe it back to the master process, which is also very expensive.
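For reference, the single-process baseline could look roughly like the sketch below. The `count_words` name, the lowercasing, and the `\w+` notion of a "word" are assumptions made for illustration, not something from the original question.

```perl
use strict;
use warnings;

# Count word frequencies from a filehandle in a single process.
# Lowercasing and the \w+ word pattern are illustrative assumptions.
sub count_words {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{lc $1}++ while $line =~ /(\w+)/g;
    }
    return \%count;
}

# Demo on an in-memory filehandle so the sketch runs without an input file.
my $text = "the cat and the hat\nThe end\n";
open my $fh, '<', \$text or die "open: $!";
my $count = count_words($fh);
printf "%-5s %d\n", $_, $count->{$_} for sort keys %$count;
```

Everything lives in one ordinary hash, so there is no locking and nothing to serialize.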

If you're sure you want this (maybe just as a learning experience), I'd suggest giving each process/thread a fixed-size chunk of the input stream, to minimize shared data. Say, read a couple of megabytes plus a line into a single string (the big read is for maximum speed; the extra line is so you don't split your work in the middle of a word), then start a thread to process it (split into words, optionally normalize, count) into a local hash. That hash then goes into a queue read by the master thread, which checks for results from the worker threads in between blocks.
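A minimal sketch of that scheme, assuming a perl built with ithreads and the core `threads` and `Thread::Queue` modules. The helper names (`read_chunk`, `count_chunk`, `merge_into`, `count_file`) are made up for illustration, and the demo uses a tiny chunk size so it runs on a few kilobytes of in-memory text; for real input you would use a couple of megabytes, as described above.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Read about $size bytes, then extend to the end of the current line so
# no word is split across chunks. Returns undef at end of input.
sub read_chunk {
    my ($fh, $size) = @_;
    my $got = read($fh, my $buf, $size);
    die "read failed: $!" unless defined $got;
    return undef if $got == 0;
    my $rest = <$fh>;               # the "plus a line" part
    $buf .= $rest if defined $rest;
    return $buf;
}

# Worker: count into a private, non-shared hash and hand it over once.
sub count_chunk {
    my ($chunk, $results) = @_;
    my %local;
    $local{lc $1}++ while $chunk =~ /(\w+)/g;
    $results->enqueue(\%local);     # the queue stores a shared copy
    return;
}

# Fold one worker's counts into the master's totals.
sub merge_into {
    my ($total, $part) = @_;
    $total->{$_} += $part->{$_} for keys %$part;
    return;
}

sub count_file {
    my ($fh, $chunk_size) = @_;
    my $results = Thread::Queue->new;
    my (%total, @workers);
    while (defined(my $chunk = read_chunk($fh, $chunk_size))) {
        push @workers, threads->create(\&count_chunk, $chunk, $results);
        # In between blocks, collect whatever is ready without blocking.
        while (defined(my $part = $results->dequeue_nb)) {
            merge_into(\%total, $part);
        }
    }
    $_->join for @workers;
    # All workers are done; drain what is left in the queue.
    while (defined(my $part = $results->dequeue_nb)) {
        merge_into(\%total, $part);
    }
    return \%total;
}

# Demo: 1000 copies of two short lines, read in 4 KB chunks.
my $text = "the cat and the hat\nThe end\n" x 1000;
open my $fh, '<', \$text or die "open: $!";
my $total = count_file($fh, 4096);
print "$_ $total->{$_}\n" for sort keys %$total;
```

This starts one thread per chunk and never caps how many run at once, which is fine for a toy but not for a large file; a fixed pool of workers fed from a second queue would be the obvious next step. Note also that each chunk is copied into its thread and each result hash is copied into the queue, which is exactly the overhead that makes the single-process version hard to beat.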
