Re: Processing large file using threads

by renodino (Curate)
on May 08, 2007 at 15:30 UTC


in reply to Processing large file using threads

  1. Create a "master" thread (usually the root thread).
  2. Create some (possibly configurable) number of child threads.
  3. (Here's the tricky part.) You've got a couple of alternatives:
    • a) on thread->create(), pass the fileno() to each child thread, along with an offset and a skip count. Each child thread then does an open(INF, '<&', $fileno) on that fileno, reads and discards lines up to its offset, then iteratively reads, processes, and skips lines until EOF (a sketch follows after this list).
    • b) alternatively, create two Thread::Queues (one from master to children, the other from children to master). The master reads each line from the file and posts it to the downstream queue; the children grab lines off the queue (in no particular order), process them, and post a response to the upstream queue (sketched further below).
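For concreteness, here's a minimal sketch of alternative (a). It assumes a line-oriented input file: each worker gets a starting offset and a skip count and processes every Nth line on its own handle. For simplicity the workers open the file by path rather than dup'ing the parent's fileno (dup'd descriptors share a single file position, which complicates interleaved reads); the file name and process_line() are placeholders.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;

    my $file     = 'big_input.txt';   # hypothetical input file
    my $nworkers = 4;

    sub process_line { my ($line) = @_; }   # stand-in for the real per-record work

    sub worker {
        my ($path, $offset, $skip) = @_;
        open my $fh, '<', $path or die "open $path: $!";
        <$fh> for 1 .. $offset;              # discard lines belonging to earlier workers
        my $count = 0;
        while (defined(my $line = <$fh>)) {
            process_line($line);
            <$fh> for 1 .. $skip;            # skip lines the other workers will handle
            $count++;
        }
        return $count;
    }

    # Worker $i starts at line $i and then takes every $nworkers-th line.
    my @threads = map { threads->create(\&worker, $file, $_, $nworkers - 1) }
                  0 .. $nworkers - 1;

    my $total = 0;
    $total += $_->join for @threads;
    print "processed $total lines\n";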

As ever, TIMTOWTDI. (b) is probably simpler, but (a) is more deterministic. Both can be mimicked using a process-based approach.

I've successfully used (a) for ETL tools, but if your file is binary/random access, skipping records can get complicated. Also, if the children are writing to an output file, (b) makes it easier to let a single master thread do the writing instead of coordinating writes between children.
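Along those lines, here's a minimal sketch of alternative (b) with the master alone doing the writing. The file names and process_line() are placeholders; in this sketch process_line() should return a defined value, since undef serves as the end-of-work marker on both queues.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $nworkers = 4;
    my $down = Thread::Queue->new;   # master -> children
    my $up   = Thread::Queue->new;   # children -> master

    sub process_line { my ($line) = @_; return uc $line }   # stand-in work

    sub worker {
        while (defined(my $line = $down->dequeue)) {
            $up->enqueue(process_line($line));
        }
        $up->enqueue(undef);         # tell the master this worker is finished
    }

    my @threads = map { threads->create(\&worker) } 1 .. $nworkers;

    open my $in,  '<', 'big_input.txt'  or die "open input: $!";
    open my $out, '>', 'big_output.txt' or die "open output: $!";

    while (defined(my $line = <$in>)) {
        $down->enqueue($line);
        # Drain any results already available so the queues stay small.
        while (defined(my $result = $up->dequeue_nb)) { print {$out} $result }
    }
    $down->enqueue(undef) for 1 .. $nworkers;   # one end-of-work marker per child

    # Only the master writes; each undef coming back marks one finished child.
    my $done = 0;
    while ($done < $nworkers) {
        my $result = $up->dequeue;
        if (defined $result) { print {$out} $result }
        else                 { $done++ }
    }

    $_->join for @threads;
    close $out or die "close: $!";

Note that this sketch does not preserve line order in the output; if order matters, you'd tag each queued line with its line number and have the master reorder results before writing.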


Perl Contrarian & SQL fanboy

