Re: Processing large file using threads

by renodino (Curate)
on May 08, 2007 at 15:30 UTC (#614174)

in reply to Processing large file using threads

  1. Create a "master" thread (usually the root thread).
  2. Create some (possibly configurable) number of child threads.
  3. (Here's the tricky part.) You've got a couple of alternatives:
    • a) On threads->create(), pass the fileno() to each child thread, along with an offset and a skip count. Each child thread then does an open(INF, '<&', $fileno) on the fileno, reads and discards the lines before its offset, then loops (read a line, process it, skip the skip count) until EOF.
    • b) Alternatively, create 2 Thread::Queue objects (one from master to children, the other from children to master). The master reads each line from the file and posts it to the downstream queue; the children grab a line off the queue (whichever child gets there first), process it, then post a response to the upstream queue.
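
For what it's worth, here is a minimal sketch of the striding idea in (a). One deliberate swap: each child opens the file by path instead of duping the master's fileno, because duplicated descriptors share a single file position and the children's buffered reads would otherwise step on each other. The names process_stripe and run_striped are made up for illustration, and uc() is only a stand-in for the real per-line work.

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper: handle every $stride-th line, starting at the
# 0-based line number $offset.
sub process_stripe {
    my ($path, $offset, $stride) = @_;

    # Assumption: open by path so this thread has its own file position.
    open(my $in, '<', $path) or die "Cannot open $path: $!";

    my @results;
    my $n = 0;
    while (my $line = <$in>) {
        # Skip lines that belong to the other children.
        next unless ($n++ % $stride) == $offset;
        chomp $line;
        push @results, uc $line;    # placeholder for real processing
    }
    close $in;
    return @results;
}

# Hypothetical driver: one child per stripe, results returned per child.
sub run_striped {
    my ($path, $workers) = @_;

    my @children;
    for my $offset (0 .. $workers - 1) {
        push @children,
            threads->create({ context => 'list' },
                            \&process_stripe, $path, $offset, $workers);
    }

    # join() hands back whatever each child returned.
    return map { [ $_->join ] } @children;
}
```

Calling run_striped($path, 4) would start four children, child 0 taking lines 0, 4, 8, ..., child 1 taking lines 1, 5, 9, ..., and so on. Every child still reads the whole file, so this only pays off when the processing costs more than the reading.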

As ever, TIMTOWTDI. (b) is probably simpler, but (a) is more deterministic. Both can be mimicked using a process-based approach.

I've successfully used (a) for ETL tools, but if your file is binary/random access, it can get complicated to skip records. Also, if the children are writing to an output file, (b) may be easier, since a single master thread can do all the writing instead of coordinating writes between children.
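
A rough sketch of (b) with the master as the only writer might look like the following. The name run_queued is hypothetical, uc() again stands in for the real work, and using undef as an end-of-work marker is just one common convention, not the only way to shut the children down. Output order is not guaranteed to match input order.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical driver: master reads $in, children process lines,
# master alone writes to $out. Returns the number of lines written.
sub run_queued {
    my ($in, $out, $workers) = @_;

    my $work = Thread::Queue->new;    # master -> children
    my $done = Thread::Queue->new;    # children -> master

    my @children;
    for (1 .. $workers) {
        push @children, threads->create(sub {
            # dequeue() blocks until a line (or the undef marker) arrives.
            while (defined(my $line = $work->dequeue)) {
                chomp $line;
                $done->enqueue(uc $line);    # placeholder for real processing
            }
            $done->enqueue(undef);    # tell the master this child is finished
        });
    }

    # Master feeds the downstream queue, then posts one marker per child.
    while (my $line = <$in>) {
        $work->enqueue($line);
    }
    $work->enqueue(undef) for @children;

    # Master drains the upstream queue until every child has reported in.
    my ($finished, $written) = (0, 0);
    while ($finished < $workers) {
        my $result = $done->dequeue;
        if (defined $result) {
            print {$out} "$result\n";
            $written++;
        }
        else {
            $finished++;
        }
    }

    $_->join for @children;
    return $written;
}
```

Note that this version queues lines as fast as the master can read them, so for a really large file you would probably want to cap the downstream queue or interleave reading with draining the upstream queue.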

Perl Contrarian & SQL fanboy