
Re: Processing large file using threads

by renodino (Curate)
on May 08, 2007 at 15:30 UTC (#614174)

in reply to Processing large file using threads

  1. Create a "master" thread (usually the root thread).
  2. Create some (possibly configurable) number of child threads.
  3. (Here's the tricky part.) You've got a couple of alternatives:
    • a) On threads->create(), pass the fileno() to each child thread, along with an offset and a skip count. Each child thread then does an open(INF, '<&', $fileno) on the fileno, reads and discards its offset's worth of lines, then repeatedly reads a line, processes it, and skips the next skip-count lines until EOF.
    • b) Alternatively, create 2 Thread::Queues (one from master to children, the other from children to master). The master reads each line from the file and posts it to the downstream queue; whichever child is free grabs a line off that queue, processes it, then posts a response to the upstream queue.
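
Here's a rough sketch of (a), assuming a perl built with ithreads. Two things in it are my assumptions, not part of the description above: each child opens the file by name instead of duplicating the fileno (descriptors duplicated with '<&' normally share one file position on POSIX systems, so separate opens keep the readers independent), and process_line(), stride_worker() and run_strided() are hypothetical names made up for illustration.

```perl
use strict;
use warnings;
use threads;

# Hypothetical per-line work; replace with the real processing.
sub process_line {
    my ($line) = @_;
    return length $line;
}

# Worker number $offset (0-based) out of $nworkers handles lines
# offset, offset + n, offset + 2n, ... and discards the rest.
# Assumption: the file is opened by name, so every thread has its
# own file position and its own read buffer.
sub stride_worker {
    my ($path, $offset, $nworkers) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my ($lineno, $handled) = (0, 0);
    while (my $line = <$in>) {
        # Skip every line that belongs to another worker.
        next unless $lineno++ % $nworkers == $offset;
        chomp $line;
        process_line($line);
        $handled++;
    }
    close $in;
    return $handled;
}

# Start $nworkers children and return how many lines each one handled.
# create() and join() are both called in list context here, so each
# join() returns the worker's single return value.
sub run_strided {
    my ($path, $nworkers) = @_;
    my @workers = map { threads->create(\&stride_worker, $path, $_, $nworkers) }
                  0 .. $nworkers - 1;
    return map { $_->join } @workers;
}

# Usage: perl strided.pl some_file.txt
if (@ARGV) {
    my @counts = run_strided($ARGV[0], 4);
    print "worker $_ handled $counts[$_] lines\n" for 0 .. $#counts;
}
```

Every child still reads the whole file, so this only pays off when the processing is a lot more expensive than the reading.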

As ever, TIMTOWTDI. (b) is probably simpler, but (a) is more deterministic. Both can be mimicked using a process-based approach.

I've successfully used (a) for ETL tools, but if your file is binary/random access, skipping records can get complicated. Also, if the children are writing to an output file, (b) might be easier: let a single master thread do the writing instead of coordinating writes between children.
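
A minimal sketch of (b) with the master doing all the writing, again assuming a perl built with ithreads. The names process_line(), queue_worker() and run_queued() are hypothetical, and using undef as an end-of-work marker is just one common convention. Note the work queue is unbounded in this sketch: the master enqueues the whole file before it starts draining results, so a real large-file job would want to throttle the reader.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical per-line work; replace with the real processing.
sub process_line {
    my ($line) = @_;
    return uc $line;
}

# Child: take lines off the downstream queue until the undef marker
# arrives, posting each result to the upstream queue.
sub queue_worker {
    my ($work_q, $result_q) = @_;
    while (defined(my $line = $work_q->dequeue)) {
        $result_q->enqueue(process_line($line));
    }
    $result_q->enqueue(undef);    # tell the master this child is done
    return;
}

# Master: read lines from $in, hand them to $nworkers children, and
# write every result to $out. Results arrive in completion order, not
# necessarily in input order.
sub run_queued {
    my ($in, $out, $nworkers) = @_;
    my $work_q   = Thread::Queue->new;
    my $result_q = Thread::Queue->new;
    my @workers  = map { threads->create(\&queue_worker, $work_q, $result_q) }
                   1 .. $nworkers;

    while (my $line = <$in>) {
        chomp $line;
        $work_q->enqueue($line);
    }
    $work_q->enqueue(undef) for @workers;    # one marker per child

    # Only the master touches $out, so no write coordination is needed.
    my $finished = 0;
    while ($finished < $nworkers) {
        my $result = $result_q->dequeue;
        if (defined $result) { print {$out} "$result\n" }
        else                 { $finished++ }
    }
    $_->join for @workers;
    return;
}

# Usage: perl queued.pl some_file.txt
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    run_queued($in, \*STDOUT, 4);
    close $in;
}
```

If output order matters, one option is to enqueue the line number along with each line and have the master reorder before writing.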

Perl Contrarian & SQL fanboy
