
Re: Processing large file using threads

by renodino (Curate)
on May 08, 2007 at 15:30 UTC (#614174)


in reply to Processing large file using threads

  1. Create a "master" Thread (usually the root thread).
  2. Create some (possibly configurable) number of child threads
  3. (Here's the tricky part). You've got a couple of alternatives:
    • a) On threads->create(), pass the fileno() to each child thread, along with an offset and a skip count. Each child thread then does an open(INF, '<&', $fileno) on the fileno, reads and discards skip-count lines, then repeatedly reads, processes, and skips until EOF.
    • b) Alternatively, create two Thread::Queue objects (one from master to children, the other from children to master). The master reads each line from the file and posts it to the downstream queue; the children grab (in no particular order) a line off the queue, process it, then post a response to the upstream queue.
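A minimal sketch of the read/process/skip pattern in (a), under some assumptions that are mine rather than the post's: process_line, strided_worker and process_file_strided are hypothetical names, the "offset" is taken to be the child's index and the "skip count" the number of other children, and each child opens the file by path instead of dup'ing the master's fileno. That last swap is deliberate: handles dup'ed from one descriptor share a single file position, whereas separate opens give each child its own.

```perl
use strict;
use warnings;
use threads;

# Hypothetical per-line work; stands in for the real processing.
sub process_line {
    my ($line) = @_;
    return uc $line;
}

# One child: discard $offset lines, then process one line and skip
# $skip lines, repeating until EOF.
sub strided_worker {
    my ($path, $offset, $skip) = @_;

    # Assumption: open by path so this child has its own file position.
    open my $in, '<', $path or die "Cannot open $path: $!";

    # Discard the lines that belong to lower-numbered children.
    for (1 .. $offset) {
        last unless defined readline($in);
    }

    my @out;
    while (defined(my $line = readline($in))) {
        push @out, process_line($line);
        # Skip the lines that belong to the other children.
        for (1 .. $skip) {
            last unless defined readline($in);
        }
    }
    close $in;
    return @out;    # kept in memory for brevity only
}

# Master: start the children, then collect one array ref per child.
sub process_file_strided {
    my ($path, $n_workers) = @_;
    my @children = map {
        threads->create({ context => 'list' },
                        \&strided_worker, $path, $_, $n_workers - 1);
    } 0 .. $n_workers - 1;
    return map { [ $_->join ] } @children;
}
```

With two children, child 0 handles lines 1, 3, 5, ... and child 1 handles lines 2, 4, 6, ..., so which child sees which line is fixed in advance. Every child still reads the whole file; only the processing is divided.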

As ever, TIMTOWTDI. (b) is probably simpler, but (a) is more deterministic. Both can be mimicked using a process-based approach.

I've successfully used (a) for ETL tools, but if your file is binary/random access, it can get complicated to skip records. Also, if the children are writing to an output file, (b) might be easier, since it lets a single master thread do the writing instead of coordinating writes between children.
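The two-queue arrangement in (b), with the master as the only writer, might look roughly like the sketch below. The names process_line and process_file_queued, the $emit callback, and the use of an undef item as the stop signal are all assumptions made for illustration. Results come back in whatever order the children finish, not in file order.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical per-line work; must not return undef, because undef
# is used below to mean "nothing available" / "stop".
sub process_line {
    my ($line) = @_;
    return uc $line;
}

sub process_file_queued {
    my ($path, $n_workers, $emit) = @_;

    my $down = Thread::Queue->new;    # master -> children
    my $up   = Thread::Queue->new;    # children -> master

    my @children = map {
        threads->create(sub {
            # dequeue() blocks until an item arrives; undef means stop.
            while (defined(my $line = $down->dequeue)) {
                $up->enqueue(process_line($line));
            }
        });
    } 1 .. $n_workers;

    open my $in, '<', $path or die "Cannot open $path: $!";
    while (defined(my $line = readline($in))) {
        $down->enqueue($line);
        # Write out whatever results are ready, so that only the
        # master ever touches the output.
        while (defined(my $res = $up->dequeue_nb)) {
            $emit->($res);
        }
    }
    close $in;

    $down->enqueue(undef) for @children;    # one stop signal per child
    $_->join for @children;

    # Every child has exited, so whatever is left is the final batch.
    while (defined(my $res = $up->dequeue_nb)) {
        $emit->($res);
    }
    return;
}
```

A call such as process_file_queued($path, 4, sub { print $_[0] }) would print each processed line from the master thread. Nothing here throttles the master, so if it reads much faster than the children process, the downstream queue can grow large.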


Perl Contrarian & SQL fanboy

