PerlMonks  

Re: Processing large file using threads

by renodino (Curate)
on May 08, 2007 at 15:30 UTC


in reply to Processing large file using threads

  1. Create a "master" thread (usually the root thread).
  2. Create some (possibly configurable) number of child threads.
  3. (Here's the tricky part.) You've got a couple of alternatives:
    • a) On threads->create(), pass the fileno() to each child thread, along with an offset and skip count. Each child thread then does an open(INF, '<&', $fileno) on the fileno, reads and discards skip-count lines, then iteratively reads, processes, and skips until EOF.
    • b) Alternatively, create 2 Thread::Queue objects (one from master to children, the other from children to master). The master reads each line from the file and posts it to the downstream queue; children grab (randomly) a line off the queue, process it, then post a response to the upstream queue.
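A rough sketch of alternative (b), assuming a Perl built with ithreads. The names run_pipeline and process_line are made up for illustration, and using undef as the "no more work" marker is just one common convention, not something the approach requires:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical per-line work; replace with the real processing.
sub process_line {
    my ($line) = @_;
    return uc $line;
}

sub run_pipeline {
    my ($in_fh, $out_fh, $n_workers) = @_;

    my $work    = Thread::Queue->new;   # master -> children
    my $results = Thread::Queue->new;   # children -> master

    my @workers;
    for (1 .. $n_workers) {
        push @workers, threads->create(sub {
            # dequeue() blocks until an item arrives; undef means "stop".
            while (defined(my $line = $work->dequeue)) {
                $results->enqueue(process_line($line));
            }
            $results->enqueue(undef);   # tell the master this child is done
        });
    }

    # Master reads the file and feeds the downstream queue.
    while (my $line = <$in_fh>) {
        chomp $line;
        $work->enqueue($line);
    }
    $work->enqueue(undef) for @workers; # one end marker per child

    # Master is the only writer, so the output needs no locking.
    my $done = 0;
    while ($done < @workers) {
        my $r = $results->dequeue;
        if (defined $r) { print {$out_fh} "$r\n" }
        else            { $done++ }
    }
    $_->join for @workers;
    return;
}
```

Note that results come back in whatever order the children finish, and that this sketch queues the whole file before draining any results, so a really large file would want the master to interleave reading and writing or otherwise bound the queue.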

As ever, TIMTOWTDI. (b) is probably simpler, but (a) is more deterministic. Both can be mimicked using a process-based approach.

I've successfully used (a) for ETL tools, but if your file is binary/random access, it can get complicated to skip records. Also, if the children are writing to an output file, (b) might be easier, since it lets a single master thread do the writing instead of coordinating writes between children.
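For a line-oriented text file, the offset/skip idea in (a) might look roughly like the following. This is a sketch, not the ETL code mentioned above: stride_worker, run_strided, and process_line are hypothetical names, and each child here opens the file by path rather than dup'ing the fileno, on the assumption that dup'ed descriptors share a single file offset and the children would otherwise steal each other's reads.

```perl
use strict;
use warnings;
use threads;

# Hypothetical per-line work; replace with the real processing.
sub process_line {
    my ($line) = @_;
    return length $line;
}

# Child number $offset (0-based) handles lines offset, offset+stride, ...
sub stride_worker {
    my ($path, $offset, $stride) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @out;
    my $n = 0;
    while (my $line = <$fh>) {
        # Read and discard every line that belongs to another child.
        next unless $n++ % $stride == $offset;
        chomp $line;
        push @out, process_line($line);
    }
    close $fh;
    return \@out;
}

sub run_strided {
    my ($path, $n_workers) = @_;
    my @workers;
    for my $offset (0 .. $n_workers - 1) {
        push @workers,
            threads->create(\&stride_worker, $path, $offset, $n_workers);
    }
    # join() hands back a copy of each child's result.
    my @results = map { my ($r) = $_->join; $r } @workers;
    return \@results;   # element $k holds the output of child $k
}
```

Which child handles which line is fixed up front, which is where the determinism comes from. The cost is that every child still reads the whole file, so this only pays off when the processing is much more expensive than the reading.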


Perl Contrarian & SQL fanboy