in reply to Processing large file using threads
- Create a "master" Thread (usually the root thread).
- Create some (possibly configurable) number of child threads.
- (Here's the tricky part.) You've got a couple of alternatives:
- a) on threads->create(), pass the fileno() to each child thread, along with an
offset and skip count. Each child thread then does an open(INF, '<&', $fileno)
on the fileno, reads and discards its offset number of lines, then repeatedly
reads and processes one line and skips the next skip-count lines until EOF.
- b) alternatively, create 2 Thread::Queue objects (one from master to children,
the other from children to master). Master reads each line from the file
and posts it to the downstream queue; each child grabs the next available
line off the queue (which child gets which line is not deterministic),
processes it, then posts a response to the upstream queue.
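
A minimal sketch of (b), under some assumptions: process_line() and run_queue_pipeline() are hypothetical names that are not from the post, the input is line-oriented text, and Thread::Queue is recent enough to have end() (3.01 or later, as far as I know). The master tags each line with its line number so the results can be put back in file order.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical per-line work; stands in for the real processing.
sub process_line {
    my ($line) = @_;
    return uc $line;
}

sub run_queue_pipeline {
    my ($path, $n_workers) = @_;

    my $work    = Thread::Queue->new;   # downstream: master -> children
    my $results = Thread::Queue->new;   # upstream:   children -> master

    my @workers = map {
        threads->create(sub {
            # dequeue() blocks until an item arrives and returns undef
            # once the queue has been ended and drained.
            while (defined(my $item = $work->dequeue)) {
                my ($lineno, $line) = @$item;
                $results->enqueue([$lineno, process_line($line)]);
            }
        });
    } 1 .. $n_workers;

    # Master: read every line and post it, tagged with its line number.
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my $sent = 0;
    while (defined(my $line = <$in>)) {
        chomp $line;
        $work->enqueue([$., $line]);
        $sent++;
    }
    close $in;
    $work->end;   # no more work; lets the children fall out of their loops

    # Master: collect exactly one response per line, restoring file order.
    my @out;
    for (1 .. $sent) {
        my ($lineno, $text) = @{ $results->dequeue };
        $out[$lineno - 1] = $text;
    }
    $_->join for @workers;
    return @out;
}
```

Note that this sketch queues the whole file before it collects anything, which is fine for a demo but not for a genuinely large file. A real version would probably drain the upstream queue while it is still reading, or otherwise bound the downstream queue.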
As ever, TIMTOWTDI. (b) is probably simpler, but (a) is more deterministic.
Both can be mimicked using a process-based approach.
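
For the process-based variant, here is a rough sketch of the striding idea using fork() and one pipe per child. Everything named here (run_forked(), process_line(), the tab-separated record format on the pipe) is an assumption made for illustration, and it presumes a platform with a real fork(). Each child opens the file itself and handles the lines whose zero-based index modulo the child count equals its own offset.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical per-line work; stands in for the real processing.
sub process_line {
    my ($line) = @_;
    return scalar reverse $line;
}

sub run_forked {
    my ($path, $n_children) = @_;
    my @readers;

    for my $offset (0 .. $n_children - 1) {
        pipe(my $from_child, my $to_parent) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: process only the lines that belong to this offset.
            close $from_child;
            open(my $in, '<', $path) or die "Cannot open $path: $!";
            while (defined(my $line = <$in>)) {
                next unless ($. - 1) % $n_children == $offset;
                chomp $line;
                print {$to_parent} join("\t", $., process_line($line)), "\n";
            }
            close $in;
            close $to_parent;   # flushes the buffered records
            POSIX::_exit(0);    # leave without running the parent's END blocks
        }

        # Parent: keep only the read end of this child's pipe.
        close $to_parent;
        push @readers, [$pid, $from_child];
    }

    # Parent: drain each pipe in turn, then reap the child.
    my %by_line;
    for my $reader (@readers) {
        my ($pid, $fh) = @$reader;
        while (defined(my $rec = <$fh>)) {
            chomp $rec;
            my ($lineno, $text) = split /\t/, $rec, 2;
            $by_line{$lineno} = $text;
        }
        close $fh;
        waitpid($pid, 0);
    }
    return map { $by_line{$_} } sort { $a <=> $b } keys %by_line;
}
```

The record format assumes the processed text contains no embedded newlines; anything richer would need a proper serialization step.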
I've successfully used (a) for ETL tools, but if your file is
binary/random access, it can get complicated to skip records.
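
A minimal sketch of the offset/skip-count scheme in (a), with one deliberate change that should be read as my assumption rather than the method described: each thread opens the file by path instead of dup'ing the master's fileno(), because as far as I know dup'ed descriptors share a single file position. strided_worker(), run_strided() and process_line() are hypothetical names.

```perl
use strict;
use warnings;
use threads;

# Hypothetical per-line work; stands in for the real processing.
sub process_line {
    my ($line) = @_;
    return length $line;
}

# Runs in a child thread: discard $offset lines, then process one line
# and discard the next $skip lines, repeating until EOF.
sub strided_worker {
    my ($path, $offset, $skip) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";

    for (1 .. $offset) {
        defined(scalar <$in>) or last;
    }

    my @done;   # flat list of (line number, result) pairs
    while (defined(my $line = <$in>)) {
        chomp $line;
        push @done, $., process_line($line);
        for (1 .. $skip) {
            defined(scalar <$in>) or last;
        }
    }
    close $in;
    return @done;
}

sub run_strided {
    my ($path, $n_workers) = @_;

    # Child N starts at line N (zero-based) and skips the other
    # children's lines between each one it processes.
    my @workers = map {
        threads->create({ context => 'list' },
                        \&strided_worker, $path, $_, $n_workers - 1)
    } 0 .. $n_workers - 1;

    my %by_line = map { $_->join } @workers;
    return map { $by_line{$_} } sort { $a <=> $b } keys %by_line;
}
```

Every thread still reads the whole file, so this only pays off when the per-line processing costs more than the read. The split is deterministic, though: a given line always goes to the same child.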
Also, if the children are writing to an output file, (b)
might be easier, since it lets a single master thread do the writing
instead of coordinating writes between children.
Perl Contrarian & SQL fanboy