If you want to go the thread way:
in reply to How do you parallelize STDIN for large file processing?
Create two queues with Thread::Queue: one for the data read and one for the results to be written.
Start x workers as detached threads (I would make x a parameter), plus a result writer, also detached (threads).
Workers read from the inqueue and send results to the result queue.
The program terminates either when lines read equals lines written, or when the result queue has been empty for a set number of seconds.
The main question is where the bottleneck lies, but I have used this approach frequently on multicore machines with success.
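
The setup described above could be sketched roughly as follows. This is a minimal sketch, not a tested production script: process_line is a hypothetical placeholder for the real per-line work, the default of 4 workers is an assumption, and only the "lines read equals lines written" termination condition is implemented (the timeout variant is left out). The shared counter uses threads::shared, which the post does not mention.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use IO::Handle;

# Number of workers as a parameter (assumed default: 4).
my $num_workers = shift(@ARGV) // 4;

my $in_q  = Thread::Queue->new;   # lines read from STDIN
my $out_q = Thread::Queue->new;   # results waiting to be written

# Hypothetical per-line work; replace with the real processing.
sub process_line {
    my ($line) = @_;
    return uc $line;
}

# Workers: read from the input queue, send results to the result queue.
for (1 .. $num_workers) {
    threads->create(sub {
        while (defined(my $line = $in_q->dequeue)) {
            $out_q->enqueue(process_line($line));
        }
    })->detach;
}

# Result writer: prints each result, then bumps a shared counter.
my $written :shared = 0;
threads->create(sub {
    STDOUT->autoflush(1);   # flush each line before it is counted as written
    while (defined(my $result = $out_q->dequeue)) {
        print $result;
        lock($written);
        $written++;
        cond_signal($written);
    }
})->detach;

# Main thread: feed the input queue and count the lines read.
my $read = 0;
while (defined(my $line = <STDIN>)) {
    $in_q->enqueue($line);
    $read++;
}

# Terminate once lines written has caught up with lines read.
{
    lock($written);
    cond_wait($written) while $written < $read;
}
```

Results come out in whatever order the workers finish, so the output order will generally differ from the input order. Because the threads are detached and still blocked on their queues at exit, Perl may print a warning about active threads when the program ends.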