I agree that multiple cores will not help, because the sequential read of the input file is the bottleneck.
I am not so sure about output buffering.
I really don't know in this situation, but depending on the file system and other factors such as how smart the disk controller is, increasing the write buffer size could make a difference.
Just an idea to try: I would benchmark 64K against the standard size (which I guess is probably 4K) and see whether there is any significant difference.
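
Here is a rough sketch of how that comparison could be run. It is in Python only because the thread doesn't show the actual code; the helper name `time_write`, the line count, and the 80-byte line are all made up for illustration, and 4K is just my guess at the default. The OS page cache may hide much of the difference, so treat the timings as indicative at best.

```python
import os
import tempfile
import time


def time_write(buffer_size, n_lines=200_000, line=b"x" * 79 + b"\n"):
    """Write n_lines to a temp file with the given buffer size.

    Returns (elapsed_seconds, bytes_written). Hypothetical helper,
    not taken from the original program.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        # buffering > 1 sets the size of the write buffer in bytes
        with open(path, "wb", buffering=buffer_size) as out:
            for _ in range(n_lines):
                out.write(line)
        elapsed = time.perf_counter() - start
        return elapsed, os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # 4K is an assumed "standard" size; 64K is the candidate to try.
    for size in (4 * 1024, 64 * 1024):
        best = min(time_write(size)[0] for _ in range(3))
        print(f"{size // 1024:>3}K buffer: {best:.3f} s")
```

Taking the best of a few runs smooths out some noise. If the two numbers come out about the same, output buffering probably isn't worth tuning here.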