, your code is likely to be faster not so much because it shaves away Perl cycles but because it will greatly reduce disk seeks, which are probably dominating L~R's run time. (See my other post in this thread for more on this.)
L~R: Assuming that you have the RAM, can you compare tachyon's code's run time to the other implementations?
My guess is that tachyon's code will fare well. (If you don't have the RAM, just tweak the code so that it will process, say, 100_000 or so lines per pass and clear out %fh between passes. Also, you'll need to open files in append mode.)