The construction of the arrays from the split lines consumes a fair amount of CPU. As does the allocation of the buffers for reading into and writing from. As does the searching of the 4K buffers to locate the line ends. And the search of the lines to find the tabs.
But still, the minimum elapsed time is dominated by the time it takes to a) read 80 GB from disk; whilst simultaneously writing ~5 GB back to disk. On my system 2.6 GHz 4-core Q6600 + 7200 RPM SATA2 disk with 16MB ram, that reading and writing process -- with no other processing; searching; memory allocations etc. -- requires a minimum of 56 minutes.
The difference between that and your 5 hours comes in two parts.
If you can overlap the IO and the internal processing, you can claw back some of that difference.
Have you tried my two processes solution?