A few thoughts
- Your sample data set wasn't very large. Would it be possible for you to make even smaller sort keys? Getting the memory footprint of @sorton a bit smaller might help. Since the sort seems to be the killer, and not the preparation time, you might want to spend more time prepping the data.
- Can you do that by making a completely numeric sort key? Since, except for bracketed field ([]), everything else seems to be numeric. Then you wouldn't need an alphabetic comparison here. Packing things down into a smaller value might prove useful. Watch for overflows.
- At the very least, can you get that bracketed field down to a couple of characters instead of a long string?
- How many records _are_ we talking about? Not "100MB" but, how many lines of data?
- (tiny optimization)Can you pack the offset into the sort key as well? Your sort would be sort @sorton instead of needing the indirection there.
Kinda makes me wish I had the actual file to try this out....