Re: sorting very large text files
by gam3 (Curate) on Jan 06, 2010 at 03:40 UTC
If the file is small enough that all of the keys fit into memory, you can do this:
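The approach can be sketched roughly as follows. This is a minimal Python sketch, not the original Perl script. It assumes one record per newline-terminated line and takes the first whitespace-separated field as the sort key; `first_field`, `sort_by_key_offsets` and `sort_bytes` are hypothetical names. Only a (key, byte offset) pair per line is held in memory. After sorting those pairs, the program seeks back into the file and copies each line out in key order.

```python
import io
import sys
from typing import BinaryIO, Callable, List, Tuple


def first_field(line: bytes) -> bytes:
    """Hypothetical key extractor: the line's first whitespace-separated field."""
    fields = line.split(None, 1)
    return fields[0] if fields else b""


def sort_by_key_offsets(
    src: BinaryIO,
    dst: BinaryIO,
    key: Callable[[bytes], bytes] = first_field,
) -> None:
    # Pass 1: read every line once, keeping only (key, byte offset) in memory.
    index: List[Tuple[bytes, int]] = []
    offset = 0
    for line in src:
        index.append((key(line), offset))
        offset += len(line)

    # Sort the in-memory index; equal keys stay in file order because the
    # (unique) offset is the second element of each tuple.
    index.sort()

    # Pass 2: seek back to each line in key order and copy it out.
    # On a large file these scattered seeks can be the slow part.
    for _, pos in index:
        src.seek(pos)
        line = src.readline()
        if not line.endswith(b"\n"):
            line += b"\n"  # the file's last line may lack a newline
        dst.write(line)


def sort_bytes(data: bytes) -> bytes:
    """Convenience wrapper for small in-memory inputs."""
    out = io.BytesIO()
    sort_by_key_offsets(io.BytesIO(data), out)
    return out.getvalue()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python sort_offsets.py data > sorted
    with open(sys.argv[1], "rb") as f:
        sort_by_key_offsets(f, sys.stdout.buffer)
```

Keys are compared byte by byte, roughly as GNU sort does in the C locale. Sorting on a different key only requires passing another `key` function.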
On the synthetic ("cooked") data I tested, I got the following timings:
GNU sort:

    # time sort --temporary-directory=/opt data > sort1
    real    0m24.698s
    user    0m22.539s
    sys     0m1.950s

Perl sort:

    # time perl sort.pl
    real    0m55.900s
    user    0m39.897s
    sys     0m6.430s

The data file I used gave this wc output:
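    # wc data
     4915200  34406400 383385600 data

That is 4,915,200 lines, 34,406,400 words and 383,385,600 bytes, an average of exactly 78 bytes per line. I was surprised that this Perl script reaches about half the speed of GNU sort on this data (55.9 s versus 24.7 s of real time). I suspect that on a bigger data set with long lines, it might even sort faster than GNU sort.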
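To put "about half the speed" in numbers, this small Python calculation divides each Perl `time` figure by the matching GNU sort figure. It also computes the wall-clock time not covered by user plus sys CPU time, which can include, for example, waiting on disk I/O. All figures are copied from the timings above; the names are only illustrative.

```python
# Timings in seconds, copied from the `time` output above.
GNU_SORT = {"real": 24.698, "user": 22.539, "sys": 1.950}
PERL_SORT = {"real": 55.900, "user": 39.897, "sys": 6.430}


def slowdown(kind: str) -> float:
    """How many times longer the Perl run took than GNU sort for one `time` field."""
    return PERL_SORT[kind] / GNU_SORT[kind]


def wait_time(t: dict) -> float:
    """Wall-clock time not spent on CPU (user + sys), e.g. waiting on I/O."""
    return t["real"] - t["user"] - t["sys"]


if __name__ == "__main__":
    for kind in ("real", "user", "sys"):
        print(f"{kind}: {slowdown(kind):.2f}x")
    print(f"GNU sort off-CPU: {wait_time(GNU_SORT):.2f} s")
    print(f"Perl sort off-CPU: {wait_time(PERL_SORT):.2f} s")
```

This prints ratios of 2.26x (real), 1.77x (user) and 3.30x (sys), plus about 0.21 s off-CPU for GNU sort against about 9.57 s for the Perl run. The extra sys time and waiting are at least consistent with the update about seeks in the output loop.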
UPDATE: Most of the time appears to be spent in the output loop; all of the seeks seem to slow things down considerably.