
Re^3: sorting type question- space problems

by mbethke (Hermit)
on Sep 14, 2013 at 19:42 UTC

in reply to Re^2: sorting type question- space problems
in thread sorting type question- space problems

First, RickardK's solution of using sort(1) makes sense. If it's there, use it, as it's written in C, highly optimized and well tested.

That said, if your keys tend to be small compared to the rest of the records, an in-memory sort may be feasible if you record only the keys and the file offsets of their respective lines:

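    use strict;
    use warnings;

    # Pass 1: remember only the two sort keys and the byte offset of each line.
    my @a;
    open my $fh, "<", "input" or die $!;
    while (1) {
        last if eof($fh);
        my $pos = tell($fh);
        my ($k1, $k2) = split /\s+/, <$fh>;
        push @a, [$k1, $k2, $pos];
    }

    # Pass 2: sort the index, then seek back and print each full line in order.
    foreach (sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] } @a) {
        seek($fh, $_->[2], 0);
        print scalar <$fh>;
    }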

This is probably not the fastest approach, but if you want to avoid external sorts, both in the sense of shelling out and of temp files, it may be worth a try. At 5M lines it will likely break the 100 MB limit, though. It's unlikely to get much smaller without tricks such as encoding all the keys and offsets in one string, which requires assuming that certain characters never occur in the keys.
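One possible reading of that "one string" trick, as a rough sketch rather than a measured solution: store each record as a single packed string (both keys plus the offset) instead of a three-element array ref, so Perl keeps one scalar per line and can sort with the default string comparison. The sub name sorted_offsets is made up for illustration, and the sketch assumes that the keys never contain a "\0" byte and that the file is smaller than 4 GiB, so an offset fits into pack's 32-bit "N" format.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: returns the byte offsets
# of the lines in $file, ordered by the first two whitespace-separated fields.
# Assumptions: keys never contain "\0"; the file is smaller than 4 GiB.
sub sorted_offsets {
    my ($file) = @_;
    open my $fh, "<", $file or die "$file: $!";
    my @recs;
    until (eof($fh)) {
        my $pos = tell($fh);
        my ($k1, $k2) = split /\s+/, scalar <$fh>;
        # Treat missing fields as empty strings to avoid undef warnings.
        $k1 = "" unless defined $k1;
        $k2 = "" unless defined $k2;
        # One flat string per line: key1 NUL key2 NUL 4-byte big-endian offset.
        push @recs, $k1 . "\0" . $k2 . "\0" . pack("N", $pos);
    }
    close $fh;
    # Plain string sort: "\0" sorts below every other byte, so this orders by
    # key1, then key2, then offset. The offset is always the last 4 bytes.
    return map { unpack("N", substr($_, -4)) } sort @recs;
}
```

To print the sorted file, open "input" again, then seek to each returned offset in turn and print one line, exactly as in the second loop above. Whether this actually gets under the 100 MB limit depends on the key lengths, so measure before relying on it.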
