PerlMonks |
Not to be argumentative, but are you sure you need to sort this list?
If your goal were to identify duplicates in the list, you could take a hash (MD5, not %hash) of each list element, sort the hashes, and look for duplicates in the hash space. A hash function like MD5 or SHA1 should reliably distinguish 2 GB strings, but if you do find duplicates, you could always verify them against the primary data.

If these strings are expected to be dissimilar, it may suffice to sort them based on their first 1000 characters and then use a separate procedure to deal with cases where the first 1000 characters are identical.

This seems like a time to step back and think about the larger goal and any a priori knowledge of the data to be sorted.

In reply to Re: Sorting Gigabytes of Strings Without Storing Them
by eye
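
A minimal sketch of the hash-and-compare idea above, using the core Digest::MD5 module. The short sample strings here are just illustrative stand-ins for the real multi-gigabyte data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Toy stand-in for the real multi-gigabyte strings.
my @strings = ('alpha', 'beta', 'alpha', 'gamma');

# Map each digest to the indices of the strings that produced it.
my %seen;
for my $i (0 .. $#strings) {
    my $digest = md5_hex($strings[$i]);
    push @{ $seen{$digest} }, $i;
}

# Any digest shared by more than one index is a candidate duplicate;
# verify against the primary data rather than trusting the hash alone.
for my $digest (sort keys %seen) {
    my @idx = @{ $seen{$digest} };
    next unless @idx > 1;
    my @confirmed = grep { $strings[$_] eq $strings[$idx[0]] } @idx[1 .. $#idx];
    print "indices @idx share digest $digest\n" if @confirmed;
}
```

For actual 2 GB strings you would not hold each string in memory; Digest::MD5's object interface can stream the data instead (e.g. `Digest::MD5->new->addfile($fh)->hexdigest` on a filehandle opened per element).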