http://www.perlmonks.org?node_id=249519


in reply to speed up one-line "sort|uniq -c" perl code

I know this isn't really the answer you were looking for, but I think you should consider getting a better sort program. I don't believe the GNU one has such limitations, and it is typically faster than doing the same thing in Perl.
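For reference, a sketch of the same pipeline with GNU sort tuned for a large input. `-S` sets the size of the main-memory buffer and `-T` names the directory for the temporary files GNU sort writes when the input does not fit in that buffer, while `LC_ALL=C` makes it compare bytes instead of using locale collation, which is usually faster. The file name `ips.txt`, its contents and the `100M` buffer size are hypothetical placeholders, not values from the original post.

```shell
#!/bin/sh
# Hypothetical sample input standing in for the real log: one IP per line.
printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n10.0.0.3\n10.0.0.1\n' > ips.txt

# -S caps the in-memory buffer (assumed value), -T picks where temporary
# merge files go, LC_ALL=C compares raw bytes. uniq -c then counts runs of
# adjacent identical lines in the sorted output.
LC_ALL=C sort -S 100M -T "${TMPDIR:-/tmp}" ips.txt | uniq -c
```

The temporary directory needs roughly as much free space as the input, since the whole file still gets sorted before `uniq -c` sees it.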

Re^2: speed up one-line "sort|uniq -c" perl code (speed)
by tye (Sage) on Apr 10, 2003 at 16:47 UTC

    Sorting needs both time and space, no matter how cleverly it is implemented. I find it hard to imagine a system so poorly configured that it can't handle sorting a paltry 500kB file. But I don't think that really matters in this particular case.

    There is a reason that "sort -u" came to be: it is much slower to sort all 57000 instances of several IPs and only then throw away all but one of each. So I think "sort | uniq -c" would be much slower than using Perl.

    Unfortunately, it doesn't appear that even GNU sort has bothered to implement a -u option that counts the duplicates.

                    - tye
      Thanks for helping me catch a typo.
      The file that I am parsing is 500MB, not 500kB.
      That's why sort freaks out.