http://www.perlmonks.org?node_id=248501


in reply to Re: Re: Heap sorting in perl
in thread Heap sorting in perl

The original question didn't specify how the data would be available, so I'm just going to assume it fits in memory. Note that heapsort requires only a constant amount of memory in addition to the data being sorted.

If the data doesn't fit in memory, then that's a different problem. One can always tie the array to whatever backend keeps the data being worked on out of memory.
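For illustration only, here is a minimal sketch of that tying idea using the core Tie::File module (the filename `big_input.txt` and the use of Tie::File specifically are my own assumptions, not part of the post): each array element maps to one line of the file, fetched from or written to disk on demand.

```perl
use strict;
use warnings;
use Tie::File;

# Tie @lines to a file: elements live on disk, not in memory.
# Tie::File creates the file if it doesn't already exist.
tie my @lines, 'Tie::File', 'big_input.txt'
    or die "Cannot tie file: $!";

push @lines, "first record";    # Written straight to disk.
print scalar(@lines), " record(s) on disk\n";
untie @lines;
```

Any tied-array backend with the same interface would do; Tie::File is simply the one that ships with Perl.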

Abigail

Re: Re: Heap sorting in perl
by Anonymous Monk on Apr 07, 2003 at 03:40 UTC
      Yeah, whatever. Here's a modification of the program. Give the number of elements to extract as a command line argument, and the elements to extract from on STDIN. It assumes that there are enough elements on STDIN.

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @heap;
      my $M = @ARGV ? shift : 10;

      sub heapify;
      sub heapify {
          my $idx = shift;
          my $max = $idx;
          for my $try (2 * $idx + 1, 2 * $idx + 2) {
              $max = $try if $try < @heap && $heap[$try] > $heap[$max];
          }
          return if $max == $idx;
          @heap[$idx, $max] = @heap[$max, $idx];
          heapify $max;
      }

      sub extract_max {
          return unless @heap;
          my $root = $heap[0];    # Largest of the elements kept so far.
          my $tmp  = pop @heap;
          if (@heap) {
              $heap[0] = $tmp;
              heapify 0;
          }
          return $root;
      }

      # First load the heap with initial data.
      for (1 .. $M) {
          push @heap => scalar <>;
      }
      chomp @heap;

      #
      # Heapify it.
      #
      for (my $i = int(@heap / 2); $i--;) {
          heapify $i;
      }

      #
      # Deal with the rest.
      #
      while (<>) {
          chomp;
          next if $_ >= $heap[0];    # It's too large.
          $heap[0] = $_;
          heapify 0;
      }

      my @result;
      push @result => extract_max while @heap;
      @result = reverse @result;
      print "@result\n";

      __END__

      The running time of this is O(N log M), using O(M) memory.
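As a sketch of the same bounded-max-heap technique wrapped in a reusable subroutine (the name `select_smallest` and the sample data are made up here, not from the program above): at most M elements are held at any time, each incoming element costs at most O(log M) sift work, hence O(N log M) overall.

```perl
use strict;
use warnings;

# Sketch: return the $m smallest elements of @$aref in ascending order,
# keeping at most $m elements in a max-heap at any time.
sub select_smallest {
    my ($m, $aref) = @_;
    my @heap;

    # Sift the element at $idx down until the max-heap property holds.
    my $sift = sub {
        my $idx = shift;
        while (1) {
            my $max = $idx;
            for my $try (2 * $idx + 1, 2 * $idx + 2) {
                $max = $try if $try < @heap && $heap[$try] > $heap[$max];
            }
            last if $max == $idx;
            @heap[$idx, $max] = @heap[$max, $idx];
            $idx = $max;
        }
    };
    my $heapify_all = sub {
        $sift->($_) for reverse 0 .. int(@heap / 2) - 1;
    };

    for my $x (@$aref) {
        if (@heap < $m) {
            push @heap, $x;
            $heapify_all->() if @heap == $m;    # Heap is now full.
        }
        elsif ($x < $heap[0]) {
            $heap[0] = $x;    # Displace the current maximum.
            $sift->(0);
        }
    }
    $heapify_all->() if @heap < $m;    # Fewer than $m elements in total.

    # Only M elements remain, so sorting them is cheap: O(M log M).
    return sort { $a <=> $b } @heap;
}

my @smallest = select_smallest(3, [9, 1, 8, 2, 7, 3, 6]);
print "@smallest\n";    # 1 2 3
```

The final sort is purely cosmetic; the heap's only job is to cap memory at M elements while streaming past all N.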

      Abigail

        Every so often I really wish that I had learned more about algorithms. This is one of those times.

        Beautiful.

      In said note, blakem writes:
      Heap sorting is attractive because selecting the smallest M elements from a dataset of size N (N being much larger than M) requires storing only M elements in memory at any one time.
      That means that he can very well store the entire heap, the M elements, in memory.

      Makeshifts last the longest.

        Abigail's original solution, as I read it, requires that the entire dataset of size N be passed into the function smallest(). That means you need enough memory for N things up front, which is precisely the original poster's problem: he doesn't have it. He only comfortably has room for M things, but is streaming through a much larger dataset of size N.

        Abigail's follow-up solution resolves this limitation admirably well...