in reply to Re: Heap sorting in perl
in thread Heap sorting in perl

Bad assumption. As already clarified at Re: Heap sorting in perl, memory is exactly the problem.

Replies are listed 'Best First'.
Re: Heap sorting in perl
by Abigail-II (Bishop) on Apr 07, 2003 at 11:20 UTC
    Yeah, whatever. Here's a modification of the program. Give the number of elements to extract as a command line argument, and the elements to extract from on STDIN. It assumes that there are enough elements on STDIN.

    #!/usr/bin/perl use strict; use warnings; my @heap; my $M = @ARGV ? shift : 10; sub heapify; sub heapify { my $idx = shift; my $max = $idx; for my $try (2 * $idx + 1, 2 * $idx + 2) { $max = $try if $try < @heap && $heap [$try] > $heap [$max] } return if $max == $idx; @heap [$idx, $max] = @heap [$max, $idx]; heapify $max; } sub extract_max { return unless @heap; my $min = $heap [0]; my $tmp = pop @heap; if (@heap) { $heap [0] = $tmp; heapify 0; } return $min; } # First load the heap with initial data. for (1 .. $M) { push @heap => scalar <>; } chomp @heap; # # Heapify it. # for (my $i = int (@heap / 2); $i --;) { heapify $i; } # # Deal with the rest. # while (<>) { chomp; next if $_ >= $heap [0]; # It's too large. $heap [0] = $_; heapify 0; } my @result; push @result => extract_max while @heap; @result = reverse @result; print "@result\n"; __END__

    The running time of this is O (N log k), using O (k) memory.


      Every so often I really wish that I had learned more about algorithms. This is one of those times.


Re^5: Heap sorting in perl
by Aristotle (Chancellor) on Apr 07, 2003 at 08:49 UTC
    In said note, blakem writes:
    Heap sorting is attractive because selecting the smallest M elements from a dataset of size N (N being much larger than M) requires storing only M elements in memory at any one time.
    That means that he can very well store the entire heap, the M elements, in memory.

    Makeshifts last the longest.

      Abigail's original solution as I read it requires that the entire dataset of size N be passed into the function smallest(). This requires that you have enough memory for N things up front. Which is the original poster's problem, he doesn't. He only comfortably has room for M things, but is fetching through a much larger dataset of N.

      Abigail's follow-up solution resolves this limitation admirably well...