Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re: Re: Heap sorting in perl

by Anonymous Monk
on Apr 06, 2003 at 23:59 UTC ( #248499=note: print w/replies, xml ) Need Help??


in reply to Re: Heap sorting in perl
in thread Heap sorting in perl

Doesn't this require the whole list to be in memory? And doesn't that create a problem with the "from a very large dataset"?

Replies are listed 'Best First'.
Re: Heap sorting in perl
by Abigail-II (Bishop) on Apr 07, 2003 at 00:15 UTC
    The original question didn't specify how the data would be available, so I'm just going to assume it fits in memory. Note that a heapsort only requires a constant amount of memory additional to the data to be sorted.

    If the data doesn't fit in memory, then that's a different problem. One can always tie the used array to whatever solution will keep the data to be worked on out of memory.

    Abigail

        Yeah, whatever. Here's a modification of the program. Give the number of elements to extract as a command line argument, and the elements to extract from on STDIN. It assumes that there are enough elements on STDIN.

        #!/usr/bin/perl use strict; use warnings; my @heap; my $M = @ARGV ? shift : 10; sub heapify; sub heapify { my $idx = shift; my $max = $idx; for my $try (2 * $idx + 1, 2 * $idx + 2) { $max = $try if $try < @heap && $heap [$try] > $heap [$max] } return if $max == $idx; @heap [$idx, $max] = @heap [$max, $idx]; heapify $max; } sub extract_max { return unless @heap; my $min = $heap [0]; my $tmp = pop @heap; if (@heap) { $heap [0] = $tmp; heapify 0; } return $min; } # First load the heap with initial data. for (1 .. $M) { push @heap => scalar <>; } chomp @heap; # # Heapify it. # for (my $i = int (@heap / 2); $i --;) { heapify $i; } # # Deal with the rest. # while (<>) { chomp; next if $_ >= $heap [0]; # It's too large. $heap [0] = $_; heapify 0; } my @result; push @result => extract_max while @heap; @result = reverse @result; print "@result\n"; __END__

        The running time of this is O (N log k), using O (k) memory.

        Abigail

        In said note, blakem writes:
        Heap sorting is attractive because selecting the smallest M elements from a dataset of size N (N being much larger than M) requires storing only M elements in memory at any one time.
        That means that he can very well store the entire heap, the M elements, in memory.

        Makeshifts last the longest.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://248499]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (4)
As of 2020-11-25 07:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?