Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re: Re: Heap sorting in perl

by Anonymous Monk
on Apr 05, 2003 at 15:26 UTC ( [id://248299]=note: print w/replies, xml ) Need Help??


in reply to Re: Heap sorting in perl
in thread Heap sorting in perl

The reason that someone with a large dataset might want a heap is that they don't want to have very much of the dataset in memory at once. With a Heap you can add an element at a time until you have your limit, and from there on you can add one and remove the biggest each time.

When you are done you can then just extract off all of the elements in the heap, and you have the smallest N of them from largest to smallest. Without excessive memory usage.

Replies are listed 'Best First'.
Re^3: Heap sorting in perl
by adrianh (Chancellor) on Apr 05, 2003 at 15:40 UTC

    Erm. No :-)

    You have to add all the items from the dataset to the heap before you can remove the N lowest (or highest, depending on the direction your grow your tree).

    Yes, you could have the heap as an out-of-memory structure. However, if the dataset is on disk you can just read it in element by element and use the algorithm I proposed. It's still going to be less expensive in time and space than creating a heap.

    Unless you are goin to be adding and removing entries from the data set and need to keep it ordered a heap is overkill for the problem as stated.

      I don't follow your reasoning.

      If you know when you start how many you will want in the end, what would possibly prevent you from adding and removing from your heap before you had all of the data in? Sure, the first "maximum" removed and thrown away might not be the biggest element in the whole set. But it wasn't in the smallest N, and that is all we care about. Who cares what order we throw away the rest in?? (As long as we throw it away before running out of memory!)

      As for overkill, using Perl to solve a problem that can be solved with the correct option to the Unix sort utility is overkill. But if you need a general purpose solution in Perl and you have a heap implementation there already, why not use it?

        I don't follow your reasoning.

        We were talking at cross purposes. I thought that you were proposing to heap sort the entire dataset, not the subset of smallest elements.

        I admit that my using sort to order the smallest subset is a hack, but unless N is large or the comparison function is expensive, it will be faster than a pure-perl heap.

        As for overkill, using Perl to solve a problem that can be solved with the correct option to the Unix sort utility is overkill. But if you need a general purpose solution in Perl and you have a heap implementation there already, why not use it?

        I don't see how the Unix sort applies. We don't want to sort the whole dataset - we want the smallest N elements (and we also don't know if the dataset consists of line oriented data.)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://248299]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (4)
As of 2024-04-19 05:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found