Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re^3: Heap sorting in perl

by adrianh (Chancellor)
on Apr 05, 2003 at 15:40 UTC ( [id://248301]=note: print w/replies, xml ) Need Help??


in reply to Re: Re: Heap sorting in perl
in thread Heap sorting in perl

Erm. No :-)

You have to add all the items from the dataset to the heap before you can remove the N lowest (or highest, depending on the direction your grow your tree).

Yes, you could have the heap as an out-of-memory structure. However, if the dataset is on disk you can just read it in element by element and use the algorithm I proposed. It's still going to be less expensive in time and space than creating a heap.

Unless you are goin to be adding and removing entries from the data set and need to keep it ordered a heap is overkill for the problem as stated.

Replies are listed 'Best First'.
Re: Re^3: Heap sorting in perl
by Anonymous Monk on Apr 05, 2003 at 15:51 UTC
    I don't follow your reasoning.

    If you know when you start how many you will want in the end, what would possibly prevent you from adding and removing from your heap before you had all of the data in? Sure, the first "maximum" removed and thrown away might not be the biggest element in the whole set. But it wasn't in the smallest N, and that is all we care about. Who cares what order we throw away the rest in?? (As long as we throw it away before running out of memory!)

    As for overkill, using Perl to solve a problem that can be solved with the correct option to the Unix sort utility is overkill. But if you need a general purpose solution in Perl and you have a heap implementation there already, why not use it?

      I don't follow your reasoning.

      We were talking at cross purposes. I thought that you were proposing to heap sort the entire dataset, not the subset of smallest elements.

      I admit that my using sort to order the smallest subset is a hack, but unless N is large or the comparison function is expensive, it will be faster than a pure-perl heap.

      As for overkill, using Perl to solve a problem that can be solved with the correct option to the Unix sort utility is overkill. But if you need a general purpose solution in Perl and you have a heap implementation there already, why not use it?

      I don't see how the Unix sort applies. We don't want to sort the whole dataset - we want the smallest N elements (and we also don't know if the dataset consists of line oriented data.)

        I was proposing to solve the problem originally stated. I thought that my description of how to solve it using a heap was fairly clear on that point by telling you when to start removing elements while still adding them.

        I agree that until N is large (causing, eg, memory problems) or the comparison function is expensive, using sort will be faster. The heap solution outlined is helpful only for large N. For an expensive comparison function, it is helpful once you start pulling off elements to keep track of the largest that you have in your set. Anything which is larger than or equal to that one should not be added. This reduces the number of comparisons significantly.

        As for the sort utility, I had thought that it had an option to only return the largest N elements. But looking at it more carefully I don't see it. It must have been somewhere else that I saw that. My bad. :-(

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://248301]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (7)
As of 2024-04-23 15:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found