A heap is a specialized data structure that can be thought of as an "automatically sorted array", given a somewhat scaled-down definition of "sorted". Sorted in this case means that the largest element is always easy to find and remove, and that inserting new elements is also cheap.
Given that, our steps to find the smallest M elements would be:
- Build a heap from our first M items
- Compare next item to largest element in heap
- Replace largest with new if new < largest
- Repeat steps 2 and 3 for all remaining items
As you can see, this is very similar to the strategy you proposed. In fact, you could view heaps as a data structure designed specifically to implement this algorithm efficiently.
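For what it's worth, here is a minimal sketch of those steps. It is written in Python rather than Perl purely so it can lean on the standard `heapq` module. The function name `smallest_m` is made up for illustration, and the negation trick assumes the items are numbers (`heapq` only provides a min-heap, so values are stored negated to keep the largest on top).

```python
import heapq
from itertools import islice

def smallest_m(items, m):
    """Return the m smallest numbers from items, in ascending order."""
    if m <= 0:
        return []
    it = iter(items)
    # Step 1: build a heap from the first m items.
    # Values are negated so that heap[0] is always the largest kept value.
    heap = [-x for x in islice(it, m)]
    heapq.heapify(heap)
    # Steps 2-4: compare each remaining item to the largest kept value
    # and replace that value when the new item is smaller.
    for x in it:
        if x < -heap[0]:
            heapq.heapreplace(heap, -x)
    # The heap itself is only partially ordered, so sort the survivors.
    return sorted(-x for x in heap)

print(smallest_m([5, 1, 9, 3, 7, 2, 8], 3))
```

Each replacement costs O(log M), and most items are rejected after a single comparison with the top of the heap, so the whole thing is one pass over the data using only M slots of extra memory.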
-Blake

*In step 3, instead of going through the entire original array, you can chop the original array into pieces, say each piece containing 10 * N elements (tweak this 10; it could be 5, could be 50...). Sort each piece (we are sorting some smaller arrays), then take only the first N elements of the sorted piece, and go through them.*
How is this an improvement? (he asks curiously ;-) I don't see how creating and sorting several subsets of the data can be cheaper in time or space than a single pass through everything.
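To make the comparison concrete, here is a rough sketch of how I read the quoted proposal, in Python. The name `smallest_n_chunked` and the `factor` parameter are hypothetical, and "go through them" is interpreted here as merging each piece's first N elements into a running best-N list, which may not be exactly what was meant.

```python
def smallest_n_chunked(items, n, factor=10):
    """Return the n smallest values by sorting fixed-size pieces."""
    if n <= 0:
        return []
    size = factor * n          # each piece holds factor * n elements
    best = []                  # running list of the n smallest seen so far
    for start in range(0, len(items), size):
        # Sort one piece and keep only its first n elements.
        piece = sorted(items[start:start + size])[:n]
        # Merge those candidates into the running best-n list.
        best = sorted(best + piece)[:n]
    return best

print(smallest_n_chunked(list(range(100, 0, -1)), 5))
```

Every element still gets touched, and every piece gets fully sorted: for a total of T items that is roughly T * log(10 * N) comparisons, plus a temporary copy of each piece. A single pass with an N-element heap needs at most about T * log(N), and usually far fewer, since most items are rejected after one comparison.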

adrianh is obviously right about this.

Comment on Re: Heap sorting in perl