PerlMonks  

Re^8: Re-orderable keyed access structure?

by BrowserUk (Pope)
on Aug 15, 2004 at 13:35 UTC ( #383088=note )


in reply to Re^7: Re-orderable keyed access structure?
in thread Re-orderable keyed access structure?

Yes, you inspect log N items and move one (steps 1, 2 and 3 below). But then you are not finished. You still need to swap items 1 and 2.

1) 0 [ 10 ]   2) 0 [ 10 ]   3) 0 [ 11 ]   4) 0 [ 11 ]
   1 [  9 ]      1 [  9 ]      1 [  9 ]      1 [ 10 ]
   2 [  8 ]      2 [ 11 ]      2 [ 10 ]      2 [  9 ]
   3 [  7 ]      3 [  7 ]      3 [  7 ]      3 [  7 ]
   4 [  5 ]      4 [  5 ]      4 [  5 ]      4 [  5 ]

Now try making that a 7 item array and moving the middle item to the top. Count the number of comparisons and swaps required.

In the end, you have had to move the middle item to the top and all the intervening items down. Splice does this directly. A heap algorithm does it one at a time.

Splice does this in O(N). A heap algorithm does it using O(N log N).
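As a minimal sketch of what "splice does this directly" can look like: the sub name `promote_to_top` and the seven sample weights are hypothetical, chosen only to mirror the 7 item case suggested above.

```perl
use strict;
use warnings;

# Move the element at index $from to the top (index 0) of the array.
# splice removes it, which closes the gap it leaves behind;
# unshift then reinserts it at the front.
sub promote_to_top {
    my ( $aref, $from ) = @_;
    unshift @$aref, splice( @$aref, $from, 1 );
    return;
}

my @items = ( 10, 9, 8, 7, 5, 4, 3 );   # hypothetical 7 item array
promote_to_top( \@items, 3 );           # promote the middle item (7)
print "@items\n";                       # prints: 7 10 9 8 5 4 3
```

The intervening items all move down one slot in a single builtin call rather than through a series of Perl-level swaps.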

I have several good data structure & algorithm books, a couple of them are almost as old as you. Unlike you apparently, I haven't just read the headlines. I've also implemented many of the algorithms myself and understood the ramifications.

I was simply waiting for you to catch up with the fact that the use of heaps has no benefit here.

The likely size of the cache is a few hundred, maybe 1000 elements. More than this and I run out of file handles or memory. splice is way more efficient at moving 1 item in an array of this size than any implementation of a (binary search + swap) * (old_position - new_position) in Perl.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Re^9: Re-orderable keyed access structure?
by Aristotle (Chancellor) on Aug 15, 2004 at 19:05 UTC

    Sorry, I'm not the one who seems to only have read headlines. A heap does not somehow entail a bubble sort. But let's leave the ad hominem out and look at facts.

    Yes, you inspect log N items and move one (steps 1, 2 and 3 below).

    A single swap requires inspecting exactly two elements, not log n. You need at most log n swaps total at any time.

    But then you are not finished. You still need to swap items 1 and 2.

    Why? The heap condition is not violated at any point after your step 3 (which is really step 2, and swapping step 1). $a[0] > $a[1] and $a[0] > $a[2] is fulfilled, so the root and its children satisfy the condition. Likewise $a[1] > $a[3] and $a[1] > $a[4], so the left child of the root and its children satisfy the condition as well. $a[2] has no children, so it automatically satisfies the condition as well. Your step 4 is not required in a heap.
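The check described above can be sketched as a small test over the array. This assumes the 0-based layout used in the comparison, where the children of index i sit at 2i+1 and 2i+2; the sub name `is_max_heap` is hypothetical.

```perl
use strict;
use warnings;

# Return true if no parent in the 0-based array is smaller than either
# of its children (children of index $i sit at 2*$i+1 and 2*$i+2).
sub is_max_heap {
    my ($aref) = @_;
    for my $child ( 1 .. $#$aref ) {
        my $parent = int( ( $child - 1 ) / 2 );
        return 0 if $aref->[$parent] < $aref->[$child];
    }
    return 1;
}

# The array as it stands after step 3.
print is_max_heap( [ 11, 9, 10, 7, 5 ] ) ? "heap\n" : "not a heap\n";   # prints: heap
```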

    Want me to demonstrate on a larger heap? Sure.

    X)  0 [ 13 ]   0)  0 [ 13 ]   1)  0 [ 13 ]   2)  0 [ 13 ]   3)  0 * 16 ]
        1 [ 12 ]       1 [ 12 ]       1 [ 12 ]       1 [ 12 ]       1 [ 12 ]
        2 [ 11 ]       2 [ 11 ]       2 [ 11 ]       2 * 16 ]       2 * 13 ]
        3 [ 10 ]       3 [ 10 ]       3 [ 10 ]       3 [ 10 ]       3 [ 10 ]
        4 [  9 ]       4 [  9 ]       4 [  9 ]       4 [  9 ]       4 [  9 ]
        5 [  8 ]       5 [  8 ]       5 * 16 ]       5 * 11 ]       5 [ 11 ]
        6 [  7 ]       6 [  7 ]       6 [  7 ]       6 [  7 ]       6 [  7 ]
        7 [  6 ]       7 [  6 ]       7 [  6 ]       7 [  6 ]       7 [  6 ]
        8 [  5 ]       8 [  5 ]       8 [  5 ]       8 [  5 ]       8 [  5 ]
        9 [  4 ]       9 [  4 ]       9 [  4 ]       9 [  4 ]       9 [  4 ]
       10 [  3 ]      10 [  3 ]      10 [  3 ]      10 [  3 ]      10 [  3 ]
       11 [  2 ]      11 * 16 ]      11 *  8 ]      11 [  8 ]      11 [  8 ]
       12 [  1 ]      12 [  1 ]      12 [  1 ]      12 [  1 ]      12 [  1 ]

    That's it. 3 swaps among a segment of 12 elements.
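The walk-through above can be reproduced with a short sift-up routine. This is a sketch under the assumption of a 0-based max-heap with the parent of index i at int((i-1)/2); the sub name `sift_up` and its swap counter are hypothetical additions for illustration.

```perl
use strict;
use warnings;

# Sift the element at index $i up a 0-based max-heap, swapping it with
# its parent while the parent is smaller. Returns the number of swaps.
sub sift_up {
    my ( $heap, $i ) = @_;
    my $swaps = 0;
    while ( $i > 0 ) {
        my $parent = int( ( $i - 1 ) / 2 );
        last if $heap->[$parent] >= $heap->[$i];
        @$heap[ $parent, $i ] = @$heap[ $i, $parent ];
        $i = $parent;
        $swaps++;
    }
    return $swaps;
}

my @heap = reverse 1 .. 13;    # 13, 12, ... 1, as in column X
$heap[11] = 16;                # the item at index 11 gets weight 16
my $swaps = sift_up( \@heap, 11 );
print "$swaps swaps: @heap\n"; # prints: 3 swaps: 16 12 13 10 9 11 7 6 5 4 3 8 1
```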

    In a heap with 100 elements, you need at most 7 swaps to get an item from the bottom of the heap to the top without violating the heap condition. I am doubtful of whether splice would win.

    In a heap with 1,000 elements, you need at most 10 swaps. How much money will you bet on splice?
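A quick way to check these bounds is to count the parent steps from the last slot to the root, since that is the longest path a single sift-up can take. The sub name `max_swaps` is hypothetical and the 0-based parent formula int((i-1)/2) is assumed.

```perl
use strict;
use warnings;

# Number of parent steps from the last slot of an $n element 0-based
# heap up to the root; this is the most swaps one sift-up can need.
sub max_swaps {
    my ($n) = @_;
    my ( $i, $steps ) = ( $n - 1, 0 );
    while ( $i > 0 ) {
        $i = int( ( $i - 1 ) / 2 );
        $steps++;
    }
    return $steps;
}

printf "%5d elements: at most %d swaps\n", $_, max_swaps($_) for 100, 1_000;
```

This reports 6 swaps for 100 elements and 9 for 1,000, both within the bounds quoted above.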

    Makeshifts last the longest.

      But let's leave the ad hominem out...
      Please do pick up a book or two on algorithms and data structures; this is stuff anyone who is serious about programming should know.

      Yes. Let's do that.

      In a heap with 1,000 elements, you need at most 10 swaps. How much money will you bet on splice?

      Quite a lot, were I a betting man! :)


      From where you left off. A new item not currently in cache is called for, it is read from disk, the lowest item* (currently index 12) is replaced by the new item in the array** and the new item given a weight of 17.

      a)  0 [ 16 ]   b)  0 [ 16 ]   c)  0 [ 16 ]   d)  0 [ 16 ]   e)  0 * 17 ]
          1 [ 12 ]       1 [ 12 ]       1 [ 12 ]       1 * 17 ]       1 * 16 ]
          2 [ 13 ]       2 [ 13 ]       2 [ 13 ]       2 [ 13 ]       2 [ 13 ]
          3 [ 10 ]       3 [ 10 ]       3 * 17 ]       3 * 12 ]       3 [ 12 ]
          4 [  9 ]       4 [  9 ]       4 [  9 ]       4 [  9 ]       4 [  9 ]
          5 [ 11 ]       5 [ 11 ]       5 [ 11 ]       5 [ 11 ]       5 [ 11 ]
          6 [  7 ]       6 * 17 ]       6 * 10 ]       6 [ 10 ]       6 [ 10 ]
          7 [  6 ]       7 [  6 ]       7 [  6 ]       7 [  6 ]       7 [  6 ]
          8 [  5 ]       8 [  5 ]       8 [  5 ]       8 [  5 ]       8 [  5 ]
          9 [  4 ]       9 [  4 ]       9 [  4 ]       9 [  4 ]       9 [  4 ]
         10 [  3 ]      10 [  3 ]      10 [  3 ]      10 [  3 ]      10 [  3 ]
         11 [  8 ]      11 [  8 ]      11 [  8 ]      11 [  8 ]      11 [  8 ]
         12 * 17 ]      12 *  7 ]      12 [  7 ]      12 [  7 ]      12 [  7 ]

      Now, another new item is called for, so I need to locate the lowest weighted item in the array. *How do I do this?


      And another problem: I need to locate one of these items, which are moving around in this heap, via its key.

      **How do I locate it?

      Actually, it's just the original problem: that of maintaining the linkage between the items in the array (heap) and their keys. No matter how long I "look at the pictures"--or read the text--at heaps, I do not see the mechanism by which the lowest weighted item in the heap is located (other than a linear search).
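To make the cost of finding the lowest weighted item concrete, here is a sketch for a 0-based max-heap. In such a heap a smallest item can always be found among the leaves, which occupy indices int(n/2) to n-1, so the scan covers about half the array but remains linear. The sub name `lowest_index` is hypothetical.

```perl
use strict;
use warnings;

# Scan only the leaves of a 0-based max-heap (indices int(n/2) .. n-1)
# for the lowest weighted item. Returns its index, or -1 if empty.
sub lowest_index {
    my ($heap) = @_;
    return -1 unless @$heap;
    my $low = int( @$heap / 2 );
    for my $i ( $low + 1 .. $#$heap ) {
        $low = $i if $heap->[$i] < $heap->[$low];
    }
    return $low;
}

my @heap = ( 17, 16, 13, 12, 9, 11, 10, 6, 5, 4, 3, 8, 7 );   # column e
print lowest_index( \@heap ), "\n";    # prints: 10 (the item weighted 3)
```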

      To re-state the requirements. I need to be able to:

      1. Locate the highest weighted item.

        This is required to allow promotion of the latest accessed item to the top in the classic LRU algorithm.

      2. Locate the lowest weighted item.

        Also an LRU requirement (or any variation), as this is the one that will be discarded when the cache is full and a new element must be added.

      3. Locate an item in the cache via its key.

        As the items get moved around, that linkage *must* be maintained.

        Embedding the key within the item would require a linear search to locate it. The purpose of the exercise was to avoid a linear search.
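One possible way to meet all three requirements, offered as a sketch rather than as anything proposed in this thread: keep a min-heap of [ key, weight ] pairs so the lowest weighted item sits at index 0, and keep a hash from key to current slot that is updated on every swap. The highest weight is tracked in a counter instead of being located in the array. All names here (`touch`, `evict`, `swap_slots`, `sift_down`, `%pos`, `$clock`) are hypothetical.

```perl
use strict;
use warnings;

my @heap;        # min-heap of [ key, weight ]; $heap[0] is the lowest
my %pos;         # key => current index into @heap
my $clock = 0;   # highest weight handed out so far

# Swap two slots and record the new position of both keys.
sub swap_slots {
    my ( $i, $j ) = @_;
    @heap[ $i, $j ] = @heap[ $j, $i ];
    $pos{ $heap[$_][0] } = $_ for $i, $j;
}

# Move the item at $i down until neither child is lighter.
sub sift_down {
    my ($i) = @_;
    while (1) {
        my $min = $i;
        for my $c ( 2 * $i + 1, 2 * $i + 2 ) {
            $min = $c if $c < @heap && $heap[$c][1] < $heap[$min][1];
        }
        last if $min == $i;
        swap_slots( $i, $min );
        $i = $min;
    }
}

# Add a new key or promote an existing one; it gets the highest weight.
sub touch {
    my ($key) = @_;
    if ( !exists $pos{$key} ) {
        push @heap, [ $key, 0 ];
        $pos{$key} = $#heap;
    }
    my $i = $pos{$key};
    $heap[$i][1] = ++$clock;
    sift_down($i);
}

# Remove the lowest weighted item and return its key.
sub evict {
    return unless @heap;
    my $key = $heap[0][0];
    swap_slots( 0, $#heap );
    pop @heap;
    delete $pos{$key};
    sift_down(0) if @heap;
    return $key;
}
```

For example, after touch('a'), touch('b'), touch('c') and touch('a') again, evict() returns 'b', and `$pos{$key}` gives the slot of any remaining key without a search. Whether this beats splice at a few hundred elements is a separate question that only a benchmark can answer.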



        I was going to address this at some point, but to be honest, I've lost interest. So I decided to post a closing note instead of letting this silently slip into oblivion.

        The point I was going to make is that your heap code is not exactly efficient. The algorithm description is formulated recursively, but you don't need to implement it that way. Since it's just tail recursion, you can trivially write it iteratively, which would gain a lot of ground.
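As a sketch of the rewrite described above: the first sub mirrors the recursive moveUp from the benchmark code in this thread (including its unconditional swap and its int($l/2) parent calculation), and the second is a loop with the same behaviour. The name `moveUpIter` is hypothetical.

```perl
use strict;
use warnings;

# Recursive form: swap with the slot at int($l/2), then recurse on
# that slot until index 0 is reached.
sub moveUp {
    my ( $ref, $l ) = @_;
    my $p = int( $l / 2 );
    return if $p >= $l;
    @$ref[ $p, $l ] = @$ref[ $l, $p ];
    moveUp( $ref, $p );
}

# Iterative form: the tail call becomes a loop that reassigns $l.
sub moveUpIter {
    my ( $ref, $l ) = @_;
    while ( $l > 0 ) {
        my $p = int( $l / 2 );
        @$ref[ $p, $l ] = @$ref[ $l, $p ];
        $l = $p;
    }
}
```

Both leave the array in the same state for any starting index; the loop simply avoids one sub call per level.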

        Still, that would almost certainly only accelerate the code by a constant factor, which does not make it worth the effort for the smallish data sets you're working with.

        I confess my surprise to find Perl is that slow. I am no stranger to optimizing Perl, but I've never come across such a stellar disparity between a builtin and explicit code before.


      I wanted to see if heaps could be made to work for this.

      O(N) -v- O(log N) for any given part of an algorithm does not tell the whole story; you also have to consider the cost of all parts of the algorithm.

      This benchmarks not only the promotion, but also building the list, promoting from any given position and removing (lowest weighted) items, until empty.

      Feel free to show me where I am using a bad implementation.

      #! perl -slw
      use strict;
      use List::Util;
      use Benchmark qw[ cmpthese ];

      our $SIZE ||= 1000;
      our $ITERS||= -5;

      sub spliceIt{
          my @a;

          ## Add $SIZE items
          push @a, $_ for 0 .. $SIZE;

          ## Promote $SIZE items, 1 from each position.
          push @a, splice @a, $_, 1 for 1 .. $SIZE;

          ## Remove $SIZE (lowest) items.
          shift @a for 0 .. $SIZE;
      }

      sub heapIt {
          my @a;

          ## Add $SIZE items
          for( 0 .. $SIZE ) {
              $a[ @a ] = ( $a[ 0 ] || 0 ) + 1;
              moveUp( \@a, $#a );
          }

          ## Promote $SIZE items, 1 from each position.
          ## !!Ass-uming I could locate the item that needs promoting!!
          for( 0 .. $SIZE ) {
              $a[ $_ ] = $a[ 0 ] + 1;
              moveUp( \@a, $_ );
          }

          ## Remove $SIZE (lowest) items.
          for( 0 .. $SIZE ) {
              ## Find the lowest (linear search unless you know a better way?)
              my $low = 0;
              for( 1 .. $#a ) {
                  $a[ $_ ] < $a[ $low ] and $low = $_;
              }

              ## If the lowest is the last
              ## remove it and move on.
              $#a-- and next if $low == $#a;

              ## overwrite the lowest with the highest
              $a[ $low ] = $a[ 0 ];

              ## Move the last to the highest
              $a[ 0 ] = $a[ $#a ];

              ## Discard the last
              $#a--;

              ## Now move the (moved) highest item up
              moveUp( \@a, $low );
          }
      }

      sub moveUp {
          my( $ref, $l ) = @_;
          my $p = int $l /2;
          return if $p >= $l;
          my $temp = $ref->[ $p ];
          $ref->[ $p ] = $ref->[ $l ];
          $ref->[ $l ] = $temp;
          moveUp( $ref, $p );
      }

      print "Testing $SIZE items for $ITERS iterations";

      cmpthese( $ITERS, {
          splice => \&spliceIt,
          heap => \&heapIt,
      });

      __END__
      ## After making the benchmark more realistic
      ## By benchmarking adding, promoting & removing (lowest) items.

      P:\test>heaptest -ITERS=-5 -SIZE=100
      Testing 100 items for -5 iterations
               Rate   heap splice
      heap    156/s     --   -98%
      splice 8335/s  5235%     --

      P:\test>heaptest -ITERS=-5 -SIZE=100
      Testing 100 items for -5 iterations
               Rate   heap splice
      heap    157/s     --   -98%
      splice 8330/s  5221%     --

      P:\test>heaptest -ITERS=-5 -SIZE=1000
      Testing 1000 items for -5 iterations
               Rate   heap splice
      heap   4.21/s     --   -99%
      splice  662/s 15613%     --

      P:\test>heaptest -ITERS=-5 -SIZE=10000
      Testing 10000 items for -5 iterations
      (warning: too few iterations for a reliable count)
                s/iter    heap  splice
      heap        17.9      --   -100%
      splice 5.18e-002  34393%      --

