Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re^3: Re-orderable keyed access structure?

by Aristotle (Chancellor)
on Aug 15, 2004 at 01:21 UTC ( #383032=note: print w/replies, xml ) Need Help??


in reply to Re^2: Re-orderable keyed access structure?
in thread Re-orderable keyed access structure?

So on the outside, the access is by key only? Then the order doesn't matter at all. It's just your way of looking at the weighting of your items, but that's all your task calls for — not an order, but a way to weight elements. Your single constraint is that there needs to be a fast way to locate the item with the lowest (or highest, depending which way around you want to define it) weight.

Well, that is indeed what a heap is good for. A heap is a complete binary tree with a simple condition: the parent's value is always greater than either of its childrens' values. That means the greatest value of the tree is always at its root. Leaves only exist at the very bottom of the tree. To insert an element, you attach it to a free slot of any of the leaves. In that case, or if you adjust a child's value such that it violates this requirement, the violation is trivial to fix: you swap it with its parent, which may recursively cause a new violation with regard to its new parent. Since the structure is always a complete binary tree and you have to do this for any operation (either up the tree or down the tree depending on whether you are inserting or removing elements), all operations happen in O( log n ).

This is obviously faster than your linear-time splices.

And there is more excellent news for you: since a heap is a complete binary tree, it's easy to map to an array. Just define the children of the element at position i to be the elements at 2i and 2i + 1. Obviously, the parent of an element is int( i / 2 ). That's for a 1-based index, for 0-based index, you need some additions and subtractions to twiddle the numbers appropriately. The tree root is then at 1, its children are at 2 and 3, their respective children are at 4, 5 and 6, 7, the next level of siblings is at 8 - 15, etc.

In your case, I'd store your actual data in a hash as one would normally do. I'd use an array-as-heap to store hash keys, and another hash to be able to map keys to heap locations. That mapping is bidirectional, so you don't need any linear scans. If you need to remember explicit weights, it's cheapest to do that using another $key => $weight hash — that way you avoid having zillions of tiny anonymous data structures, and keeping two or three hashes with identical key sets in synch is a no-brainer anyway.

If that was too short and you haven't read Sedgewick's "Algorithms", get it. Actually, if you haven't read it, get it, regardless of whether the above was sufficient. It's a classic and a staple of CS for good reason.

Makeshifts last the longest.

  • Comment on Re^3: Re-orderable keyed access structure?

Replies are listed 'Best First'.
Re^4: Re-orderable keyed access structure?
by BrowserUk (Pope) on Aug 15, 2004 at 02:51 UTC

    I appreciate your having stuck with me through this. Please understand that this is a gut reaction after first reading of your post (and having previously looked at a CPAN Heap implementation), rather than a fully thought through one. I will pursue this further.

    Using a weighted heap, moving an item to the top of the heap requires either:

    1. Giving that item a weight higher than every other item in the heap.

      Which could be done by:

      1. Giving the item being promoted the highest possible weight. No good if you need to do it twice.
      2. Iterating the heap to find the highest weight and adding one. One recursive operation that will then trigger another. (In Perl not C)
      3. Remembering the highest value yet allocated and incrementing it. (Possible, but it still triggers a recursive operation at the perl level.)
    2. Or: Adjusting the weights of every item between the item being promoted to the top, and the current top item.
    3. Then there is the question of how you raise an items weight by 1 or 2 places?
    4. And the complexity (and performance) of recursive reference chasing in Perl code.

    Contrast this with the simplicity of the splice operation.

    Then contrast it with the efficiency of doing the linear operation in C, to recursively dereferencing linked lists implemented using perl hashes or arrays.

    O(N) -v- O(log N) doesn't tell the complete story when the the former is performed in C code and the latter in Perl code. Especially, heavily structured OO Perl code.

    Then take a look at the complexity (and the weight) of something like Heap::Priority.

    The order does matter, because it is my cheap mechanism for retaining and propagating the weightings of the elements. One operation takes care of adjusting all the weights of all the affected items in the stack, because their weights are their positions.

    For now, my "string list" mechanism looks like it will be easy to implement:

    <pre> $order = "ABCDE"; %valByKey { A => \A_value } { B => \B_value } { C => \C_value } { D => \D_value } { E => \E_value } </pre>

    Accessing C via key calls for C to become top item getByKey( 'C' ); does:

    sub getByKey { my $key = shift; ## to top. $order =~ s[(^.*)($key)][$2$1]; ## Needs quotemetaing and adjusting for 3-char keys ## may also require C<no warnings 'undefined'> ## to prevent that when the item being moved is at ## the beginning or the end of the string. ## to bottom ## $order =~ s[($key)(.*$)][$2$1]; ## up one ## $order =~ s[(.?)($key)][$2$1]; ## down one ## $order =~ s[($key)(.?)][$2$1]; ## up two ## $order =~ s[(..)?($key)][$2$1]; ## down two ## $order =~ s[($Key)(..)?][$1$2]; return $valByKey{ $key };; }

    Afterwards:

    $order = "CABDE"; %valByKey { A => \A_value } { B => \B_value } { C => \C_value } { D => \D_value } { E => \E_value }

    This satisfies all my criteria, it's lightweight, pretty damn fast and easy to program.

    It's only limitation (that I can see?) is the fixed length keys, but generalising that can be an exercise for another time.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      1. Iterating the heap to find the highest weight and adding one. One recursive operation that will then trigger another. (In Perl not C)

      See, you didn't actually understand. Where is the element with the greatest value in a heap? Always at the root. :-)

      You also didn't pay attention to my comment that a complete binary tree can be represented trivially using an array. You don't need linked lists, just index math.

      Makeshifts last the longest.

        Sorry, but I'm having trouble picturing how this works? A binary tree in an array. Highest value is always root (element 0?). How do I locate the second highest item? If it's always element 1, then you talking about a sorted array, that resorts itself as you add items to it or change the weight any of the items.

        The array elements contain the weights, where does the payload go? So the array becomes an AoA? Raising an items weight to the top means assigning that item a weight higher than the weight in $array[0][0]--easy enough.

        But then, the process of re-sorting the list is to move the newly highest weighted item to position 0. This is done by comparing this items weight with that of the item above it and if its higher (which it always will be), swap the two items and repeat with the next highest item. Keep repeating until it has found it's way to the top.

        This is a bubble sort. If the previously lowest item in the heap/array gets modified, then it's weight gets set to the highest value +1 (eg. $array[ 0 ][ 0 ] ). The heap algorithm then swaps it with every item in the heap/array, one at a time, until it reaches the top.

        Given that I already know that I am moving the item to the top of the list, the splice operation is vastly more efficient. Once I am going to do that, there is no point in embedding the weight within each element, because it is always, directly related to the elements position in the array. Ie. The elements position is it's weight.

        Back to square one.

        If I have this wrong (and I am pretty sure that I don't) then please illustrate a 5 element weighted heap and the steps required to raise the middle element to top (or bottom).


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://383032]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (4)
As of 2020-02-24 03:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What numbers are you going to focus on primarily in 2020?










    Results (104 votes). Check out past polls.

    Notices?