Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: can I change hash keys or values directly

by Marshall (Canon)
on Feb 01, 2021 at 00:33 UTC ( #11127723=note: print w/replies, xml ) Need Help??


in reply to can I change hash keys or values directly

Hi misterperl,
I don't think that my previous "how to do it" post in this thread didn't adequately answer adequately answered your question.
You wrote: "I guess what I'm really asking is, is there a perlVar or function that is the TRUE array of hash key and one of values that,when changed, changes the keys or values?"
The answer is "no". There is no TRUE array of hash keys.

I will attempt to give a "short" answer to your "why not?" question. There may be some slight technical inaccuracies in pursuit of brevity. Also I will need some "C lingo" to explain what a Perl hash table actually "is", "under the covers".

The first step in the hashing process is to calculate what is called the "binary hash value". This is a single unsigned binary number. Each character in the "ASCII hash key" is fed into essentially an equation and as I said, the end result of this calculation is a single unsigned binary number. This calculation is designed to be very fast (and it is). This is not a simple "checksum" - it is more analogous to a CRC calculation.

A Perl hash starts out with 8 of what are called "buckets". Thus, the bucket array starts out at length 8. Each element of this bucket array is either not used (NULL pointer) or is a C pointer to a linked list of C structures. The C structure will contain things like the actual full ASCII key name that you see as the user of the hash and the the user accessible value of that key. I don't remember if it contains the originally calculated "binary hash value" for each particular ASCII key string or not.

Ok, to use this newly created hash with 8 "buckets", Perl uses only the least significant 3 bits of the "binary hash value" (000-111 binary). This is used as an array index into the bucket array. So when you have your write something like $hash{'user'}='XXZZY";, Perl calculates the "binary hash value" for "user", that value is "masked" so that only the least significant 3 bits are used. Perl looks at the "C linked list" associated with that "bucket". If there is no list there already, a new linked list is started with this new value. What happens if there is already a linked list at that array index? Perl has to look at each element of this C linked list to make sure that it is not modifying something that is already there. If an element with hash key, "user" is not there already, it is added to the end of the linked list.

If the hash structure stayed at only 8 "hash buckets" that wouldn't be very useful for a hash table with say 1,000 entries! So Perl has an algorithm to decide when to double the size of the hash. When the hash size doubles, Perl creates a new "bucket array" of say 16 values and then uses the least significant 4 bits of the "binary hash value" as an index into that new C array.

Every element of the original 8 bucket hash has to be re-visited to see what new bucket it falls into now! Once an additional bit of the "binary hash value" is exposed and used, that changes things!

If you are "with me" so far, making a hash key "a shorter ASCII string" will not necessarily decrease the size of the internal hash. It may even cause the size of the hash to double! This new "shorter key" has to be moved to the "right bucket". If that potentially different bucket is "over used", Perl may double the size of the hash to take care of that.

In modern Perl versions, the calculation of the "binary hash value" has a "per run" random fudge factor. This was done to prevent some security issues with having a repeatable "binary hash value". I don't understand all of the security issues, but this is what it is.

The "usage of a Perl hash" is accessible by the scalar value of the hash. This is a string, like "3/8" or "2/8". This means that either 3 or 2 buckets are used out of 8 possible buckets. I have updated my "how to do it" post with code that shows that. I get randomly either 3 or 2 on multiple succesive runs of the same code. This is as expected.

It is possible to "pre-size" a Perl hash, by my %hash=1000; That will round the number of "buckets" up to 1024 ( binary power of 2 number). This prevents overhead associated with the hash doubling, 8,16,32...1024. However, in my testing, this doesn't matter at all provided that you are doing something significant with the hash once it is created. I was working with hash sizes of 100K entries and above. Perl is very efficient in how it manages its internal hash tables. Just use the features of the language!

Note/Update 1: Doubling the hash size is actually not a significant memory use. This only doubles the array of pointers to C's linked list of structures. But in order to do this, Perl has to revisit each element in the hash and decide where it needs to go next. However as I mentioned before, Perl is very, very efficient about how it does a "doubling of the internal hash size".

Again, there is no "TRUE list" of hash keys. To generate keys %hash, Perl has to traverse the entire internal hash table and return a list of the results to you. But again, Perl does this very, very quickly.

I post this a separate response instead of an update to my original response because the content and subject matter is completely different.

Replies are listed 'Best First'.
Re^2: can I change hash keys or values directly (UPDATED)
by LanX (Sage) on Feb 04, 2021 at 18:05 UTC
    > Again, there is no "TRUE list" of hash keys. To generate keys %hash, Perl has to traverse the entire internal hash table and return a list of the results to you. But again, Perl does this very, very quickly.

    Sorry, but from my understanding this is incorrect.

    There is a Linked List with keys and values in a fixed randomized order which is used by iterators like each , as well as keys and values

    There is also a C-Array of "Buckets" for quick look up via hash-function, and these "Buckets" also point to a Linked List with all entries having the same hash-value aka "Collisions".

    But this array is not traversed to generate the output for keys

    UPDATE

    Sorry ... looks like I had an incorrect or outdated source

    according to https://www.cpan.org/authors/id/G/GA/GAAS/illguts-0.09.pdf

    * RITER, EITER: The first two fields are used to implement a single iterator over the + elements in the hash. RITER which is an integer index into the array referenced by ARRAY an +d EITER which is a pointer to an HE. In order find the next hash element one would first + look at EITER->next and if it turns out to be NULL, RITER is incremented until ARRAY[RITE +R] is non-NULL. The iterator starts out with RITER = -1 and EITER = NULL.

    and I'm not sure if this is still up to date, since new security requirements led to more randomization

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Hi Rolf!

      I wrote: I will attempt to give a "short" answer to your "why not?" question. There may be some slight technical inaccuracies in pursuit of brevity. Also I will need some "C lingo" to explain what a Perl hash table actually "is", "under the covers". Underlining added for emphasis.

      I am actually surprised that there weren't more: "hey, you got detail X wrong" posts!

      My goal was to explain the basic idea and then show things like the scalar value of %hash which the user can access (added to my post using actual Perl code). Perl is very much a "live language" and updates are on-going. Its been maybe 5+ years since I looked seriously "at the guts".

      Update:
      I see the posts with various links to "Perl Guts". Whether the OP needs that level of detail is up to him. I tried to answer this basic question: is there a perlVar or function that is the TRUE array of hash key and one of values that,when changed, changes the keys or values?

      Perl is like an onion. To use an onion, first you have to take the thin skin off. Then you start slicing the onion. There are many layers. That is fundamental to what an onion is! So, how come some onion slices have a green thing in the middle? That is a level of detail that I didn't address and probably the OP doesn't need to know about in order to use an onion effectively. That is the best analogy that I can come up with at the moment. I hope that my post can be understood in time measured in minutes. To understand exactly everything about how Perl makes hashes, requires time measured in days, not minutes.

        > when changed, changes the keys ...

        Well I think this is the easiest to answer, the basic mechanism is a hash-function mapping a key to an array-index for a lookup mechanism.

        And this means a stored key must be immutable , because that calculated array-index would change too.

        Consequently there can't be a faster way than replacing the whole key-value pair with "delete OLD" + "store NEW".

        All the other implementation details we discussed are about various other features and performance.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        ) That's why Perl's associative arrays are called "hashes" in the first place.

Re^2: can I change hash keys or values directly
by misterperl (Monk) on Feb 04, 2021 at 16:34 UTC
    TYVM!

    I pretty familiar with hash internals from a user viewpoint but your details were most instructive. I read it twice but may need to read a few more times to let it sink in..

    Curiously, I recall my FIRST programming assignment, way back in 1977, in APL, was to write a hash table manager, which stored and retrieved hashed elements. Thinking back to how that worked- I get what you're saying. Its not stored as an array (even though keys %h returns one).. I also recall writing a non-hashed storage manager and for reasonably-sized tables, the speeds were of course much better. I get that :)

    Much appreciated you are the guru of Perl Hashes!

      I'm glad that you liked my post!

      I actually used the Perl hash algorithm in a C project once. So I got pretty far into how Perl did it.

      I wrote a .h file, memtracker.h for a local college. Students just included this .h file in their C or C++ program and magic happened. Without any source mods or link dependencies, this thing tracked C or C++ memory allocations and deallocations and when the user program exited, it showed whether there were any memory leaks and a table showing how all of this played out. It would say things like "on line X, that memory allocation has no corresponding deallocation", etc. Anyway I used a hash as the main data structure to keep track of things. Turned out to be a more complicated project that I had first thought - mainly due to making it work with 2 compilers each of C and C++ using the same single .h file. Final file was somewhat north of 1,500 lines of C with a lot of C pre-processor voo-doo. Anyway turned out to be a fun project. One C++ prof had his own version of this, but it was so slow, it was adding 20 minutes execution time to a large C++ lab! My version was so fast that it wasn't even user perceptible and it had more features. Using a very efficient data structure was a big part of the speed improvement.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11127723]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (5)
As of 2022-05-16 14:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you prefer to work remotely?



    Results (63 votes). Check out past polls.

    Notices?