PerlMonks  

using values and keys functions

by jim_neophyte (Sexton)
on Jun 08, 2005 at 20:07 UTC (#464822=perlquestion)
jim_neophyte has asked for the wisdom of the Perl Monks concerning the following question:

i am reading in the "Perl Cookbook" at the bottom of page 449. i am confused about the use of the values and keys functions with respect to the order in which those results are returned.

i thought the ordering of keys/values are or will be random, i.e. the values function gathers the values and returns them in random order; then the keys function gathers the keys and returns them in random order.

my understanding is that the order of keys and values being returned is affected by insertion order. i also thought that soon if not already the order returned is further randomized for some sort of security thing.

if i understand the following code, the keys function and values function will have to operate on the hash at the same time. will the following really work?

    my %extra = @_;
    @${self{keys %extra} = values %extra;
thanks.

Re: using values and keys functions
by tlm (Prior) on Jun 08, 2005 at 20:12 UTC

    From the docs for values (my emphasis):

    The values are returned in an apparently random order. The actual random order is subject to change in future versions of perl, but it is guaranteed to be the same order as either the "keys" or "each" function would produce on the same (unmodified) hash.

    the lowliest monk

Re: using values and keys functions
by Roy Johnson (Monsignor) on Jun 08, 2005 at 20:15 UTC
    keys, values and each will all return values in bucket order. That order is not predictable by you, but it is not truly random. Your example won't work because it's a syntax error (and if the syntax error were fixed, it would be trying to assign things to where they already are). But yes, keys and values will return corresponding lists.
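As a minimal sketch of the "corresponding lists" point: the snippet below assumes the line in the question was meant to be a hash slice on a hash reference, `@$self{ keys %extra } = values %extra;`. That reading is a guess, and `$self` here is a hypothetical plain hash reference standing in for the Cookbook's object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the object: a plain hash reference.
my $self  = { name => 'default', colour => 'red' };
my %extra = ( colour => 'blue', size => 'large' );

# Hash slice on the reference. keys and values are both called on the
# same unmodified hash, so the Nth key lines up with the Nth value.
@$self{ keys %extra } = values %extra;

# Sort only for display; the slice assignment itself needs no sorting.
print "$_ = $self->{$_}\n" for sort keys %$self;
```

Whatever order this particular run of perl uses for `%extra`, each value lands under its own key.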

    Caution: Contents may have been coded under pressure.
Re: using values and keys functions
by shemp (Deacon) on Jun 08, 2005 at 22:50 UTC
    I think I understand what you're asking regarding the apparently simultaneous operation of keys and values in your line of code.
    What actually happens with keys or values is that a list of all of the proper info (keys or values) is created all at once. Keys and values are not really iterators. They can appear to be, but the list is created all at once.

    On the other hand, each is a true iterator, and uses the hash's internal position iterator (can't think of the real name) to keep track of where it was.

    Consider the following code:

    use strict;
    use warnings;

    {
        my %hash = (
            'a' => 1,
            'b' => 2,
            'c' => 3,
        );
        foreach my $key (keys %hash) {
            print "$key = $hash{$key}\n";
            foreach my $key (keys %hash) {
                print "\t$key\n";
            }
        }
    }
    It outputs:
    c = 3
            c
            a
            b
    a = 1
            c
            a
            b
    b = 2
            c
            a
            b
    

    This is because keys assembles a full list, and it's that list that the foreach loops are iterating over.

    Now consider this code (on the same hash)

    while ( my($key, $value) = each %hash ) {
        print "$key = $value\n";
        foreach my $key (keys %hash) {
            print "\t$key\n";
        }
    }
    This results in an infinite loop. Each time through the while loop, each gives back the next key-value pair, where "next" is decided by the hash's internal iterator. But the inner foreach loop calls keys on the same hash as the outer loop. keys and values both automatically reset the iterator, and create their return list by iterating through the whole hash. After getting to the end, the iterator is reset again for next time. Then, when each is called again, the iterator says to return the first key-value pair.

    This can be seen by letting the infinite loop run a few times.
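The reset can also be observed without running an infinite loop. This is a small sketch, not taken from the post above; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

# First call to each: returns whichever pair comes first in this run.
my ($first_key) = each %hash;

# Calling keys (here in scalar context) resets the hash's iterator ...
my $count = keys %hash;

# ... so the next each starts over and returns the same first pair.
my ($again_key) = each %hash;

print "first: $first_key, after reset: $again_key\n";

# A loop that avoids the problem: take the key list once, up front.
my @visited;
for my $key ( sort keys %hash ) {
    push @visited, $key;
    my @inner = keys %hash;    # harmless here: no each is in progress
}
print "visited: @visited\n";
```

Which key comes first differs between runs, but the two keys printed on the first line are always the same one.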

Re: using values and keys functions
by tilly (Archbishop) on Jun 09, 2005 at 01:08 UTC
    Other people have answered your main question. Here are answers to your peripheral questions.

    To understand the answers, you need to know how a hash works internally. Internally a hash has a set of buckets. There is a function (aka a hash function) that decides what bucket each key should go into. Ideally the assignment of keys to buckets will look random, so if you have enough buckets for your keys, then no bucket has very many keys. But in fact it is deterministic. That means that inserting/retrieving/deleting are always fast, because you only have to work with the handful of keys in a bucket. (Technical note, Perl changes the number of buckets if the hash gets too many keys, thereby keeping the number of keys/bucket down. This operation is known as a "hash split" and is expensive. But it is also rare, and the cost of this operation averages out to a constant per insert. Perl does not try to reclaim memory if a hash shrinks after having grown.)
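The bucket idea can be sketched as a toy model. This is not Perl's real hash function or memory layout: the bucket count, the sum-of-character-codes function, and all the `toy_*` names are assumptions made up for illustration, and the sketch never splits.

```perl
use strict;
use warnings;

# Toy model only: NOT Perl's real hash function or layout.
my $NUM_BUCKETS = 4;
my @buckets = map { [] } 1 .. $NUM_BUCKETS;

# Hypothetical hash function: sum of character codes modulo bucket count.
# It is deterministic: the same key always maps to the same bucket.
sub toy_bucket_for {
    my ($key) = @_;
    my $sum = 0;
    $sum += ord($_) for split //, $key;
    return $sum % $NUM_BUCKETS;
}

# Store a pair: only the one bucket chosen for the key is searched.
sub toy_store {
    my ( $key, $value ) = @_;
    my $bucket = $buckets[ toy_bucket_for($key) ];
    for my $pair (@$bucket) {
        if ( $pair->[0] eq $key ) { $pair->[1] = $value; return }
    }
    push @$bucket, [ $key, $value ];
    return;
}

# Walk the buckets in order, and each bucket's contents in order.
sub toy_keys   { map { $_->[0] } map { @$_ } @buckets }
sub toy_values { map { $_->[1] } map { @$_ } @buckets }

toy_store( $_, uc $_ ) for qw(apple pear fig plum);

my @k = toy_keys();
my @v = toy_values();
print "$k[$_] => $v[$_]\n" for 0 .. $#k;
```

Because `toy_keys` and `toy_values` make the same walk over the same buckets, position N in one list always matches position N in the other, even though the order looks arbitrary.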

    1. "i thought the ordering of keys/values are or will be random" No. Perl walks the buckets in order, and for each bucket walks the contents in order. Since Perl does this the same way for both keys and values, the order will match between them.
    2. "my understanding is that the order of keys and values being returned is affected by insertion order." Yes. The assignment of keys to buckets is not affected by insertion order, but the order of keys in buckets can be. (OK, I lied there. It is possible in at least some versions of Perl for the order that keys are added to cause a split to have happened/not happened.)
    3. "i also thought that soon if not already the order returned is further randomized for some sort of security thing." Yes. In recent versions of Perl, the hashing function that is used changes every time you run Perl. This is to prevent people from sending you carefully constructed data that causes your keys to all go into one bucket. Since they can't know what hashing function you're using, they have no way to construct a malicious dataset except by accident (and the odds against it are high).
    4. "...will the following really work?" Yes. That is because Perl actually runs values first to generate a list, then keys to generate the list of variables, then proceeds to assign the one to the other.
      I've wondered about the security aspects of changing the hashing function..

      Whilst I acknowledge that possibly a security risk is posed from the order of items retrieved from a hash.. I can't actually think of any practical areas where randomising the hash function actually assists.

      Surely if someone is putting together a hash that is at risk of attack then they should filter the data somehow?

      Wouldn't a more fixed hashing function be of greater benefit.. are there any programmers today who take advantage of the order in which the hash is output consistently across executions?

      note this is just a musing.. not an actual Perl change-request..

      update: Thanks for the replies below, very informative!

        You might find this page interesting. Look for the links from it ( Dominus' reference, and another to a post here Hash Clash on purpose by iburrell).


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
        BrowserUk answered the security question.

        Note that filtering against this attack is virtually impossible. Without extensive analysis you won't know what could possibly be a problem, and it could affect any hash at all that gets lots of data. Hashes are documented to be fast, and it is Perl's job to make them work out that way.

        As for people relying on the order from the hash, I'd consider breaking that to mostly be a benefit. Anyone who relied on hash order being consistent was guaranteeing that their code would break when you change versions of Perl. (Perl's hash function changed fairly frequently, though admittedly not as often as it does now.) With the new change, people catch their mistake earlier. A real example of this mistake that I believe bit Ovid was a poorly written test that assumed the order in which keys came back out from a hash.
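For the test-writing mistake described here, one common remedy is to impose an explicit order before comparing. The sketch below is illustrative only; the hash and its contents are made up and are not from the test mentioned above.

```perl
use strict;
use warnings;

my %colour_of = ( apple => 'red', banana => 'yellow', cherry => 'dark red' );

# Fragile: join ',', keys %colour_of
#   (the order can differ between runs and between versions of Perl)
# Stable: sort the keys so the comparison no longer depends on hash order.
my $got = join ',', sort keys %colour_of;

print "$got\n";
```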

        Though, admittedly, it did cause a few problems for people who would compare whether they got the same hash that they had previously by using Storable to stringify the hash, and then did a string compare with the old result. However you can fix that by setting $Storable::canonical to a true value.
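A minimal sketch of that Storable fix, assuming the comparison is done on the output of `freeze`. The two hashes and their contents are invented for the example.

```perl
use strict;
use warnings;
use Storable qw(freeze);

# Same contents, built in different insertion orders.
my ( %first, %second );
$first{$_}  = length $_ for qw(alpha beta gamma delta);
$second{$_} = length $_ for reverse qw(alpha beta gamma delta);

# Ask Storable to serialise hash keys in sorted order, so the frozen
# string does not depend on the hash's internal ordering.
$Storable::canonical = 1;

my $same = freeze( \%first ) eq freeze( \%second );
print $same ? "frozen forms match\n" : "frozen forms differ\n";
```

Canonical mode sorts keys at freeze time, which costs some speed; it only matters when the serialised strings themselves are compared.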
