Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Using 'each' to get random (key, value) pair from hash returns "uninitialized value"

by puterboy (Scribe)
on Jan 28, 2011 at 19:28 UTC ( #884903=perlquestion: print w/ replies, xml ) Need Help??
puterboy has asked for the wisdom of the Perl Monks concerning the following question:

I have a large hash that I am using to cache an even larger number of data items.

When I need to add a new cache item, and the cache is full, I use 'each %hashCache' to select a "random" (key, value) pair which I then delete using 'delete($hashCache{$key})'

However, every once in a while, I get an error message:

 Use of uninitialized value $key in delete...

even though there are clearly still keys in the hash. If however, after every delete, I do something like '$mycount=keys(%hashCache)' then the error doesn't occur.

Thus it seems like unless I do something to tell perl to recalculate the keys, 'each' ends up sometimes giving up the same key that I had previously deleted. Is this a bug, limitation, or feature in the implementation of 'each' and/or how hash memory is cleaned up after 'delete'? (or am I truly missing something obvious).

Comment on Using 'each' to get random (key, value) pair from hash returns "uninitialized value"
Download Code
Re: Using 'each' to get random (key, value) pair from hash returns "uninitialized value"
by toolic (Chancellor) on Jan 28, 2011 at 19:35 UTC
    The documentation for each describes this behavior.
Re: Using 'each' to get random (key, value) pair from hash returns "uninitialized value"
by jethro (Monsignor) on Jan 28, 2011 at 19:47 UTC

    toolic meant this sentence by the way:

    After each has returned all entries from the hash or array, the next call to each returns the empty list in list context and undef in scalar context. The next call following that one restarts iteration.

    Think of a hash as a bus with lots of passenger seats and 'each' as a ticket controller who just walks from front to back and then calls "undef" to the driver when he reaches the back. If he evicted every passenger on the seats he came by and no passengers entered the bus or changed seats while he did that, the bus would be empty afterwards. But if lots of passengers got in and passengers in the bus changed seats while he did that he would call "undef" and the bus would still be filled with lots of passengers.

      Thanks. I had actually read that sentence but had 'misunderstood' it - thinking of it only in the foreach loop context and not realizing the same thing applies in a non-loop. By the way, your analogy was really helpful.

      If I am now understanding the sentence correctly, I can do something like the following:

       ($key, $value) = each %hashCache || each %hashCache;

      Presumably this would 'eat' the 'undef' token and make the lazy-a** conductor go back to the front of the bus and start collecting tickets again :)

      One more follow-up

      If I only remove elements from the cache using 'each' followed by 'delete' and if I do a delete before each add, it would seem that your analogy if taken literally would give a nice (for my purposes) circular array type behavior. I.e., once I have filled the cache, so that adds follow deletes, then each new add fills the seat left by the previous delete and the conductor won't get around to the new add until he goes around again.

      Does this indeed describe the behavior?

      Note: for my purposes this would be wonderful - combining the benefits of a circular array with that of a hash. Specifically, then 'each' would return the oldest entry (and hence likely most stale cache entry) for deletion, while the hash allows for random access of elements.

        To the question in your previous post I would say that should work. Nice line by the way. UPDATE: Seems it won't work after all, see johngg's experiment

        But the delete bevor add idea won't work, because hashed items get stored into specific slots depending on their hash value. I.e. new passengers always want to go to a seat that depends on their social security number, not to the seat that was vacated last.

        What happens when a passengers wants a seat that is occupied, There are two mayor strategies to handle this (that I remember), one is to let more than one passenger sit on a seat, the other is to calculate another seat in a deterministic pattern until the passenger finds a free one (for example the next seat, or the 13th seat from this one). I believe perl hashes use the first method (more than one on a seat) but I'm not sure.

        I was interested so I performed a bit of an experiment. If I understand correctly, this is similar to the setup you are aiming for:

        #!/usr/bin/perl use strict; use warnings; use 5.010; my @az = 'a'..'z'; my %cache = ( a => 1, b => 1, c => 1 ); for (1..20) { my $key = each %cache || ( say "back to front!" and each %cache ); say $key; delete $cache{$key}; $key = $az[rand @az] . $key; $cache{$key} = 1; }

        Sample Result:

        c a b sa qsa zc back to front! zqsa zzc wb izzc ezqsa cizzc nwb nnwb eezqsa back to front! icizzc bnnwb neezqsa back to front! sneezqsa jicizzc

        So, you don't get perfect "oldest first" behavior. In particular, it is fully possible that you evict the most recently cached item. In the bus analogy, if any new cache items sit in a seat behind where the conductor is, they may be evicted prematurely. However, some amount of "oldest first" does seem to exist and from what you've said, this is probably "good enough" for your needs.

        I do wonder whether adding items might eventually/occasionally cause perl to recompute the hash which might reset the conductor or cause an "old" key to be moved to a seat in front of the conductor's location so that the old key gets missed in a pass.

        Good Day,
            Dean

Re: Using 'each' to get random (key, value) pair from hash returns "uninitialized value"
by ikegami (Pope) on Jan 28, 2011 at 20:53 UTC

    each visits each key in turn. There may still be keys in the hash, but there are no keys left to visit.

    Before calling each(%hash) in scalar context, one needs to call keys(%hash) to reset the iterator.

    Important: each does not return a random key. No assurances are made that there is no bias. "Random numbers are too important to leave to chance", but that's what you're doing.

      Yes - I realize it is anything but random, hence the quotes in my original post. For my purposes, it needn't be random and the idea of a conductor walking back and forth collecting tickets and eventually hitting all the seats (as long as I eat the 'undef' at the end and let him start again) works good enough for me. Actually, it probably works better since it gives some approximate sens of FIFO

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://884903]
Approved by toolic
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (6)
As of 2014-09-16 20:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (48 votes), past polls