Beefy Boxes and Bandwidth Generously Provided by pair Networks Cowboy Neal with Hat
Do you know where your variables are?
 
PerlMonks  

Efficient random hash stuff

by japhy (Canon)
on Nov 17, 2000 at 00:57 UTC ( #42074=perlmeditation: print w/ replies, xml ) Need Help??

I'm trying to get around the possible slowness of:
# ($k,$v) = rhe_1(%hash); sub rhe_1 (\%) { my $key = (keys %{ $_[0] })[rand keys %{ $_[0] }]; return ($key, delete $_[0]{$key}); }
Now, this is obviously inefficient, because it must rebuild a list of keys each time, which can be bad for large hashes (like a hash representing a dictionary). Now, one possible solution is an array and a hash working together:
# @keys = keys %hash; # done ONCE # ($k,$v) = rhe_2(%hash,@keys); sub rhe_2 (\%\@) { my $key; do { $key = rand @keys } until exists $_[0]{$_[1][$key]}; return ($_[0]{$_[1][$key]}, delete $_[0]{$_[1][$key]}); }
That trades the slowness of splice() for the possible slowness of having to reselect a random index. However, this requires an extra N amount of memory allocated.

What can I do? Is there any work on making such an operation efficient?

Here's a possible solution, which needs extra space, but not much (on average), and uses an array instead of a hash:
# %seen = (); # done ONCE # @array = ([k,v], [k,v], ...); # ($k,$v) = rhe_3(@array,%seen); sub rhe_3 (\@\%) { my $idx; do { $idx = rand @{ $_[0] } } while $_[1]{int $idx}++; return @{ $_[0][$idx] }; }
I'd be glad to hear any ideas.

$monks{japhy}++ while $posting;

Comment on Efficient random hash stuff
Select or Download Code
(tye)Re: Efficient random hash stuff
by tye (Cardinal) on Nov 17, 2000 at 01:18 UTC

    Ah, no need to splice.

    # @keys = keys %hash; # done ONCE # ($k,$v) = rhe(%hash,@keys); sub rhe (\%\@) { my( $hv, $av )= @_; my $idx= rand @$av; my $key= $av->[$idx]; my $val= delete $hv->{$key}; my $last= pop @$av; $av->[$idx]= $last if $idx < @$av; return( $key, $val ); }

    As for saving space, I don't see you really making use of the hash here so you could do:

    @keys = keys %hash; @vals = values %hash; undef %hash;
    If the keys are long, then you could also probably save a bit of space by storing reference to the keys in the array instead of the key values themself (or maybe not).

            - tye (but my friends call me "Tye")
      Wow, that's damn nice. And we could use my just-an-array idea from above:
      sub rand_element (\@); @array = ([1,2], [3,4], [5,6]); # etc... ($k,$v) = rand_element(@array); sub rand_element (\@) { my $aref = shift; my $idx = rand @$aref; my ($k,$v) = @{ $aref->[$idx] }; my $last = pop @$aref; $aref->[$idx] = $last if $idx < @$aref; return ($k,$v); }
      Thank you muchly, tye. I need to ++ YOU, not your node...

      $monks{japhy}++ while $posting;

        Note that ([1,2],[3,4],[5,6],...) consumes quite a bit more memory (no, I haven't tested this) than ( [1,3,5,...], [2,4,6,...] ). I agree that this is less elegant, but if memory usage is a primary concern, then the elegance loss might be worth it.

                - tye (but my friends call me "Tye")
Re: Efficient random hash stuff
by Dominus (Parson) on Nov 17, 2000 at 01:24 UTC
    You should check out Dan Schmidt's article Building a Better Hash. He is trying to solve exactly the same problem. He comes up with a Perl module that implements a data structure with fast indexed access for search, insert, and delete (like a hash) but also fast random key selection.

      I've seen his article in TPJ -- I don't have it on me, and I'm looking for a random-access-only data structure. tye helped come up with it. It's swimmingly cool.

      $monks{japhy}++ while $posting;
Re: Efficient random hash stuff
by Fastolfe (Vicar) on Nov 17, 2000 at 01:25 UTC
    My first instinct is to use the %hash/@array duality property and to select a key randomly by finding a modulus 2 index out of that array, but I don't know of any way of de-referencing a hashref in such a way that I can use that property. It'd be easy to do something like that if we could turn it into a real hash first, but that makes it less efficient than it is now.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://42074]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (10)
As of 2013-05-21 02:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best material for plates (tableware) is:









    Results (427 votes), past polls