http://www.perlmonks.org?node_id=109073

perchance has asked for the wisdom of the Perl Monks concerning the following question:

I have come into a situation where I am holding a hash of arrays, with a set (and small) number of elements in the list. The key to the hash is an integer of a reasonable size, and what I was wondering was, would it rather be cheaper memory-wise to hold the information as: I know there might not be absolute answers, but it seems to me the internal implementation must have some clear tendencies, and I am pressed for memory...

thanks in advans,
perchance

--- Ignoreland

Replies are listed 'Best First'.
Re: Array vs. Hash (newbie perlguts)
by tachyon (Chancellor) on Aug 30, 2001 at 18:42 UTC

    As I understand it a hash is more expensive than an array. Two hashes are more expensive than one hash due to the cost of setting up the buckets. In your case you want to waste some elemets in the array so the answer is GOK. The solution is suck it and see. Run this sort of code and look at real time memory useage. Make either a hash or an array an look at the memory hit.

    $size = 20000; # make a hash for ($key = 0 ; $key < $size ; $key += 2 ) { $hash{$key} = [ ( 0..100 ) ]; } # make an array, wasting 50% of the spaces #for ($key = 0 ; $key < $size ; $key += 2 ) { # $ary[$key] = [ ( 0..100 ) ]; #} print "Made big data structure, sleeping 20 seconds!\n"; sleep 20;

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      I think the data structure with regard to memory allocation depends more on the sparcity of the data. ie an array will always have memory allocated to the upper bound, a hash will only have memory allocated when a cell is referenced (peeked or poked). I'm thinking 'complex data structures' when i say this (hoh, aoa, hoa et al..)

      am i correct on this or talking thru' my horse?

Re: Array vs. Hash (newbie perlguts)
by dragonchild (Archbishop) on Aug 30, 2001 at 18:24 UTC
    perchance - this sounds like a case of premature optimzation to me. While, yes, more knowledge is always a good thing, this isn't something that you really need to worry about, unless you running short of RAM on your system. If that's happening, then you need to either buy more RAM or pick another implementation. *shrugs*

    The best answer is to use the implementation that best fits your mental model. Don't change your model to fit what you think is optimal for the computer. Change the computer's model to fit what is optimal for you.

    The way to think about this is that the computer's cycles are cheap and yours are expensive. The computer's cycles will never make a mistake, but yours will. The computer will never get bored, you will.

    Unless you enjoy the mental gymnastics (and some of us do), then just be lazy, get the job done, and get on with your life. :)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Vote paco for President!

Re: Array vs. Hash (newbie perlguts)
by runrig (Abbot) on Aug 31, 2001 at 02:45 UTC
    The key to the hash is an integer of a reasonable size...

    All depends on what you mean by 'reasonable', how many unique keys you have, how much memory you have, and how often you'll be iterating through non-existent array elements (with the array of arrays approach - well, that won't matter memory-wise, but speed wise...), and whether you really need to worry about it. If you're only using half the elements, then an array of arrays will still win memory-wise. Hashes will take 5-10 times (very rough estimate) more space with an equal # of keys.

      While the OP did say optimize memory, lookup times for a hash are O(1) and for an array O(N).
      The choice is really 1 of fast fetch versu slows fetch,slow lookup vs fast lookup,small footprint verus large.

      Yves

      Update
      That should read O(f(N)) where f(n) changes depending on additional factors about the data in the array, such as if its sorted etc.
      Sorry.

        lookup times for a hash are O(1) and for an array O(N).

        That's ridiculous. If values are going to be indexed by integers anyway, then if anything, it takes a bit longer to look up a key in a hash than looking it up as an array index. Maybe you're thinking of looking up things in a linked list, ala LISP? Or if you're strictly searching for a value without a key then they're both O(n).

        Update: Re: your reply: I figured that was probably what you were talking about, but the original poster seemed to want to index things by integers, and was just wondering whether to use a hash or an regular array, and the issues with doing one or the other. And I've actually seen code like the below in perl programs which made me want to whack the programmer over the head with a hash array :)

Re: Array vs. Hash (newbie perlguts)
by Ryszard (Priest) on Aug 31, 2001 at 02:20 UTC
    I'm thinking the solution relies on the sparcity. if all your array elements are populated, (ie your data is dense) i think the economies of scale in fine tuning your model will outweigh any benifits you will get in memory saving.

    if you know all the indicies before hand i wouldnt go for a hash of arrays as why complicate things? $array[0] is easier to read than $hash{1}[0]

    i'm guessing by saying you've got 1..m dimensions in your data (represented by the hash element), why not split it out into 3 or 4 arrays, and only populate the array if there is a value to populate it. this way you can make your data denser while conserving space not really required by a hash. of course this makes it more difficult to parse the data around as you have to parse four structures (or references to), rather than a single reference. Without more specifics of the project and data model, its a little too difficult to make any specific suggestions.

    As always it looks like a trade off.. this time around its resources againts complexity.