Best Multidimensional Hash Practices?

by DamianKaelGreen (Acolyte)
on Oct 12, 2009 at 19:27 UTC ( #800779=perlquestion )
DamianKaelGreen has asked for the wisdom of the Perl Monks concerning the following question:

Oops, I realize now that I made a fairly major mistake in the way I posed my original question a couple of days ago, and it changes the way you have to look at the problem, so I am starting over. (Original question: Best Hash Practices?)

The problem really has to do with multidimensional hashes. Before, I was trying to simplify my more complex example by reducing my multidimensional test case to a single-dimensional hash, and I failed to recognize that a single-dimensional hash does not behave the same way as a multidimensional one. So my question has been rephrased here:


How do you avoid having to test the existence of a $hash{key}{combination} before testing the existence of its corresponding value?


The Perlish way to test if a scalar has been set is to do something like:

if ($foo){...}
and for a single dimensional hash:
if ($myHash{"unknown_if_this_key_exists_yet_or_not"}){...}

But when we try to extend that concept to a multidimensional hash, we begin to have a problem. Ideally the perlish way of testing a value in a hash might be to do something like this:

if ($myHash{unknown}{if}{this}{key}{combination}{exists}{yet}{or}{not}){...}

But the reality is that this will not work, because entries for the intermediate keys automatically get created in the hash when we merely test whether the value is true, so we end up having to do something like this instead:

if (exists($myHash{unknown}{if}{this}{key}{combination}{exists}{yet}{or}{not}) and $myHash{unknown}{if}{this}{key}{combination}{exists}{yet}{or}{not}){...}

But that's a lot of typing, and typing isn't very perlish. It would really be nice if accessing a multidimensional hash in this manner just returned the value, but since that's not the case, how can we avoid having to repeat the list of keys twice?

I thought about using Hash::Util qw(lock_hash unlock_hash) to lock a hash whenever it is not being modified, but programs die in the following situation:

if ($myHash{"this_key_does_not_exist"}){...}

Again, not very perlish.
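
The locked-hash behaviour described above can be reproduced with a small sketch. It assumes only the core Hash::Util module; the variable names are made up for illustration. It shows that exists() is still allowed on a locked hash, that reading a missing key is a fatal error (trapped here with eval), and that the hash has to be unlocked before it can be modified.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_hash unlock_hash);

my %myHash = (key1 => 'val1');
lock_hash(%myHash);

# exists() is still allowed on a locked hash and creates nothing.
my $has_key = exists $myHash{this_key_does_not_exist};

# Reading the missing key's value is a fatal error, so trap it with eval.
my $read_ok = eval { my $value = $myHash{this_key_does_not_exist}; 1 };
my $error   = $read_ok ? '' : $@;

# Unlock before modifying, then lock again when done.
unlock_hash(%myHash);
$myHash{key2} = 'val2';
lock_hash(%myHash);

print "has key: ",   ($has_key ? "yes" : "no"), "\n";
print "read died: ", ($read_ok ? "no"  : "yes"), "\n";
```

So a locked hash can still be tested safely with exists(), at the cost of having to unlock it around every modification.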

So then I considered "use Readonly;" with Readonly::Hash, and then you can do:

Readonly::Hash my %myHash => ("key1" => "val1");
my $myTempValue = $myHash{"this_key_does_not_exist"};

and the program will not die. But that only works as long as you never have to modify the hash. When that time comes you're in trouble. There does not seem to be any way to make the hash modifiable again once it is made Readonly... Maybe perl monks know of a way?

Finally, I've concluded that the best way to handle this is to just create a separate subroutine that accepts the keys as input, checks the existence of both the key combination and associated value, and then returns the value. But my question to the monks is this: what really is the best way to handle this situation? I'm sure people encounter this problem all the time, and I can't seem to find any documentation for it anywhere. What should the best practice be?

The point is that best methods of handling multidimensional hashes need to be clarified somewhere...


Years later (2013), I'm returning to say that "no autovivification" seems to be the most elegant and perlish solution to the situation above. I have used it in all my code for the past several years and have not noticed any negative side effects. I have even been recommending that other perl programmers I know always include this statement at the top of their code, even if they do not know what autovivification means or do not expect to encounter any issues involving it, because using it is the simplest way to avoid running into so many unexpected issues.

Hope that helps!

Replies are listed 'Best First'.
Re: Best Multidimensional Hash Practices?
by GrandFather (Sage) on Oct 12, 2009 at 19:48 UTC

    I read as far as your premise. It's wrong. To test if a scalar has been set use:

    if (defined $scalar) {

    To test if a hash key/value pair exists use:

    if (exists $hash{key}) {

    To test if the scalar content of a hash value has been set use:

    if (defined $hash{key}) {

    Those last two steps can be combined:

    if (exists $hash{key} && defined $hash{key}) {

    and of course you can make the same tests with a multi-dimensional hash:

    if (exists $hash{key1}{key2} && defined $hash{key1}{key2}) {

    This last example is interesting if key1 didn't exist - it pops into existence. This is a process called autovivification and happens when Perl needs to have a hash or array element in order to write to it or use it to access an element it references.
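
    A minimal sketch of that effect, using a made-up empty %hash: the exists test on the second level returns false, yet key1 is present afterwards.

```perl
use strict;
use warnings;

my %hash;

# Testing a second-level key forces Perl to treat $hash{key1} as a hash
# reference, so key1 is created even though nothing was assigned.
my $found = exists $hash{key1}{key2};

print "key2 found: ",      ($found ? "yes" : "no"), "\n";
print "key1 exists now: ", (exists $hash{key1} ? "yes" : "no"), "\n";
print "number of top-level keys: ", scalar(keys %hash), "\n";
```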

    True laziness is hard work
      if key1 didn't exist - it pops into existence. This is a process called autovivification

      I always wondered what the design decision behind this autovivification behaviour with mere existence testing had been...  I mean why does Perl not simply do a short-circuit evaluation from left to right, stopping as soon as a hash key does not exist?  In the example, key2 can't possibly exist if there is no hash referenced via key1 at all, because there is not even a key1. So why proceed any further?
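
      The left-to-right short-circuit described here can be written out by hand. This is only a sketch with a made-up %hash, not a change to how Perl itself evaluates nested lookups: each level is checked before descending, so the chain stops at the first missing key and nothing is created.

```perl
use strict;
use warnings;

my %hash = (known => { inner => 'value' });

# Check each level before descending, so the && chain stops at the first
# missing key and Perl never has to create $hash{key1}.
my $found = exists $hash{key1}
         && ref($hash{key1}) eq 'HASH'
         && exists $hash{key1}{key2};

print "found: ", ($found ? "yes" : "no"), "\n";
print "top-level keys: ", join(", ", sort keys %hash), "\n";
```

      The cost is exactly the repetition the original question complains about, which is why a helper subroutine or a pragma is usually preferred for deep structures.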

        If you do not want the autovivification to happen with exists, try no autovivification 'exists';

        You can even restrict its effects within a lexical scope!


        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        Because it isn't exists that triggers autovivification. It is the (implied or explicitly written) -> operator, as was pointed out in the previous thread by DamianKaelGreen.

        Autovivification is exactly the kind of thing that makes this code work:

        use strict;
        use warnings;
        my $hashRef; # note how it is undefined at this moment!
        $hashRef->{cogito} = "ergo sum";
        print ref $hashRef, "\n"; # Output: HASH
        # Hey... ain't that cool?
        # But we can go even further!
        $hashRef->{"somewhere"}->{"deep down"}->{"this"}->{"data structure"} = "I think, so I exist.";

        Now, that last line isn't a true beauty, but there are cases where you need deep structures like that. And it really wouldn't be Perlish if you had to make each part of it come into existence manually.

Re: Best Multidimensional Hash Practices?
by lamprecht (Friar) on Oct 12, 2009 at 19:43 UTC

      Thanks for the reply, this is a great reference. Subroutines like these are exactly the kinds of things I was hoping to see. They solve the autovivification problem that happens before the test...

      Still I have to admit they are sort of clunky subroutines, I have to wonder how fast they are and if there is a better way of doing this yet...

      Also, I think it's very interesting that it states: "Many people think that exists and defined should fail at the first level thay can." and then goes on to say: "This issue has been argued heavily in various fora including p5p but it won't be changed as too much code works with the current behavior. It is the way Perl treats it and you can't directly get around it. Perl6 has been discussing this and may do something to support this and it could be controlled by a pragma. But there are still gray areas..."

      The sub deep_defined described in the link above works well, but in order to make it a little more flexible in the way that it is used, I have added a wrapper sub for it that accepts the list of keys as a string instead. Therefore the keys can be cut and pasted directly from the line where the key combination was first autovivified, without typos or text transformations.

      The following three subroutines work in conjunction to support the new sub deep_defined, or they may stand alone. The original "deep_defined" was renamed to "deep_defined_action" and now returns the defined value instead of just a boolean, because I think that's more useful. My implementation puts "deep_defined" in @EXPORT, and "deep_defined_action" in @EXPORT_OK in the package definition, but I leave the actual package implementation up to the user here... Some POD also accompanies these functions to describe basically how they should be used...

      =pod

      =head1 ============================================================================

      =head1 MULTIDIMENSIONAL HASH-ARRAY FUNCTIONS

      =head1 ----------------------------------------------------------------------

      =head2 deep_defined ( <\%hash or $hash_ref>, <@key_list or $key_string> )

      OBJECTIVE: Return the value defined for a $mixedHashArray{$key}[$combination]{$set},
      without invoking autovivification. (with more flexibility than deep_defined_action.)

      PREMISE: Prevent "$mixedHashArray{key}[combination]" from being instantiated when
      effectively testing if "$mixedHashArray{key}[combination]{set}" is defined or while
      getting its value.

      This function is flexible enough to accept args in a list format, or a format that
      is more conducive to copying the keys directly from the instantiation...

      EX:
        Instantiation: (w/ autovivification)
          $hash{key}{combination}{set} = "some_value";
        Test:
          if ($value = &deep_defined(\%hash, qq( {key}{combination}{set} ) ) ){
              print "$value";   #prints "some_value"...
          }

      But notice that if you test an undefined key combination set, no autovivification occurs.

      EX:
        Test:
          if ($value = &deep_defined(\%hash, qq( {undefined}{key}{combination}{set} ) ) ){
              ### test fails; no autovivification
          }

      You can also specify the key combination set as a list if you find that more practical:

      EX:
        Test:
          if ($value = &deep_defined(\%hash, "key", "combination", "set")){
              ...
          }

      It also handles $hash_refs, arrays, and mixed hash-array combinations...

      EX:
        Instantiation: (w/ autovivification)
          $hash{key}{combination}{set}[2]{and_key_for_hash_in_second_list_position} = "some_value";
          $hash_ref = \%hash;
        Test:
          if ($value = &deep_defined($hash_ref, qq( {key}{combination}{set}[2]{and_key_for_hash_in_second_list_position} ) ) ){
              print "$value";   #prints "some_value"...
          }

      ADDITIONAL INFO:
      This function is simply a wrapper for the sub deep_defined_action(), which does the
      actual combinational testing...
      It takes the second argument and parses it as a string into a list that
      deep_defined_action will accept.

      =cut

      sub deep_defined {
          my( $possible_ref, @keys ) = @_ ;
          if (ref($possible_ref) eq "HASH" or ref($possible_ref) eq "ARRAY"){
              ### now look at the keys to see what they are:
              my $first_key = $keys[0];
              $first_key =~s/\s+//g;         #get rid of spaces
              $first_key =~s/^(\{|\[)//g;    #get rid of the leading bracket or brace
              $first_key =~s/(\}|\])$//g;    #get rid of the trailing bracket or brace
              #my @list = split(m/\}\{|\]\[|\]\{|\}\[|\{|\}|\[|\]/, $first_key); ### these are kind of ordered...
              my @list = split(m/\}\{|\]\[|\]\{|\}\[/, $first_key);   ### these are kind of ordered...
              if (@list > 1){
                  my @new_list = ();
                  foreach my $item (@list){
                      if (&is_int($item)){
                          @new_list = (@new_list, int($item));
                      }else{
                          push (@new_list, $item);
                      }
                  }
                  ### then the second argument was a string containing the keys in typical form for autovivification...
                  return &deep_defined_action($possible_ref, @new_list);
              }else{
                  ### then the arguments following the first are probably already keys, so lets just plug 'em in.
                  return &deep_defined_action($possible_ref, @keys);
              }
          }else{
              warn "Invalid arguments passed into sub deep_defined. args: @_ \n";
              warn "The first arg should be a ref to a hash or array...\n";
              return;
          }
      }

      =pod

      =head1 ----------------------------------------------------------------------

      =head2 deep_defined_action ( <\%hash or $hash_ref>, @key_list )

      OBJECTIVE: Return the value defined for a $mixedHashArray{$key}[$combination]{$set},
      without invoking autovivification. (used by the more flexible sub deep_defined)

      PREMISE: Prevent "$mixedHashArray{key}[combination]" from being instantiated when
      effectively testing if "$mixedHashArray{key}[combination]{set}" is defined or while
      getting its value.

      Specify the key combination set as a list:

      EX:
        Test:
          if ($value = &deep_defined(\%hash, "key", "combination", "set")){
              print "$value"
          }

      ADDITIONAL INFO:
      This subroutine basically came from +xt and was originally called deep_defined, but
      the return value has been modified to return the defined value of a multidimensional
      hash ref combination rather than just a boolean.
      To be used in combination with deep_defined; it is meant only to be used as an
      @EXPORT_OK or private function and to be included in the same module in which
      sub deep_defined is located.

      =cut

      sub deep_defined_action {
          my( $ref, @keys ) = @_ ;
          unless ( @keys ) {
              warn "deep_defined_action: no keys" ;
              return ;
          }
          foreach my $key ( @keys ) {
              if( ref $ref eq 'HASH' ) {
                  # fail when the key doesn't exist at this level
                  return unless defined( $ref->{$key} ) ;
                  $ref = $ref->{$key} ;
                  next ;
              }
              if( ref $ref eq 'ARRAY' ) {
                  # fail when the index is out of range or is not defined
                  return unless 0 <= $key && $key < @{$ref} ;
                  return unless defined( $ref->[$key] ) ;
                  $ref = $ref->[$key] ;
                  next ;
              }
              # fail when the current level is not a hash or array ref
              return ;
          }
          #return 1 ;   #changed this to return the actual value instead of just a boolean: Isn't that more useful?
          return $ref ;
      }

      =pod

      =head1 ============================================================================

      =head1 SCALAR_TEST FUNCTIONS

      =head1 ----------------------------------------------------------------------

      =head2 is_int ($scalar)

      returns true if a scalar is an integer, otherwise undef.

      =cut

      sub is_int{
          ###### returns true if a scalar is an integer:
          my ($thing) = @_;
          if (int($thing) eq $thing){
              return 1;
          }else{
              return;
          }
      }
Re: Best Multidimensional Hash Practices?
by ssandv (Hermit) on Oct 12, 2009 at 20:44 UTC

    One useful practice for multidimensional hashes is not to use them indiscriminately. There's a tendency to make enormous, monolithic hashes that some people (myself included) sometimes exhibit, something like:

    $user_data{$username} = {
        "date of birth"   => $dob,
        "ID number"       => $idnum,
        "mailing address" => [$street, $citystatezip, ],
    };

    when it might (and probably would) be better to do something like:

    $user_dob{$username} = $dob;
    $user_idnum{$username} = $idnum;
    $user_mailaddress{$username} = [...];

    The obvious symptom of this, when I make this mistake, is a large number of constant strings for subkeys in a hash where the top level keys are of interest. It's just as easy to tie multiple hashes with nice meaningful names together by their keys as it is to keep it all in one hash where the meaning is split between the hash name and second/third level keys, and it's much easier to handle.

      The better solution is to use an object for the user and hide the nasty representation issues from client code that uses the object.

      Using multiple data structures in parallel is always fraught because it is so easy to get them out of sync. Despite the complexity of the one data structure approach, it is generally better to use it and avoid the nastiness of either a slew of globals or ensuring that you always pass the whole suite of variables around correctly.
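
      As a rough sketch of that idea: the UserRecord class below, its accessors, and the sample data are all hypothetical and only illustrate how client code can be kept away from the nested-hash layout.

```perl
use strict;
use warnings;

# Hypothetical class: the nested-hash layout stays private to this package.
package UserRecord;

sub new {
    my ($class, %args) = @_;
    my $self = {
        dob     => $args{dob},
        idnum   => $args{idnum},
        address => [ @{ $args{address} || [] } ],   # copy the caller's list
    };
    return bless $self, $class;
}

# Read-only accessors; a misspelled method name fails loudly at run time
# instead of silently creating a new hash key.
sub dob     { return $_[0]{dob} }
sub idnum   { return $_[0]{idnum} }
sub address { return @{ $_[0]{address} } }

package main;

my %users;
$users{alice} = UserRecord->new(
    dob     => '1980-01-01',
    idnum   => 42,
    address => [ '1 Main St', 'Springfield, IL 00000' ],
);

print "alice was born on ", $users{alice}->dob, "\n";
print join("; ", $users{alice}->address), "\n";
```

      Client code then depends only on the method names, so the internal representation can change without touching callers.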

      True laziness is hard work
Re: Best Multidimensional Hash Practices?
by Herkum (Parson) on Oct 13, 2009 at 20:08 UTC

    I have found that, once I started to use Perl objects, there was little need for deeply nested hash structures. In fact, all the discussion about how to handle them only points out how they can lead to unintended bugs.

    Multidimensional hashes are, at best, a poor man's objects and should be avoided in large-scale applications.

      I have often used deeply nested hashes for extracting and reordering information from logs and such. In these cases, objects tend to be overkill because there is no inherent behavior involved. The class (data container) would just be a bunch of accessors anyway.
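
      For instance, a log summary of this kind might look like the sketch below. The log lines and their "host level message" format are invented for illustration; the point is that autovivification builds the two-level count structure without any setup code.

```perl
use strict;
use warnings;

# Hypothetical log lines in a made-up "host level message" format.
my @lines = (
    'web1 ERROR disk full',
    'web1 INFO started',
    'web2 ERROR timeout',
    'web1 ERROR disk full',
);

my %count;
for my $line (@lines) {
    my ($host, $level) = split ' ', $line;
    # Autovivification creates $count{$host} on first use.
    $count{$host}{$level}++;
}

# Report the counts, ordered by host and then by level.
for my $host (sort keys %count) {
    for my $level (sort keys %{ $count{$host} }) {
        print "$host $level $count{$host}{$level}\n";
    }
}
```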

      This is actually one of the points where Perl really shines. I've done similar code in C++ and Java and the monkey motion needed to deal with classes (or structs) made the problem harder than it should have been.

      In other cases, of course, using real objects is incredibly useful. But, objects aren't the cure for every problem.

      G. Wade

        You are not really sharing data among applications in that case; you are using a hash for what I consider its most useful purpose: internalized data management with limited scope.

        The problem is people who use hashes to represent complicated relationships in complicated programs. A lack of restrictions using hashes can make them a nightmare when someone can just put anything anywhere. An example of this,

        my %hash;
        $hash{'param'} = 'lh';
        # code later
        $hash{'PARAM'} = $hash{'param'};
        # later still
        print "Param is: " . $hash{'PARAM'} . "\n";

        Here an object would have (hopefully) prevented this code from showing up. Someone did not know about $hash{'param'} but did find $hash{'PARAM'}. Or the person wrote the print statement, it did not work, and so they added the second assignment rather than change their print statement (maybe multiple times).

Re: Best Multidimensional Hash Practices?
by jdrago_999 (Hermit) on Oct 14, 2009 at 02:56 UTC
    if ($myHash{unknown}{if}{this}{key}{combination}{exists}{yet}{or}{not} ){...}

    Unless this is some kind of data structure that you have no control over, You're Doing It Wrong.

    If something is really that deeply nested - and might not exist - I'd say that there is a real architectural problem.

    Something else to take a look at is Data::DPath which might render your code as simple as this:

    use Data::DPath 'dpath';

    if( dpath('/unknown/if/this/key/combination/exists/yet/or/not')->match(\%myHash) ) {
        # Yay
    }
Re: Best Multidimensional Hash Practices?
by brycen (Monk) on May 25, 2011 at 20:38 UTC
    Here's a complete example of using the autovivification pragma, which you can find on CPAN. Perl's implicit instantiation of hash elements can readily be controlled with fine granularity:
    #!/usr/bin/perl -w
    use Data::Dumper;

    my $hash = {
        'id'  => '992609516',
        'lat' => '37.7987145',
        'lon' => '-122.4436971',
        'tag' => {
            'operator' => { 'v' => 'CityCarShare' },
            'amenity'  => { 'v' => 'car_sharing' },
        },
    };

    print "Id: $hash->{id}\n";
    print "Amenity: $hash->{tag}{amenity}{v}\n";
    print "Bmenity: $hash->{tag}{Bmenity}{v}\n";
    print "Cmenity: $hash->{tag}{Cmenity}{v}\n" if exists($hash->{tag}{Cmenity}{v});

    no autovivification 'exists';
    print "Dmenity: $hash->{tag}{Dmenity}{v}\n" if exists($hash->{tag}{Dmenity}{v});

    no autovivification;
    print "Emenity: $hash->{tag}{Emenity}{v}\n";

    print Data::Dumper::Dumper(\$hash);

    $VAR1 = \{
        'lat' => '37.7987145',
        'tag' => {
            'amenity'  => { 'v' => 'car_sharing' },
            'operator' => { 'v' => 'CityCarShare' },
            'Bmenity'  => {},
            'Cmenity'  => {},
        },
        'lon' => '-122.4436971',
        'id'  => '992609516'
    };

Node Type: perlquestion [id://800779]
Approved by almut
Front-paged by pileofrogs