PerlMonks  

Best Hash Practices?

by DamianKaelGreen (Acolyte)
on Oct 09, 2009 at 00:53 UTC ( #800131=perlquestion )
DamianKaelGreen has asked for the wisdom of the Perl Monks concerning the following question:

Q1:

How do you avoid having to test the existence of a hash{key} before testing the existence of its corresponding value?

PREMISE:

The Perlish way to test if a scalar has been set is to do something like:

if ($foo){...}

so ideally the perlish way of testing a value in a hash might be to do something like this:

if ($myHash{"unknown_if_this_key_exists_yet_or_not"}){...}

But the reality is that will not work because an entry for the key automatically gets created in the hash if we try to do that, so we end up having to do something like this instead:

if (exists($myHash{"unknown_if_this_key_exists_yet_or_not"}) and $myHash{"unknown_if_this_key_exists_yet_or_not"}){...}

But that's a lot of typing, and typing isn't very perlish. So how do we avoid having to do this every time?

I thought about using Hash::Util qw{lock_hash unlock_hash} to lock a hash whenever it is not being modified, but programs carp out in the following situation:

if ($myHash{"this_key_does_not_exist"}){...}

Again, not very perlish. It would be nice if accessing the hash just returned the value...

So then I considered "use Readonly;" and then you can do:

Readonly::Hash my %myHash => ("key1" => "val1",);
my $myTempValue = $myHash{"this_key_does_not_exist"};

and the program will not die. But that only works as long as you never have to modify the hash. When that time comes you're in trouble. There does not seem to be any way to make the hash modifiable again once it is made Readonly... Maybe perl monks know of a way?

Finally, I've concluded that the best way to handle this is to just create a separate subroutine that accepts a key as input, checks the existence of both the key and value pair, and then returns the value. But my question to the monks is this: what really is the best way to handle this situation? I'm sure people encounter this problem all the time, and I can't seem to find any documentation for it anywhere. What should the best practice be?
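The helper subroutine described above might look like the following minimal sketch. The name value_if_set is hypothetical (it is not from this thread or from any module); it checks exists first and returns the value only if it is also true.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value stored under $key if the key
# exists and its value is true; otherwise returns undef.
sub value_if_set {
    my ($hash_ref, $key) = @_;
    return undef unless exists $hash_ref->{$key};
    return $hash_ref->{$key} ? $hash_ref->{$key} : undef;
}

my %myHash = ( colour => "red", count => 0 );

for my $key (qw(colour count missing)) {
    my $value = value_if_set(\%myHash, $key);
    print "$key: ", (defined $value ? $value : "(not set)"), "\n";
}

# The lookups above did not add the "missing" key to the hash.
print "keys: ", join(", ", sort keys %myHash), "\n";
```

Note that a stored 0 is reported as "(not set)" here, because the helper tests truth rather than definedness; whether that is the right behaviour depends on what the hash holds.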

Q2:

When "use strict;" is in effect, how can you avoid having to test the existence of a hash-key before using it to reference an array?

In other words:

How do we avoid having to use the first of the following two lines:

if ($myHash{"unknown_if_this_key_exists_yet_or_not"}){
    print "values in list are: @{$myHash{\"unknown_if_this_key_exists_yet_or_not\"}}\n";
}

PREMISE:

Really I think it would be much nicer if, instead of bombing out, Perl would return an empty list when a list is to be referenced by an undefined value (when using strict). "use warnings;" should be the method of identifying empty references, not "use strict;", and hopefully that will change with perl6, but until then: what should Perl programmers all around the world do?

For one thing, the code above creates a new hash entry if the key didn't already exist, so really the shortest way I can think of to handle it would be something like:

if (exists($myHash{"unknown_if_this_key_exists_yet_or_not"}) and $myHash{"unknown_if_this_key_exists_yet_or_not"}){
    print "values in list are: @{$myHash{\"unknown_if_this_key_exists_yet_or_not\"}}\n";
}

But I'm sure you agree that's pretty ugly...

So I guess the second question has more to do with Lists than hashes, but a standard for handling these situations should be documented at the very least, hence the reason for starting this blog.

All ideas are welcome...

----------------------------------------------------------------------

Adding the following update three years later (Jan. 2013):

After following this thread and a few others related to autovivification, I have come to believe that the best practice for handling these situations is to avoid them in the first place. Invoking the module "no autovivification" at the top of all your perl code disables autovivification and makes single and multidimensional hashes behave in a more perlish manner. I have been doing this with all my code for the past three years now and have not noticed any negative side effects. I recommend this approach to everyone...

Additionally, for Q2 above, the recommendation to follow the $hash{$ref} with a " || []" when being tested does indeed help when use strict is in place in case there is no value. Ex:

use strict;
no autovivification; #<-- include in all programs, all the time!
my %myHash =();
print "values in list are: @{$myHash{\"unknown_if_this_key_exists_yet_or_not\"} || []}\n";

That particular list was empty...
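The " || []" idiom can also be wrapped in a small subroutine so the fallback lives in one place. The following is only a sketch using core Perl; the name list_for is hypothetical, and unlike the plain " || []" form it also returns an empty list when the stored value is not an array reference at all.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the elements of the array referenced by
# $hash_ref->{$key}, or an empty list if the value is missing, false,
# or not an array reference.
sub list_for {
    my ($hash_ref, $key) = @_;
    my $value = $hash_ref->{$key};
    return ref($value) eq 'ARRAY' ? @$value : ();
}

my %myHash = ( fruits => [ "apple", "pear" ], oops => "not a list" );

for my $key (qw(fruits oops absent)) {
    my @items = list_for(\%myHash, $key);
    print "$key has ", scalar(@items), " item(s): @items\n";
}
```

Because the lookup is a plain read of a single-level hash, it does not create the "absent" key.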

Re: Best Hash Practices?
by alexlc (Beadle) on Oct 09, 2009 at 01:30 UTC

    I'm not certain that it is a best practice, but I find that Data::Diver and eval() help out a lot when dealing with hashes and arrays.

    #!/usr/bin/env perl
    use Data::Dumper qw| Dumper |;
    use Data::Diver qw| Dive DiveVal DiveRef DiveDie DiveError DiveClear |;

    my $h = {
        key  => 'value',
        hoh  => { key => 'value' },
        hoa  => [ 'a', 'b', 'c' ],
        deep => { values => { are => 'supported' } },
    };

    Dive($h, 'key') && warn "'key' is there";
    !Dive($h, 'nokey') && warn "'nokey' is not there";
    warn "notice 'nokey' was not created";
    warn Dumper $h;

    warn "Values are: " . join(', ', eval {@{Dive($h, 'hoa')}});
    warn "No errors for non arrays: " . join(', ', eval {@{Dive($h, 'empty')}});
    warn "Deep values: ". Dive($h, qw| deep values are |);
    !Dive($h, qw| deep values that is not there |) && warn "Missing Deep values are safe";

    __END__
    'key' is there at ./test.pl line 12.
    'nokey' is not there at ./test.pl line 13.
    notice 'nokey' was not created at ./test.pl line 14.
    $VAR1 = {
              'deep' => {
                          'values' => {
                                        'are' => 'supported'
                                      }
                        },
              'hoa' => [
                         'a',
                         'b',
                         'c'
                       ],
              'hoh' => {
                         'key' => 'value'
                       },
              'key' => 'value'
            };
    Values are: a, b, c at ./test.pl line 17.
    No errors for non arrays:  at ./test.pl line 18.
    Deep values: supported at ./test.pl line 19.
    Missing Deep values are safe at ./test.pl line 20.

    Some of those are still not beautiful ( especially array dereference ), but I find it a clean way to deal with hashes, as it already implements the autovivification and error handling logic.

    -- AlexLC
Re: Best Hash Practices?
by muba (Priest) on Oct 09, 2009 at 01:40 UTC

    First things first. ++ for writing a clear and detailed description of your question, including an explanation of why you need it and what your troubles are, without omitting to mention the things you have tried.

    That being said, I must add that you should've known your question is based upon wrong assumptions. I won't hold that against you, though, since I'm having a rather hard time myself finding the docs that say that autovivification doesn't happen when you simply check whether a hash key exists or is true.

    Anyway, I do remember I read it somewhere and I wrote a small script to prove it. Here goes:

    use strict;   # or die
    use warnings; # or die

    my %hash = (
        cogito => "ergo sum" # I think, so I exist
    );

    if (exists $hash{absum}) {
        print "Absum exists.";
    } else {
        print "Absum doesn't exist.\n"; # which would explain its name...
    }

    print "Keys in the hash: ", join(" :: ", keys %hash), "\n";

    __END__
    Absum doesn't exist.
    Keys in the hash: cogito

    So there you have it. As for your second question, simply use a boolean OR.

    use strict;
    use warnings;

    my %hash = (
        list1 => ["foo", "bar", "baz"]
    );

    print "list1: [- ", join(" :: ", @{ $hash{list1} || []}), " -]\n";
    print "list2: [- ", join(" :: ", @{ $hash{list2} || []}), " -]\n";

    __END__
    list1: [- foo :: bar :: baz -]
    list2: [-  -]

    Admittedly, that || [] part in there isn't quite a beauty either but it does what you need without too much typing.

      > that autovivification doesn't happen when you simply check whether a hash key exists or is true.

      unfortunately not that simple!

      use strict;   # or die
      use warnings; # or die

      my %hash = (
          cogito => "ergo sum" # I think, so I exist
      );

      if (exists $hash{absum}{absum}) {
          print "Absum exists.";
      } else {
          print "Absum doesn't exist.\n"; # which would explain its name...
      }

      print "Keys in the hash: ", join(" :: ", keys %hash), "\n";

      output

      /usr/bin/perl -w /home/lanx/tst.pl
      Absum doesn't exist.
      Keys in the hash: cogito :: absum

      Cheers Rolf

        unfortunately not that simple!

        if (exists $hash{absum}{absum})

        Ah, yes. But note how that syntax implies a -> operator, as ikegami described elsewhere in this thread, and it is that operator which makes $hash{absum} come into existence. Of course it is funny to see how exists still thinks $hash{absum} doesn't exist; it is not so funny anymore when you realize that exists is really checking the existence of $hash{absum}->{absum} here. So exists is right and the information in that print statement is factually wrong. It should state print "Absum->absum doesn't exist\n" or something similar meaning the same.

Re: Best Hash Practices?
by jettero (Monsignor) on Oct 09, 2009 at 02:07 UTC
    so ideally the perlish way of testing a value in a hash might be to do something like this:
    if ($myHash{"unknown_if_this_key_exists_yet_or_not"}){...}
    But the reality is that will not work because an entry for the key automatically gets created in the hash if we try to do that, so we end up having to do something like this instead

    No it doesn't. Try it.

    my %h;
    if( $h{"isn't there"} ) { 1 }
    print "$_\n" for keys %h;
    # does not print anything

    The real question is, do you need to test true when $h{key} is set and false? Then you use exists rather than just a boolean test.

    UPDATE re-reply: indeed. Well said.

    -Paul

      The real question is, do you need to test true when $h{key} is set and false? Then you use exists rather than just a boolean test.

      Right.  Then, there's also defined, which is kind of "in between" testing for truth and existence.

      I'm sure you know, but maybe for others the following little truth table helps to summarize the relationships of what can be tested with a hash:

      sub truth_table {
          my $hash = shift;
          print "      true?    defined? exists?\n";
          for my $key (qw(foo bar baz bla)) {
              print "$key  ";
              printf " %-8d", $hash->{$key};
              printf " %-8d", defined $hash->{$key};
              printf " %-8d", exists $hash->{$key};
              print "\n";
          }
      }

      truth_table(
          {
              foo => 1,     # true
              bar => 0,     # false
              baz => undef, # undefined
              # bla         # doesn't exist
          }
      );

      __END__
            true?    defined? exists?
      foo   1        1        1
      bar   0        1        1
      baz   0        0        1
      bla   0        0        0
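The three tests in the truth table can also be folded into one classifier that checks existence first, then definedness, then truth. This is a sketch only; key_state is a hypothetical name, not something proposed in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: classifies what a hash holds for a given key,
# checking existence first, then definedness, then truth.
sub key_state {
    my ($hash_ref, $key) = @_;
    return 'missing'   unless exists  $hash_ref->{$key};
    return 'undefined' unless defined $hash_ref->{$key};
    return $hash_ref->{$key} ? 'true' : 'false';
}

my %h = ( foo => 1, bar => 0, baz => undef );
printf "%-4s %s\n", $_, key_state(\%h, $_) for qw(foo bar baz bla);
```

The order of the checks matters: each later test would also be false for the earlier cases, so testing truth first could not tell "false" apart from "undefined" or "missing".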
Re: Best Hash Practices?
by Marshall (Prior) on Oct 09, 2009 at 02:49 UTC
    1. I don't know of any way to cause a hash key to be created by use of an "if" test.

    2. I don't know of any way to cause a hash key's value to be "non-existent". There is a thing called "undefined", undef. But undef is not the same as "non-existent". undef means the key exists but I don't know what the value is.

    3. When you test a hash key, you are testing the value of the key. It can be true or false. false values are: "undef","",'',0.

    4. If a hash key value "exists" then it can have any one of the 4 values above. Update: well of course, then it can also have some other string or numeric value. The above 4 things, which are actually only 3 things, undef, null string and zero are all the same "false" value in a logical expression.

    5. If a hash key value is "defined", then there are only 3 possibilities. Update: well "" and '' are the same once the string is interpreted.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %hash = ('a' => 2, 'b' => 3, 'c' => undef);
    print Dumper (\%hash);

    if (exists ($hash{'c'}) ) {print "key c exists\n"}
    else {print "key c does not exist\n"};
    print Dumper (\%hash);

    if ( defined ($hash{'c'}) ) {print "key c defined\n"}
    else {print "key c not defined\n"};

    if (exists ($hash{'d'}) ) {print "key d exists\n"}
    else {print "key d does not exist\n"};

    #note that undef,"",'' and 0 all evaluate to "false"
    #play with c value above and run this code
    #you can call defined($xxx) to figure out the difference
    #between a false value from "",'',0 and undef.

    if (my $x = $hash{'c'}) {print "c has value $x\n"}
    else {print "c has value,\"\",0 or undef\n"};

    if (my $x = $hash{'b'}) {print "b has value $x\n"}
    else {print "b has no value\n"};

    __END__
    $VAR1 = {
              'c' => undef,
              'a' => 2,
              'b' => 3
            };
    key c exists
    $VAR1 = {
              'c' => undef,
              'a' => 2,
              'b' => 3
            };
    key c not defined
    key d does not exist
    c has no value
    b has value 3

      1. I don't know of any way to cause a hash key to be created by use of an "if" test.
      if ($hash{foo} = 42) {
          print "The answer to life, the universe, and everything is contained within foo!\n";
      }
      So the answer to life, the universe, and everything is simply a common error.
        This if ($hash{foo} = 42) is an assignment statement and the if tests the result of that.
         if ($hash{foo} == 42) would probably do something else.

        for all my goofs above, I don't see how a properly formatted, syntactically correct "if" statement, a question in essence, can cause a new element to be entered into a data structure. I can see how this above statement could do that, but that is because it is more than a simple logical "if".

      1. I don't know of any way to cause a hash key to be created by use of an "if" test.

      Correct. There isn't. Another operator has to come into play. However, the following fools many since the operator is invisible:

      if ($hash{foo}{bar})

      See my reply to the OP.

      2. I don't know of any way to cause a hash key's value to be "non-existent".

      delete $hash{foo};
      delete @hash{qw( foo bar )};
      delete local $hash{foo};           # Since 5.11.0
      delete local @hash{qw( foo bar )}; # Since 5.11.0

      When you test a hash key, you are testing the value of the key. It can be true or false. false values are: "undef","",'',0

      The string "undef" isn't false. Plain old undef is, though.

      The second and third literal you posted represent the same value.

      And you're missing some, most notably "0". Except for some insane situations, anything that stringifies to "" or "0" is false. The common false values are undef, the empty string, 0 and "0".

      4. If a hash key value "exists" then it can have any one of the 4 values above.

      Not true. Aside from the fact that you only listed three values, a hash value can be any scalar, not just false ones.

      5. If a hash key value is "defined", then there are only 3 possibilities.

      Not true. It can be any scalar value except undef.

      Update: Added lots as I found that every claim after the first had serious errors.

        1. I don't know of any way to cause a hash key to be created by use of an "if" test.

        Correct. There isn't. Another operator has to come into play. However, the following fools many since the operator is invisible:

        if ($hash{foo}{bar})

        See my reply to the OP.

        Interesting...I will have to experiment with this. 2nd hash dimension wasn't part of the question.

        2. I don't know of any way to cause a hash key's value to be "non-existent".

        delete $hash{foo};
        delete @hash{qw( foo bar )};
        delete local $hash{foo};           # Since 5.11.0
        delete local @hash{qw( foo bar )}; # Since 5.11.0

        I'm only on 5.10, so learned something new. Update: still don't see it, i.e. how to leave the key but have the value of that key be anything other than undef, a string (null or not) or a number. To the best of my knowledge a hash key will always evaluate to at least undef. delete $hash{foo} removes key foo and its value.

        When you test a hash key, you are testing the value of the key. It can be true or false. false values are: "undef","",'',0

        The string "undef" isn't false. Plain old undef is, though.
        Yes this was a typo, quotes were wrong to use.

        The second and third literal you posted are the same value.

        And you're missing some. Except for some insane situations, anything that stringifies to "" or "0" is false. The common false values are undef, the empty string, 0 and "0".
        no disagreement here. 0 and "0" I believe will wind up being in practice the same thing.

        4. If a hash key value "exists" then it can have any one of the 4 values above.

        Not true. Aside from the fact that you only listed three values, a hash value can be any scalar, not just false ones.

        5. If a hash key value is "defined", then there are only 3 possibilities.

        Not true. It can be any scalar value except undef.

        I meant the false values, you are correct.
        Update: Added lots as I found that every claim had serious errors.
        perhaps not one of my better posts..posted code works as claimed, but explanation could have been better.

        Thanks for your clarifications.
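The list of common false values given in this subthread (undef, the empty string, 0 and "0") can be checked directly. In the sketch below the strings "0.0", "00" and " " are additions for contrast and do not come from the thread; they test true because, among ordinary strings, only "" and "0" are false.

```perl
use strict;
use warnings;

# Label => value pairs; the labels are only for display.
my @cases = (
    [ 'undef'        => undef ],
    [ 'empty string' => ""    ],
    [ 'number 0'     => 0     ],
    [ 'string "0"'   => "0"   ],
    [ 'string "0.0"' => "0.0" ],
    [ 'string "00"'  => "00"  ],
    [ 'string " "'   => " "   ],
);

my %truth;
for my $case (@cases) {
    my ($label, $value) = @$case;
    $truth{$label} = $value ? 'true' : 'false';
    printf "%-14s is %s\n", $label, $truth{$label};
}
```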

Re: Best Hash Practices?
by ikegami (Pope) on Oct 09, 2009 at 03:23 UTC

    [ Some of this has been said already, but it's mostly to lead to the stuff that hasn't. ]

    The Perlish way to test if a scalar has been set is to do something like: if ($foo){...}

    You can't test whether a scalar has been set. It's a good thing it's rarely useful to know that.

    "if ($foo)" is even a poor check for checking if $foo contains a number or a string. One of each will be interpreted incorrectly. "if (defined($foo))" is much more useful.

    so ideally the perlish way of testing a value in a hash might be to do something like this:

    Well, it was if (defined($foo)) for scalars, so is it if (defined($hash{foo})) for hashes? Indeed it is. Very rarely do you need to know whether the key exists or not. defined is quite often sufficient.

    In fact, a simple truth test is usually sufficient because hashes and arrays often contain objects or references to other hashes and arrays.

    But the reality is that [something like if ($hash{foo})] will not work because an entry for the key automatically gets created in the hash if we try to do that

    That's not true. You need to use the hash value as an lvalue for it to get created, and even that's not enough in some cases.

    my %hash;
    1 if $hash{t1};
    1 for $hash{t2};
    \$hash{t3};
    sub { }->( $hash{t4} );
    sub { ++$_[0] }->( $hash{t5} );
    print "$_\n" for sort keys %hash;

    t2
    t3
    t5

    Sub args are special lvalues.


    So why do you think "if ($hash{foo})" creates $hash{foo}?

    Maybe you're thinking of multi-level structures.
    if ($hash{foo}{bar}),
    if (defined($hash{foo}{bar})) and
    if (exists($hash{foo}{bar}))
    all populate $hash{foo} with a hash ref if it didn't exist or if it wasn't defined. This is called autovivification, and it's a feature of dereferencing.

    Remember that
    $hash{foo}{bar}
    is short for
    $hash{foo}->{bar}
    and that -> is the dereferencing operator. It needs a reference to act upon. Since its LHS is undefined, it creates the necessary reference rather than crapping out. It can be annoying to debug, but it's a very convenient shortcut at times.

    If you want to grab the element of a multi-level structure without autovivifying the lower levels if they don't exist, you need to check each level.

    if ($hash{foo}{bar})
    would be changed to
    if ($hash{foo} && $hash{foo}{bar})

    Notice I didn't use defined or exists for $hash{foo}. If $hash{foo} can contain a reference, it's usually the case that it can't contain anything but undef or a reference, so it's sufficient to test for truthfulness. This goes back to what I said earlier on (4th paragraph).
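The difference between the guarded and the unguarded test can be observed by looking at the hash's keys after each one. The following is a small sketch in core Perl; the variable names are illustrative only.

```perl
use strict;
use warnings;

my %hash;

# Guarded test: $hash{foo} is checked first, so the second lookup
# (which dereferences $hash{foo}) never runs when it is missing.
if ($hash{foo} && $hash{foo}{bar}) {
    print "found\n";
}
my $after_guarded = join(",", sort keys %hash);
print "after guarded test: [$after_guarded]\n";

# Unguarded test: the implicit -> autovivifies $hash{foo}.
if ($hash{foo}{bar}) {
    print "found\n";
}
my $after_unguarded = join(",", sort keys %hash);
print "after unguarded test: [$after_unguarded]\n";
```

The first report shows an empty key list and the second shows foo, even though neither test found anything.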

      Maybe you're thinking of multi-level structures.
      Here is an example for the OPer to play with to gain definition concerning the existence of the truth of all this:

      >perl -wMstrict -le
      "my %hash = qw(a 1 b 2);
       print 'true 1st level' if $hash{c};
       print exists $hash{c} ? '' : 'NOT ', 'exists 1 c';
       print 'true 2nd level' if $hash{c}{d};
       print exists $hash{c} ? '' : 'NOT ', 'exists 2 c';
       print exists $hash{c}{d} ? '' : 'NOT ', 'exists 2 d';
      "
      NOT exists 1 c
      exists 2 c
      NOT exists 2 d

      And substitute something like
          ... if $hash{c} == 42;
      for
          print 'true 1st level' if $hash{c};
      to see the effects of an actual comparison versus a simple truth test.
      if ($hash{foo}{bar}) bit me really badly a while back. Yet another good reason to use the -> instead of leaving it out.
        I wish autovivi was controllable by pragma. Maybe one day I'll be inspired to write it.
Re: Best Hash Practices?
by JavaFan (Canon) on Oct 09, 2009 at 09:10 UTC
    How do you avoid having to test the existence of a hash{key} before testing the existence of its corresponding value?
    Uhm, Wednesday?

    Your question doesn't make sense, and hence the nonsense answer. In Perl, there do not exist hashes that can have keys without a corresponding value. It's impossible. If a key exists, there must be a value. The value may not be defined, but it's there.

Re: Best Hash Practices?
by LanX (Canon) on Oct 10, 2009 at 04:05 UTC
    > All ideas are welcome...

    ok ... what about this?

    use strict;
    use warnings;

    sub xdefined {
        my $ref=shift;
        my $type;
        while ( my $key=shift) {
            $type= ref($ref);
            if ( $type eq "HASH" ) {
                return unless defined $ref->{$key};
                $ref=$ref->{$key}
            } elsif ( $type eq "ARRAY" ) {
                return unless defined $ref->[$key];
                $ref=$ref->[$key]
            } else  # no reference
            { return };
        }
        return defined $ref;
    }

    my $h->{a}->[2]->{c}=0;

    $\="\n";
    $,=":";
    print 5, xdefined($h,"a",2,"c",4,"e");
    print 4, xdefined($h,"a",2,"c",4);
    print 3, xdefined($h,"a",2,"c");
    print 2, xdefined($h,"a",2);
    print 1, xdefined($h,"a");
    print 0, xdefined($h);

    -->

    5
    4
    3:1
    2:1
    1:1
    0:1

    Using prototypes like (\[%@]@) can further "beautify" the interface! 8)

    Cheers Rolf

      Yes, this is the direction I was headed with this too... It would be nicer though if the input to the function was just the autovivified list of keys though, so we could maintain a more usual appearance, like:

      print 6, isDefined($h{"a"}{2}{c}{4});

      This is a kind of standard function that I think should be developed more thoroughly for everyone to use regularly; Either that, or the fundamentals of the way a multidimensional hash is handled should possibly be re-evaluated to recognize when something is being created vs when it's being tested...

        Hi

        I know what you mean but the syntax you're proposing is not feasible, since it's the arrow operator which is autovivifying!

        isDefined($h{"a"}->{2}->{c}->{4});

        So when isDefined comes into action it's already too late!

        Anyway one can easily extend the syntax of my approach to make it "smoother"!

        xdefined(%h => qw/a 2 c 4/)

        or

        xdefined(%h => "a",2,"c",4)

        But IMHO (within the boundaries of perl) the syntactically "prettiest" way to go would be to use a tied copy of your hash, like this you may be able to write

        novivi(%h)->{a}{2}{c}{4}

        %h2=novivi(%h) should be an empty hash with a hash-tie which mirrors safely the structure of %h.

        Such that $h2{a} is only defined iff $h{a} is defined and all the automatically vivified structures in $h2{a}{2}{c}{4} are automatically tied hashes bound to the corresponding structures in %h.

        Well ... I think this is feasible, BUT should come with a significant performance penalty compared to my first approach, since ties are expensive.

        Cheers Rolf
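A variant of the xdefined idea above can be sketched with a for loop over the keys. The reason for the change is that a loop of the form while (my $key = shift) appears to stop as soon as it meets a false key such as 0, which matters for array index 0. The name dive_defined and the sample data are hypothetical, and this is only one possible way to write it.

```perl
use strict;
use warnings;

# Hypothetical variant of the xdefined idea above: walks a nested
# structure one key at a time without autovivifying anything. It uses a
# for loop over the keys so that a key such as 0 or "" is still followed.
sub dive_defined {
    my ($ref, @keys) = @_;
    for my $key (@keys) {
        my $type = ref $ref;
        if    ($type eq 'HASH')  { $ref = $ref->{$key} }
        elsif ($type eq 'ARRAY') { $ref = $ref->[$key] }
        else                     { return 0 }   # cannot descend further
        return 0 unless defined $ref;
    }
    return defined $ref ? 1 : 0;
}

my $h = { a => [ undef, undef, { c => 0 } ], list => [ { name => "first" } ] };

print dive_defined($h, "a", 2, "c"),       "\n";   # 1
print dive_defined($h, "a", 2, "c", 4),    "\n";   # 0
print dive_defined($h, "list", 0, "name"), "\n";   # 1
print dive_defined($h, "list", 0, "nope"), "\n";   # 0
print dive_defined($h, "nokey", "x"),      "\n";   # 0
print exists $h->{nokey} ? "vivified" : "nokey not created", "\n";
```

Each step reads one level through a reference that is already known to be defined, so no intermediate hash or array is created along the way.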
