PerlMonks  

The Bad, the Ugly, and the Good of autovivification

by tlm (Prior)
on Apr 08, 2005 at 02:06 UTC ( #445927=perlmeditation )

Recently, while writing a couple of nodes on references (they were intended to be "instructive" though I'm afraid they turned out to be just tiring), I struggled with the question of whether to bring up the subject of autovivification, and if so how thoroughly. Since the nodes were already bloated, I punted, regretfully.

And, as it happens, it seems that this is the tack that most expositions of Perl references take. When autovivification is mentioned at all, it is to rave about its (undeniable) virtues, i.e. the Good. It's as if we are so eager to encourage the tremulous newbie to try riding the bicycle without the training wheels, that we don't want to dampen any enthusiasm with talk of potholes, and semis. Indeed, only rarely is the dark side of autovivification mentioned, let alone discussed at any length (hence the non-standard ordering of adjectives in this node's title, although I also like the appropriateness of the acronym that results from this reordering). This means that the new programmer, happily rolling along, secure in Perl's dwimitude, usually learns about the Bad and the Ugly sides of autovivification by crashing against a nasty bug.

That's what happened to me...

I was a much younger programmer then... (insert your favorite flashback effects here). The software I was writing did statistical analysis on large collections of entities and their (also numerous) attributes. I was using "association tables" that were implemented as HoHs whose primary keys were entity ids and secondary keys were attribute ids (or vice versa). (I was using undef as the value of all these ordered pairs, which probably didn't help.)

Here's the Bad. Consider the following snippet:

    use strict;
    # ...
    my $exists = exists $hoh{ typo }{ attrib_1 };  # strict can't hear you scream...
    # ... life goes on
    my $number_of_entities = keys %hoh;  # BONK!

The count in $number_of_entities is off by 1 (at least), because now it includes the bogus entity 'typo'.
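
To see the effect in isolation, here is a minimal sketch. The table contents (entity_1, entity_2 and their attributes) are made up for illustration; only the exists test and the key count come from the snippet above.

```perl
use strict;
use warnings;

# Hypothetical association table: entity id => { attribute id => undef }
my %hoh = (
    entity_1 => { attrib_1 => undef },
    entity_2 => { attrib_2 => undef },
);

my $before = keys %hoh;                        # number of entities before the test
my $exists = exists $hoh{typo}{attrib_1};      # false, but $hoh{typo} is created on the way
my $after  = keys %hoh;                        # now includes the bogus 'typo' entity

printf "before=%d after=%d typo holds a %s ref\n",
    $before, $after, ref( $hoh{typo} );
```

The test itself answers correctly (the attribute is not there); the damage is the empty hash left behind under the 'typo' key.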

Or consider this one:

    my @big_in_assoc_1 = grep keys %{$assocs_1{$_}} > 25, keys %assocs_2;
    # ... ticks later
    my @relative_complement = grep !exists $assocs_1{$_}, keys %assocs_2;  # OUCH!

The first line above collects all the entities from table %assocs_2 that have more than 25 attributes in table %assocs_1, but in the process potentially autovivifies any number of empty hashes in %assocs_1 (namely those corresponding to entities in %assocs_2 that were not originally in %assocs_1). So @relative_complement above is always empty.
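
A small sketch of that sequence, with made-up table contents (entities a, b, c), computes the complement both before and after the size filter runs so the difference is visible:

```perl
use strict;
use warnings;

# Hypothetical tables: entity id => { attribute id => undef }
my ( %assocs_1, %assocs_2 );
$assocs_1{a}{x}  = undef;
$assocs_2{$_}{y} = undef for qw(a b c);

# Entities of %assocs_2 that are missing from %assocs_1, computed first.
my @complement_before = grep { !exists $assocs_1{$_} } keys %assocs_2;
@complement_before = sort @complement_before;

# The size filter dereferences $assocs_1{$_} for every key of %assocs_2 ...
my @big = grep { keys %{ $assocs_1{$_} } > 25 } keys %assocs_2;

# ... so by now 'b' and 'c' exist in %assocs_1 as empty hashes.
my @complement_after = grep { !exists $assocs_1{$_} } keys %assocs_2;

print "before the filter: @complement_before\n";
print "after the filter:  ", scalar(@complement_after), " entries\n";
```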

Of course, to the hardened Perl programmer, the lines above are plainly foolish, just asking for it. But to the greenhorn they look pretty reasonable, cool even.

Those were days of interminable debugging, of endless wading through the muck with DB's s. We went nuts, and some of us never recovered. My buddy... my buddy... Last time I heard of him he was programming Python somewhere out West.

Those of us who pulled through have had to learn to live with the Ugly. Gone are the carefree days when keys was my trusted friend, the only tool I needed to find the size of a table:

my $number_of_entities = grep defined $hoh{$_}, keys %hoh;
Now I know the insidious treachery keys is capable of:
my @big_in_assoc_1 = grep $assocs_1{$_} && keys %{$assocs_1{$_}} > 25, keys %assocs_2;
There's more, but these painful memories are mostly blocked.
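
For comparison, here is a sketch of the guarded filter on made-up data (entities a, b, c are hypothetical). It also counts entries two ways after a stray exists test, on the assumption that a real entity always has at least one attribute: an autovivified entry holds a reference to an empty hash, which is defined, so in this sketch only the emptiness test tells it apart from a real one.

```perl
use strict;
use warnings;

# Hypothetical tables: entity id => { attribute id => undef }
my ( %assocs_1, %assocs_2 );
$assocs_1{a}{"attr_$_"} = undef for 1 .. 30;
$assocs_2{$_}{y}        = undef for qw(a b c);

# Test the outer key before dereferencing, so nothing is created.
my @big = grep { $assocs_1{$_} && keys %{ $assocs_1{$_} } > 25 } keys %assocs_2;

my @complement = grep { !exists $assocs_1{$_} } keys %assocs_2;
@complement = sort @complement;

# A stray two-level test sneaks an empty hash into %assocs_1.
my $oops = exists $assocs_1{typo}{attr_1};

my $defined_count  = grep { defined $assocs_1{$_} } keys %assocs_1;
my $nonempty_count = grep { keys( %{ $assocs_1{$_} } ) > 0 } keys %assocs_1;

print "big: @big; complement: @complement\n";
print "defined: $defined_count, non-empty: $nonempty_count\n";
```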

the lowliest monk

Re: The Bad, the Ugly, and the Good of autovivification
by Roy Johnson (Monsignor) on Apr 08, 2005 at 02:31 UTC
    Yes, autovivification is capricious. It would be a great improvement if it only happened in lvalue contexts (ordinary value contexts could simply short-circuit and return null). If I may add an adjective to your canonical list, I'd like to present The Simple: if Perl has to go through (dereference) a reference to get to something, autovivification happens (obviously, it also happens if the reference is itself used in lvalue context).

    The subject does, from time to time, come up here, as in Looping through a hash reference is creating a key...?.


    Caution: Contents may have been coded under pressure.

      If I may add an adjective to your canonical list, I'd like to present The Simple: if Perl has to go through (dereference) a reference to get to something, autovivification happens...

      Yes, this is simple for the experienced programmer who is comfortable with the whole notion of references, but not so simple for the programmer who is just beginning to work with them. I know from helping newbies at work that references don't come easy to many people for some reason. And even in the best of cases, one first has to become sensitized to the possibility of autovivification-mediated trouble before one develops an eye for unintended autovivification. This is true for just about any class of bugs. (perlreftut, perlref, and perltrap need to do more towards alerting programmers to autovivification bugs.)

      And even with a bit of experience, something like this:

      my @good_ones = grep $_->stars == 5, @dvds{ qw( Ray Alexnader Sideways Catwoman Aviator ) };
      can silently trip you. Or, while it's clear that
      if ( exists $h{ wild_guess }->{ ssn } ) { ... }
      is dereferencing, and hence autovivifying, $h{ wild_guess }, it is less clear (or at least it was so to me) that
      keys %{ $h{ wild_guess } }
      is also autovivifying $h{ wild_guess } even though apparently there is no dereferencing going on (if by "dereferencing" one means getting the value at a given address). My version of your Simple is "any time that perl is asked to interpret an undef as if it were a hash ref (or array ref, or scalar ref), it will turn the undef into a ref to an empty hash (or empty array, or undef)."

      Update: What am I saying?! Of course there is dereferencing in %{ $h{ wild_guess } }. So "asked to interpret an undef as if it were..." is just a wordier way to say "dereference". I confess that I find it more intuitive somehow, but I'd still say that your Simple is much simpler than mine.
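
A quick sketch of which reads create entries and which do not; the key names one_level, outer and inner are made up, wild_guess is from the lines above:

```perl
use strict;
use warnings;

my %h;

# A single-level read does not create anything.
my $plain  = $h{one_level};
my $seen_1 = exists $h{one_level} ? 1 : 0;

# keys treats the undef in $h{wild_guess} as a hash ref, so it becomes one.
my @k      = keys %{ $h{wild_guess} };
my $seen_2 = exists $h{wild_guess} ? 1 : 0;

# A two-level read creates the outer level only.
my $deep   = $h{outer}{inner};
my $seen_3 = exists $h{outer}        ? 1 : 0;
my $seen_4 = exists $h{outer}{inner} ? 1 : 0;

print "one_level=$seen_1 wild_guess=$seen_2 outer=$seen_3 inner=$seen_4\n";
```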

      Update: Added perltrap to the docs list above.

      the lowliest monk

        perlreftut and perlref need to do more towards alerting programmers to autovivification bugs.
        Really, like what?
Re: The Bad, the Ugly, and the Good of autovivification
by tall_man (Parson) on Apr 08, 2005 at 05:25 UTC
    I'm finding misspelled keys enough of a problem in my applications that I'm considering drastic solutions like Tie::Hash::FixedKeys for critical hashes where I know the keys in advance.
        use strict;
        use Tie::Hash::FixedKeys;

        my %a : FixedKeys( qw(a b) );
        %a = (a => 1, b => 2);
        print "Doesn't exist\n" unless exists($a{c});
        print "Also Doesn't exist\n" unless exists($a{c}->{d});
        print "Oh, my, not good\n" if exists($a{c});

    The attempted autovivification in the exists($a{c}->{d}) test will cause a croak.

Re: The Bad, the Ugly, and the Good of autovivification
by Anonymous Monk on Apr 08, 2005 at 13:00 UTC
    It's a bit unfair to blame autovivification for your typo. Even without the typo, the value of $exists is possibly wrong. And a good test suite would have tested for this - it would have run a test which would set $exists to a true value. And then fail because that didn't happen.

    But the problem is deeper. The problem is using string literals as hash keys, because that more or less turns hash elements into variables - with all the drawbacks of package variables, and then some. If you want a variable, use a (preferably) lexical variable. Don't fall for the sweet lures of "oh, just stuff it into a hash" - it's worse than turning off strict. If you do need to populate a hash, put the index into a constant (with Readonly) or a variable.

    Remember the old rule "never put magic numbers inside your program - use constants". But that applies to strings as well. Don't use string literals - use constants. And don't blame autovivification if you do use string literals and burn yourself.
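
As a rough illustration of that advice, here is a sketch that keeps the key in a lexical variable. The names $ATTRIB_1 and %attrs are made up, and a plain lexical stands in for Readonly so the example needs no CPAN module. The string eval is only there so the sketch can capture the compile-time error and keep running.

```perl
use strict;
use warnings;

my $ATTRIB_1 = 'attrib_1';              # hypothetical key name held in a lexical
my %attrs    = ( $ATTRIB_1 => 1 );

my $ok = exists $attrs{$ATTRIB_1};      # true: the key is spelled in one place only

# Misspelling the variable name is a compile-time error under strict,
# whereas misspelling a string literal key would pass unnoticed.
my $compiled = eval q{ my $v = $attrs{$ATRIB_1}; 1 };
my $error    = $@;

print "lookup ok: ", ( $ok ? "yes" : "no" ), "\n";
print "misspelled name rejected: ", ( $compiled ? "no" : "yes" ), "\n";
```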

      The "typo" was just a convenient way of illustrating a "hash key that got messed up somehow". This can happen in less trivial ways. E.g. a bug in a regex can send you down that path. Still, to this one could reply, well, fine, bugs happen, what's new? The reason for my bringing all this up is to point out that in the presence of autovivification such mangled keys lead to two errors. One is the familiar one: the fetching of nonexistent values leaves one with variables at the receiving end that are erroneously set to undef. It's the second one that blindsides newcomers: the creation of new hash keys as a side effect. Programmers, I think, are more sensitized to the former than to the latter. As I said elsewhere, the most insidious bugs are those whose possibility one is not even aware of. Moreover, in my experience, these autovivification bugs would often manifest themselves far away (in the program's logic) from where the error happened.

      the lowliest monk

        *shrug*

        I don't get your point. open could cause havoc as well if you've botched up the second argument (or third if you use three arg open). Is that the fault of open? If you match a regex against a string you've botched up, you get the wrong result. Does that mean matching has an "ugly" side? If you pass in the wrong arguments, most functions will do the wrong thing - but that's not the fault of said function. It's the fault of the caller of said functions.

        Let me quote an early computer pioneer:

        On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

        Charles Babbage

Re: The Bad, the Ugly, and the Good of autovivification
by hardburn (Abbot) on Apr 08, 2005 at 13:17 UTC

    When autovivification is mentioned at all, it is to rave about its (undeniable) virtues, i.e. the Good. It's as if we are so eager to encourage the tremulous newbie to try riding the bicycle without the training wheels, that we don't want to dampen any enthusiasm with talk of potholes, and semis.

    My experience is just the opposite. The first time I see most new Perl programmers run into it is when it causes some bug, which causes some (understandable) complaints about the feature.

    I do wish exists was special cased to not autoviv a deep element when a shallow element doesn't exist.

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: The Bad, the Ugly, and the Good of autovivification
by 5mi11er (Deacon) on Apr 08, 2005 at 15:43 UTC
    There is something, added in 5.8, that could help keep the "typo"s from happening. I've been a lurker on the perl5-porters mailing list, and one of the more fascinating discussions I read about, back when I was actually able to take time to read that list, was an idea about creating "clamped" hashes. The idea was to help eliminate accidentally creating hash keys, but it grew from there.

    The current implementation is a core module called Hash::Util; its documentation covers the details.

    In an attempt to quickly summarize what you'll find there, the following methods are available:

    Method                      Description
    lock_keys(%hash)            don't allow any keys other than those that currently exist
    lock_keys(%hash,@keys)      don't allow any keys other than those in the array @keys
    unlock_keys(%hash)          unlock the hash to be able to add keys
    lock_value(%hash,$key)      keep the value at $hash{$key} from being changed
    unlock_value(%hash,$key)    allow the value to change
    lock_hash(%hash)            don't allow keys or values to change
    unlock_hash(%hash)          allow keys and values to change
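
A minimal sketch of lock_keys applied to the typo scenario from the original post; the table contents are made up for illustration. The eval is there because perl dies when asked to create a key the locked hash does not allow.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);

# Hypothetical association table with one real entity.
my %hoh = ( entity_1 => { attrib_1 => undef } );
lock_keys(%hoh);

# The two-level test would have to create $hoh{typo}; the locked hash refuses.
my $exists = eval { exists $hoh{typo}{attrib_1} };
my $error  = $@;
my $count  = keys %hoh;          # the bogus entity never got in

# Adding a legitimate entity means unlocking first.
unlock_keys(%hoh);
$hoh{entity_2} = { attrib_2 => undef };
lock_keys(%hoh);
my $count_after_add = keys %hoh;

print "error: $error";
print "entities: $count, then $count_after_add\n";
```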

    -Scott

      Yeah, unfortunately, for algorithms that use 'exists', clamping hashes doesn't make sense. Either the key is known to be there (so, no need for exists), or its existence is volatile, which means that clamping the hash prevents the keys from being inserted.

      Sure, you can unclamp the hash whenever you insert a new key, but then it might be easier to write your exists test as:

      $exists = exists $hash{key1} && exists $hash{key1}{key2};
        I don't think the first part of your argument is strictly true. If you use the @keys option, it would be possible for keys to not exist, or to be added dynamically, thus you might still need to see if that key happened to exist or not.

        However, you're correct that none of this stuff helps at all with respect to "auto-vivification". And your code example does appear to help work around the problem when dealing with nested structures.

        Hmm, Merlyn recently posted a snippet that walked a structure to its leaf nodes, maybe we could combine the ideas of that snippet with this code to create a new form of exists?

        -Scott

        PS. the last paragraph was written "tongue-in-cheek", but it's late, and I would not be at all surprised to come into work on Monday to find someone actually wrote such a beast...
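
For what it's worth, such a beast can be sketched in a few lines. The helper name deep_exists is made up for this example and is not taken from Merlyn's snippet or from any module; it simply repeats the one-level exists test at each step and stops at the first missing key.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if every key along the path exists.
# It never dereferences a level it has not checked, so nothing is created.
sub deep_exists {
    my ( $node, @path ) = @_;
    for my $key (@path) {
        return 0 unless ref($node) eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return 1;
}

my %hash = ( key1 => { key2 => undef } );

my $found   = deep_exists( \%hash, qw(key1 key2) );
my $missing = deep_exists( \%hash, qw(typo key2) );
my $count   = keys %hash;

print "found=$found missing=$missing top-level keys=$count\n";
```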

Re: The Bad, the Ugly, and the Good of autovivification
by ihb (Deacon) on Apr 10, 2005 at 18:11 UTC

    Yeah, I think a lot of people get bitten by that and I certainly did when I was new. This isn't a recent issue and Brent gave us Hash::NoVivify five years ago. Still, I see no one using it. How about making Hash::NoVivify or some equivalent module a standard module so people don't have to reinvent this tiny wheel all the time?

    ihb

    See perltoc if you don't know which perldoc to read!

      How about making Hash::NoVivify or some equivalent module a standard module so people don't have to reinvent this tiny wheel all the time?

      Presumably, by "standard module" you mean a module that is distributed with the main Perl distribution. However, we already have a (successful) mechanism in place to prevent reinventing wheels. And it even works without having to upgrade your perl. It's called CPAN.

      People don't have to reinvent the wheels to get a graphical environment, to fetch documents with HTTP, or to connect to a database, yet there are no "standard modules" to do any of this.

        Overlooking the smug tone; there's a limit to how far people want to rely on non-standard modules. Small things like this get reimplemented over and over again, because it's "so small" and it's "not necessary to use a module for that". When a module becomes standard, that attitude changes somewhat.

        Now, I'm not saying that this particular module should be in the standard library, but I definitely think your categorical rejection of it lacks. Slightly overlooking that the choice to include a module in the standard library seems somewhat arbitrary; many of the newer standard modules are "Perl close", i.e. they solve a problem that has to do with Perl the language, and many others solve omni-present problems. Creating a GUI and your other examples are not omni-present problems. When does a module qualify as a standard module for you?

        ihb

        See perltoc if you don't know which perldoc to read!

Re: The Bad, the Ugly, and the Good of autovivification
by Anonymous Monk on Oct 12, 2012 at 07:31 UTC
