Tied Hashes vs. Objects

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks, I'm faced with a problem for which I surely could use some guidance:

I have to process a "tree" of user-supplied data, which is provided as a hash reference. Hash values can be scalars, array references and hash references, which I happily recurse into, like this:

sub process {
  my $node = shift;
  if (my $reftype = ref($node)) {
    return process_array($node) if $reftype eq 'ARRAY';
    return process_hash($node) if $reftype eq 'HASH';
    die "invalid node type: $reftype";
  } else {
    return process_scalar($node);
  }
}

process({ 'a' => '1', 'b' => [ '2' ],  'c' => { 'd' => '3' } });
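For readers who want to run the dispatcher above, here is a minimal sketch. The helpers process_scalar, process_array and process_hash are hypothetical stand-ins; they simply collect leaf values into a flat list so the recursion can be followed end to end.

```perl
use strict;
use warnings;

# Hypothetical helper: a leaf is returned as a one-element list.
sub process_scalar { my $value = shift; return ($value) }

# Hypothetical helper: recurse into each array element.
sub process_array {
  my $array = shift;
  return map { process($_) } @$array;
}

# Hypothetical helper: recurse into each hash value, in sorted key order
# so the traversal is deterministic.
sub process_hash {
  my $hash = shift;
  return map { process($hash->{$_}) } sort keys %$hash;
}

# The dispatcher from the question, unchanged in logic.
sub process {
  my $node = shift;
  if (my $reftype = ref($node)) {
    return process_array($node) if $reftype eq 'ARRAY';
    return process_hash($node)  if $reftype eq 'HASH';
    die "invalid node type: $reftype";
  } else {
    return process_scalar($node);
  }
}

my @leaves = process({ 'a' => '1', 'b' => [ '2' ], 'c' => { 'd' => '3' } });
print "@leaves\n";   # prints "1 2 3"
```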

The example is contrived, but I think you get the picture (process_array and process_hash eventually call process on their respective values). Now my users also want to provide "special" kinds of hashes, e.g. hashes with case-insensitive keys. Of course, tied hashes come to mind, which would be fine with me, since my function can stay happily unaware of whether it's passed a reference to a plain or a tied hash. However, there is some concern about the performance implications of tied hashes, and an OO interface has been suggested:
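As an illustration of the tied-hash route, here is a minimal sketch of a case-insensitive hash built on Tie::StdHash (shipped with Perl in Tie::Hash), which keeps the data in an ordinary hash and lets a subclass override only the accessors it needs. The class name Local::CaseInsensitiveHash is hypothetical. Note that ref() on a reference to the tied hash still returns 'HASH', which is why a dispatcher like process() cannot tell the difference.

```perl
use strict;
use warnings;

package Local::CaseInsensitiveHash;   # hypothetical class name
use Tie::Hash;                        # defines Tie::StdHash
our @ISA = ('Tie::StdHash');

# Tie::StdHash stores everything in the blessed hash itself; these
# overrides only lower-case the key before delegating to it.
sub STORE  { my ($self, $key, $value) = @_; $self->SUPER::STORE(lc $key, $value) }
sub FETCH  { my ($self, $key) = @_; $self->SUPER::FETCH(lc $key) }
sub EXISTS { my ($self, $key) = @_; $self->SUPER::EXISTS(lc $key) }
sub DELETE { my ($self, $key) = @_; $self->SUPER::DELETE(lc $key) }

package main;

tie my %h, 'Local::CaseInsensitiveHash';
$h{Foo} = 'bar';
print "$h{FOO}\n";       # prints "bar"
print ref(\%h), "\n";    # prints "HASH": callers see an ordinary hash ref
```

Keys come back in their lower-cased form; a variant that preserves the original spelling would need to store it alongside the value.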

package Local::MyComp::Map;

sub new { ... }
sub get { ... }
sub set { ... }
sub keys { ... }

etc. etc.
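The OO alternative could look roughly like the following sketch. Only the method names new, get, set and keys come from the outline above; the bodies, including the case-insensitive behaviour, are assumptions made for illustration. Because the class defines a method called keys, it calls the builtin explicitly as CORE::keys.

```perl
use strict;
use warnings;

package Local::MyComp::Map;

# Hypothetical implementation: case-insensitive keys kept in a plain hash.
sub new {
  my ($class, %init) = @_;
  my $self = bless { data => {} }, $class;
  $self->set($_, $init{$_}) for CORE::keys %init;
  return $self;
}

sub get { my ($self, $key) = @_; return $self->{data}{lc $key} }

sub set {
  my ($self, $key, $value) = @_;
  $self->{data}{lc $key} = $value;
  return $value;
}

# Shares its name with the builtin, so use CORE::keys inside the class.
sub keys { my $self = shift; return CORE::keys %{ $self->{data} } }

package main;

my $map = Local::MyComp::Map->new(Foo => 1);
$map->set(BAR => 2);
my @map_keys = $map->keys;
print join(',', sort @map_keys), "\n";   # prints "bar,foo"
print "FOO is ", $map->get('FOO'), "\n"; # prints "FOO is 1"
```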

Of course, my function would have to handle such Map instances specially, e.g.

use Scalar::Util qw(blessed);   # blessed() is not a builtin

  if (my $reftype = ref($node)) {
    if (blessed($node) && $node->isa('Local::MyComp::Map')) {
      return process_map($node);
    }
  ...

with process_map using the Map's keys and get methods to iterate over its elements. So I'm asking the Perl monks: do the performance implications of tied hashes really warrant introducing a class that in fact just behaves like a tied hash (but with an OO interface)? I.e., would a special class for this purpose really be more efficient than a tied hash?
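One way to answer this for a specific machine is to measure it. The following is a rough sketch using the core Benchmark module's cmpthese, comparing a plain hash, a pass-through tied hash and an object with get/set methods. Both classes are hypothetical, and the workload (1000 keys, one store and one fetch each) is an arbitrary assumption; real numbers depend on the Perl build and on what else the code does per key.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Tie::Hash;   # defines Tie::StdHash

# Hypothetical pass-through tie class: inherits everything from
# Tie::StdHash, so what is measured is mostly the tie dispatch overhead.
package Local::PlainTie;
our @ISA = ('Tie::StdHash');

# Hypothetical object with explicit accessor methods.
package Local::PlainObj;
sub new { bless {}, shift }
sub get { $_[0]{ $_[1] } }
sub set { $_[0]{ $_[1] } = $_[2] }

package main;

my %plain;
tie my %tied, 'Local::PlainTie';
my $obj  = Local::PlainObj->new;
my @keys = map { "key$_" } 1 .. 1_000;

# Each routine stores every key, then reads every key back and sums.
my %subs = (
  plain  => sub { $plain{$_} = 1 for @keys; my $n = 0; $n += $plain{$_} for @keys; $n },
  tied   => sub { $tied{$_}  = 1 for @keys; my $n = 0; $n += $tied{$_}  for @keys; $n },
  object => sub { $obj->set($_, 1) for @keys; my $n = 0; $n += $obj->get($_) for @keys; $n },
);

# A negative count means "run each for at least this many CPU seconds".
cmpthese(-1, \%subs);
```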


Re: Tied Hashes vs. Objects by BrowserUk (Patriarch) on Oct 22, 2012 at 19:04 UTC
From what I recall, tied hashes are measurably, but not horribly, slower than an equivalently functioning object. I believe there are one or two extra indirections involved in method resolution. But for that penalty you get, as you observed, transparency for any and all code that needs to use the 'special hash'. That is a substantial advantage in terms of developer efficiency and code-base simplicity: two huge advantages, well worth the cost of a few microseconds. So, unless it is observable that your code is becoming a bottleneck in the overall process, don't even consider switching. If it is observable that your code is becoming a bottleneck, then profile the code (the entire code, end to end) and determine whether those one or two extra dereferences are actually a substantial contributory factor. In most cases they will not be. You need to be processing substantial amounts of in-memory data, very intensively, before such small differences become significant. Once or twice in the past, I have switched from a tied interface to the method interface enabled by the object handle returned from tie, for performance reasons, when performing CPU-intensive processing on huge in-memory datasets. But as I recall, on both occasions I later found other changes that saved much more than the switch did, and I reverted. The transparency of tied datasets is just too effective to discard without good reason.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP Neil Armstrong
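The "object handle returned from tie" mentioned above is a documented part of the tie interface: tie returns the object implementing the tied variable, and tied() returns the same object later. A minimal sketch, with a hypothetical class name, of calling the tie methods on that object directly instead of going through hash syntax:

```perl
use strict;
use warnings;
use Tie::Hash;   # defines Tie::StdHash

# Hypothetical tie class; any class implementing the tie interface works.
package Local::SpecialHash;
our @ISA = ('Tie::StdHash');

package main;

# tie() returns the implementing object; tied() gives it back later.
my $obj = tie my %h, 'Local::SpecialHash';
$h{answer} = 42;                      # hash syntax: dispatched to STORE

my $same  = tied %h;                  # the same object as $obj
my $value = $obj->FETCH('answer');    # direct method call, no hash syntax
$obj->STORE(question => 'unknown');   # also visible through %h
print "$value $h{question}\n";        # prints "42 unknown"
```

The trade-off is the one described in the reply: direct method calls skip a layer of dispatch, but every caller then has to know it is dealing with an object rather than a hash.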
Re^2: Tied Hashes vs. Objects by Anonymous Monk on Oct 22, 2012 at 19:14 UTC
Thank you very much, this was exactly the kind of advice I was seeking. In some circles, I think, tied hashes have for some reason gained a bad reputation, performance-wise. I could not find any references backing this, so unless proven otherwise, I'll stick with tied hashes, as you suggested.
Re^3: Tied Hashes vs. Objects by BrowserUk (Patriarch) on Oct 22, 2012 at 21:31 UTC
As it had been a long time since I'd performed this kind of benchmark, I ran one. The results:
C:\test>b-tieHash.pl
8376437/16777216 std hash usage took: 56.582 seconds
8376437/16777216 tied usage took: 210.352 seconds
8376437/16777216 Object usage took: 126.642 seconds
show that using the object interface saves 1 microsecond per key, spread across all 5 operations performed on each key/value; say 1/5th of a microsecond (0.0000002 seconds) per operation. You need to be doing billions of hash operations, with almost no other associated processing, for that to become a significant part of your time costs.
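Taking the three timings above at face value, the per-key difference can be worked out directly. This sketch assumes that 16777216 is the number of keys processed and that each key sees the 5 operations mentioned; the benchmark source is not shown here, so both counts are assumptions. Under them, the object interface comes out roughly 5 microseconds per key (about 1 microsecond per operation) faster than the tied hash. That is somewhat larger than the figure quoted, but it is still tiny, so the conclusion stands.

```perl
use strict;
use warnings;

# Timings in seconds, copied from the benchmark output above.
my %secs = (std => 56.582, tied => 210.352, object => 126.642);

# Assumptions: 16777216 keys, 5 operations per key.
my $keys        = 16_777_216;
my $ops_per_key = 5;

my $saved_per_key = ($secs{tied} - $secs{object}) / $keys;   # seconds
printf "object vs tied: %.2f us/key, %.2f us/op\n",
  $saved_per_key * 1e6, $saved_per_key * 1e6 / $ops_per_key;
# prints "object vs tied: 4.99 us/key, 1.00 us/op"

my $tie_cost_per_key = ($secs{tied} - $secs{std}) / $keys;   # seconds
printf "tied vs plain:  %.2f us/key, %.2f us/op\n",
  $tie_cost_per_key * 1e6, $tie_cost_per_key * 1e6 / $ops_per_key;
```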
Re^4: Tied Hashes vs. Objects by tkemmer (Initiate) on Oct 23, 2012 at 07:03 UTC
Re: Tied Hashes vs. Objects by sundialsvc4 (Abbot) on Oct 22, 2012 at 22:52 UTC
... and I, in like manner, have had some terrible experiences with “tied hashes” that were linked to Berkeley-DB files but were treated as ordinary hashes. The performance was hideous: a real hash would not have had these problems, but it was not a real hash, and the code written to deal with it acted as though it was. Even if there is some “performance penalty” associated with the object paradigm, and with the understanding that in your case milliseconds are not in fact all that precious, I would advocate that the object syntax “wins”, because it allows you to write code that reflects what your code actually means, and to do so in only one place: the implementation code occurs only once. In the case mentioned above, we found literally hundreds of hot-spots that had to be addressed because of, as it were, a misplaced metaphor.
Re^2: Tied Hashes vs. Objects by tkemmer (Initiate) on Oct 23, 2012 at 07:10 UTC
I completely agree with you that tied hashes are often misused, the Berkeley-DB interface being one of the most prominent examples. However, in my case the tied hashes really _are_ hashes, i.e. the internal representation would simply be a hash reference, and just some of the accessor methods would be overridden. I mainly posted my question because I usually shy away from using tied hashes, for exactly the reasons you give. I think a hash is a very poor interface for anything more complex than, let's say, a hash ;-)
Re: Tied Hashes vs. Objects by tkemmer (Initiate) on Oct 22, 2012 at 17:26 UTC
My apologies for not unveiling my monk's name when I wrote this...
Re: Tied Hashes vs. Objects by sundialsvc4 (Abbot) on Oct 22, 2012 at 18:10 UTC
If you need case-insensitive keys, just `lc()` the name and store the unaltered name as part of the value (which is itself a hash). An object wrapper for all of this is generally a fine idea, because it encapsulates all of the gory details in one module so they won’t spread. Why give any second thought to “performance”? Next year’s hardware will always be faster. (This is not a performance-critical edge case.) If the software is unmaintainable over the course of several years, it will be counted vastly more expensive than the “iron.”
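A minimal sketch of the approach suggested above, without any tie or class: the key is lc()'d, and the value is itself a hash holding the original spelling next to the data. The helper names ci_store and ci_fetch and the field names name/value are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: store under the lower-cased key, keeping the
# caller's original spelling of the name inside the value hash.
sub ci_store {
  my ($hash, $name, $value) = @_;
  $hash->{ lc $name } = { name => $name, value => $value };
  return;
}

# Hypothetical helper: look up by lower-cased key; returns nothing if absent.
sub ci_fetch {
  my ($hash, $name) = @_;
  my $entry = $hash->{ lc $name } or return;
  return $entry->{value};
}

my %h;
ci_store(\%h, 'Content-Type', 'text/plain');
my $ct = ci_fetch(\%h, 'CONTENT-TYPE');
print "$ct\n";                          # prints "text/plain"
print "$h{'content-type'}{name}\n";     # prints "Content-Type"
```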
Re^2: Tied Hashes vs. Objects by Anonymous Monk on Oct 22, 2012 at 18:35 UTC
Dear fellow monk, thanks for your advice. Rest assured, the actual implementation of these "special" hashes (case-insensitive keys, read-only hashes, substring keys, whatever they come up with) is of little concern here; it can be, and mostly has been, dealt with. The issue about performance came up because, well, these operations may be called A LOT. So if there is any significant difference between tied hashes (which basically map hash operators to OO-style methods) and calling these methods directly, I would really like to know.
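To see the "hash operators map to OO-style methods" point concretely, the following sketch wraps Tie::StdHash in a hypothetical tracing class that records which tie method each hash operation is translated into, then delegates to the inherited implementation. It is meant only to make the mapping visible; the exact sequence Perl produces for some operations (for example how many NEXTKEY calls keys() makes) is an implementation detail.

```perl
use strict;
use warnings;
use Tie::Hash;   # defines Tie::StdHash

# Hypothetical tracing class: logs each tie method name, then jumps
# to the Tie::StdHash implementation with the same arguments.
package Local::TraceHash;
our @ISA = ('Tie::StdHash');
our @calls;

for my $method (qw(STORE FETCH EXISTS DELETE FIRSTKEY NEXTKEY)) {
  my $parent = Tie::StdHash->can($method);
  no strict 'refs';
  *{"Local::TraceHash::$method"} = sub { push @calls, $method; goto &$parent };
}

package main;

tie my %h, 'Local::TraceHash';
$h{a} = 1;               # assignment   -> STORE
my $x = $h{a};           # read         -> FETCH
my $e = exists $h{a};    # exists       -> EXISTS
my @k = keys %h;         # keys         -> FIRSTKEY, NEXTKEY
delete $h{a};            # delete       -> DELETE
print "@Local::TraceHash::calls\n";
```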

