Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

flattening Complex Datastructures

by Ctrl-z (Friar)
on Feb 10, 2003 at 18:06 UTC ( #234186=perlmeditation: print w/ replies, xml ) Need Help??

I love hashes!. But even more than I love hashes, I love Hashes of Hashes.

This meditation is somewhat related to A memory efficient hash, trading off speed - does it already exist?. In fact, on re-reading that node before posting this, I see aristotle briefly touched on this topic - so i guess im on old ground here.

The gist I took from that node, was pretty much 'In a HoH, unused pre-allocated space for each hashref makes for grotty memory consumtion in the overall structure'. Given a large enough HoH, significantly grotty.
There was a couple of solutions, a particularly interesting one from dragonchild here.

While recursively dumping my own large HoH out to screen, something occured to me - something Im not sure is rediculous or totally obvious.

Hashes consist of key/value pairs.
Hash keys need to be unique.
Similarly, a path through a HoH - from key to key to key to value - is equally unique. Like a Url.

So, (yup, you guessed it!) instead of filling hashref values with more hashrefs, how about simply creating a new key in the original hash, with the unique path that would have been in the HoH. So this

 

my $reportCard = { Johnny => { grade => "D", goals => "to be a fireman", comments => "nice but dim. Must try harder." }, Suzy => { grade => "B+", goals => "World Peace", comments => "Hard worker but lacking confidence" }, # ...etc } print $reportCard->{Johnny}->{comments};
becomes...
my %reportCard = ( Johnny.grade => "D", Johnny.goals => "to be a fireman", Johnny.comments => "nice but dim. Must try harder!", Suzy.grade => "B+", Suzy.goals => "World Peace", Suzy.comments => "Hard Worker but lacking confidence" ); print $reportCard{"Johnny.comments"}

Dot syntax at no extra cost ;)
From my limited understanding, a couple of things occured to me

(i) specific Lookups, Inserts and Deletes should be as fast as any hash (?).
(ii) straight key/value pairs are easy to store, slurp and pass around
(iii) using sort( keys() ) will automatically group related data together
(iv) Less wasted memory - but keeps that HoH sensation

..and some sober practicalities...

(i) I seem to recall reading that hashes can overflow?
(ii) grouping elements by a certain criteria is pretty nasty to implement
(iii) the full path in every key must have some sort of memory penalty of its own


Is this suitable for everyday use?
In some ways it seems pretty dumb, but in others, quite elegant...especially if you were to throw in an interface and some methods to translate to and from a real HoH

Can anyone expand on the circumstances where this approach might either pay off or fail completely?

Any opinions are appreciated




time was, I could move my arms like a bird and...

Comment on flattening Complex Datastructures
Select or Download Code
Re: flattening Complex Datastructures
by jdporter (Canon) on Feb 10, 2003 at 18:31 UTC
    You may be interested in the $; variable (aka the subscript separator), which was the perl4 way of achieving something like multidimensional hashes.

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

Re: flattening Complex Datastructures
by FoxtrotUniform (Prior) on Feb 10, 2003 at 19:07 UTC

    Another issue I can see with this approach is that it's not terribly obvious. Not that one should write nothing but pedestrian code -- if this is the right thing to do, then do it -- but I can easily imagine some poor maintenance programmer looking at this code and thinking "WTF? Why didn't he just use a HoH?"

    Another point: specific operations will be O(1) instead of O(depth) if you know exactly where you want to go (i.e. you have a hash key already), but building the key may be O(depth), depending on how much information you have ahead of time. (And if speed's your concern, you can always snap the reference the first time you need it.)

    In general, I'm not sure that this technique gives you much of a win, but I'm not clever enough to prove that it'll never be useful. :-) Nice job, and thanks for making me think.

    --
    F o x t r o t U n i f o r m
    Found a typo in this node? /msg me
    The hell with paco, vote for Erudil!

Re: flattening Complex Datastructures
by mojotoad (Monsignor) on Feb 10, 2003 at 19:15 UTC
    I often keep the reference for one of the sub-hashes around for further queries and modifications. You can "pretend" to do this with your method by setting a key prefix before every query.

    My point, however, is that by keeping the reference to the sub-hash around for operations, the performance hit that you are concerned with goes away. You HoH and algorithm must be amicable to such an approach, however.

    Matt

Re: flattening Complex Datastructures
by demerphq (Chancellor) on Feb 10, 2003 at 19:36 UTC
    Another con is that extracting the keys() or values() for only one path will be extremely slow.

    Generally speaking I think the cons outweigh the advantages. If memory overhead really is an issue then there are other solutions available. For instance it is fairly common to use hashes as records/structs. When the records are well defined and unlikely to change then an array will do just as nicely, and allows faster look ups and is more efficient in terms of memory.

    I would say that if a program suffers from problems that would be alleviated by your proposal that there are other algortihmic and data structure changes that can be implemented, most likely with a net postive effect on the program even discounting memory issues.

    --- demerphq
    my friends call me, usually because I'm late....

Re: flattening Complex Datastructures
by bronto (Priest) on Feb 11, 2003 at 08:54 UTC

    Your approach is really good if, all in all, what you need is to always reach the leaves of the tree --your structure is tree-like in fact

    But, say you have a three-level hash. If you need to get an hash that lies in the middle, say /a/b your code soon gets complicated, while with real HoH you could simply write something like %b = %{$a->{b}}

    But if you are ready to pay the tradeoff of slowness and complicateness, you probabily want to write a "driver" object, maybe one that you could tie your hashes to: you keep working with HoH on the front-end, and get with a flat hash on the back-end. But I'm not a tie expert so I can't tell if it is possible at all :-)

    Cheers!
    --bronto


    The very nature of Perl to be like natural language--inconsistant and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
    --John M. Dlugosz
      Your approach is really good if, all in all, what you need is to always reach the leaves of the tree
      Yep. How many times have we seen this question:
      I have a vector of keys, @keys = qw( foo bar none ); How can I easily get at the HoHoH element $h{foo}{bar}{none} ???
      If we use the perl4 approach to multidimensional hashes, instead of true nested hashes, it's simple:
      $h{@keys}
      ...one that you could tie your hashes to: you keep working with HoH on the front-end, and get with a flat hash on the back-end.
      Yes, that sounds very much like the Tie::Multidim module.

      jdporter
      The 6th Rule of Perl Club is -- There is no Rule #6.

        Ah!, now i get your original post. This is pretty cool.


        time was, I could move my arms like a bird and...
Re: flattening Complex Datastructures
by Ctrl-z (Friar) on Feb 11, 2003 at 17:32 UTC
    great responses, thanks.

    Im not too smart with the big O notation, but I did realise that grabbing subtrees or grouping by a specific node would slow right down. I suppose there must be algorithms and other comp.sci stuff that can minimize that?.

    In hindsight, i think what appeals the most about this, is that it flattens a HoH to single key/value pairs, but preserves the internal structure. Translating between these two structures seems like it could be handy for...umm, loads of stuff. Multi-level CGI params, dumping/slurping data-structures in EMCA-esque script...loads of stuff!.
    *cough*

    Well, I think curiosity is getting the better of me and Im gonna throw a module together and benchmark it. If anyone has ideas on improving the lookup deficiencies, or anything else, please feel free to contribute.



    time was, I could move my arms like a bird and...

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://234186]
Approved by gmax
Front-paged by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (6)
As of 2014-10-01 06:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (389 votes), past polls