Hierarchial data structures

Q3Man has asked for the wisdom of the Perl Monks concerning the following question:

I'm rewriting a large section of code and I would appreciate any impartment of wisdom from the Perl Monks on how to best design a data structure which can match the requirements necessary and be future-proofed enough that someone not familiar with the code base can add to it without tripping over their feet.

The data structure needs to be hierarchical and support both hashes and arrays. A quick example of what is needed is:

$galaxy{solarsystem}{planet}[year]{lat}{long}=’foo’;

Now, a simple hash of hashes of hashes… takes care of 95% of the requirements. The problem arises when I want to introduce an array as a parent of a hash. I know I can make a hash (of hashes of hashes) and put a reference to it in the array and be done with it. What I would like to do is eliminate some of that complexity so this data structure can be added to without that additional effort. i.e. the above assignment will fail due to pseudo-hashes being depreciated. (Note: they fail because year is an array. If year is a hash, everything is golden.)

An option that I could use is to simply shorten the depth of a majority of the tree such as:

$galaxy{solarsystem.planet}[year]{lat.long};

But I’m not sure if the added complexity in referencing inside foreach and similar loops is worth any speed improvement. I could combine this with converting the year to a hash and end up with a huge single level hash ($galaxy{solarsystem.planet.year.lat.long}) but that kind of defeats the purpose I’m trying to accomplish in the first place.

So.. I’m looking at what seem to amount to three options. First is to go with straight hashes and make sure array indexes are properly handled (which involves rewriting a non-trivial amount of code, but it is code that will probably need to be rewritten anyway). My second option is to implement the data structures ‘properly’ which may cause some confusion to non-programmers. My third option is to look around for or write a module that can load multi-dimensional hierarchal data from a configuration file (probably exists somewhere).

In order of importance, I want to code this to be:

simple to understand and expand by non-programmers.
hierarchal so segments of the tree can be referenced to and processed by subs.
Syntaxically (is that a word?) valid through perl 5.10 (Pseudo-hashes were deprecated in Perl 5.8.0 and they will be removed in Perl 5.10.0, see perl58delta for more details.)

Anyone have any advice?

Comment on Hierarchial data structures Select or Download Code

Replies are listed 'Best First'.
Re: Hierarchial data structures by dave_the_m (Monsignor) on May 24, 2006 at 00:19 UTC
You are only getting the pseduo-hash warning because at one point you are creating an array reference whose first element is a hash, then later, treating that array as a hash, eg `$ perl588 -we'$h{a}[0]{b} = 1; $h{a}{x}{y}=2' Pseudo-hashes are deprecated at -e line 1. No such pseudo-hash field "x" at -e line 1. $` [download] if you treat it consistently as an array you won't get the warning. Other than that, I don't really understand your question. All I can really say is that you can easily have an arbitrary heirarchy of arrays and hashes. This: `$galaxy{solarsystem}{planet}[year]{lat}{long}=’foo’;` [download] should just Do the Right Thing, as long as you don't later start doing {year} Dave.	[reply] [d/l] [select]
Re: Hierarchial data structures by Zaxo (Archbishop) on May 24, 2006 at 03:14 UTC
Deep structures like that seem to shout out OO to me. If data is hierarchial enough to warrant a deep structure, there must exist an organizing principle that can supply a class name for instances of data at each level. The advantage is that that conceptually flattens the huge data tree. Your example data structure suggests exactly how that should go. Galaxies are gravitationally bound clusters of solar systems, rubble, dust and gas. Solar systems are gravitationally bound clusters of stars, planetary systems, rubble, dust and gas. Planetary systems are gravitationally bound clusters of planets and moons . . . with perhaps a little rubble, dust and gas. Of your goals, A non-programmer who knows even a little astrophysics will immediately understand what OO code based on that does. One who wants to expand it to superclusters or add a black hole to their galaxy's center of mass will have plenty of guidance. I wouldn't trust the coding details to a non-programmer, but they will understand well enough to provide specifications and realm advice. There is clearly a fundamental base class, "Massive Object", and a base collection object, "Gravitationally Bound Cluster", which is itself an even more Massive Object. The various subtrees are instances of these, subclassed by other properties into "Solar System", "Star", "Planet", "Rubble", and so on. The subs which may act on a part of the tree are simply the object methods of whatever occupies that slot. Many of those will be inherited from the base classes. In particular, the methods of Massive Object can probably be called by any element of the $galaxy tree. As long as you don't fiddle with pseudohashes in defining your classes, they won't bite. The OO principles used here are supported by any Perl 5. After Compline, Zaxo	[reply]
Re: Hierarchial data structures by BrowserUk (Patriarch) on May 24, 2006 at 06:54 UTC
The question that keeps niggling me is: Why do you want an array at that point? Arrays are indexed numerically, and 'year' is not numeric. And if 'year' is a placeholder for 1999, 2000, 2001 etc., then I still don't understand why you want an array. Array's start from 0 (ignoring $[ which is totally impractical), which unless your application needs to deal with antiquity, means that you'll be reserving (a relatively small amount) of space for almost 2000 entries in each 'date' array that you will never use. Remember, that's one array for every `'galaxy' * 'solar system' * 'planet'`; depending upon the size of your universe, that adds up to a lot of ~~dark matter~~ wasted space :) The advantages of arrays over hashes for numeric data are: They are slightly faster They use less memory If you are only using indexes 2000 .. 2020 of 0 .. 2020, then you will pretty much completely negate the latter, and the former is unlikely to make any appreciable difference given that you are already doing 5 other levels of hash lookup. Stick with a HoHoH(oH...)--it will make your life much simpler. You might look at YAML for your multi-dimensional configuration data. It seems to be the simplest/clearest format for manually maintained data. Finally, for contrast with Zaxo's advice--for the purpose of giving the ammunition to make up your own mind--compare iterating all the latitudes and longitudes for a given year and incrementing the number of occurances: Using a HoH: `my $thisYear = 2005; my $yearRef = $galaxy{ $sSystem }{ $planet }{ $thisYear }; for my $year ( keys %{ $yearRef } ) { for my $lat ( keys %{ $yearRef->{ $year } } ) { for my $long ( keys %{ $yearRef->{ $year }{ $lat } } ) { printf "$sSystem.$planet.$thisYear.$lat.$long:%s\n", $yearRef->{ $year }{ $lat }{ $long }; $yearRef->{ $year }{ $lat }{ $long }++; } } }` [download] Using OO: `my $thisYear = 2005; my $yearIter = $galaxy->getYearIerator( $sSystem, $planet, $thisYear ) +; while( my $year = $yearIter->() ) { my $latIter = $year->getLatitudeIter(); while( my $lat = $latIter->() ) { my $longIter = $lat->getLongIter(); while( my $long = $longIter->() ) { printf "$sSystem.$planet.$thisYear.%s.%s.%s\n", $year->getYear(), $lat->getLat(), $long->getLong(), $long->getOccurances(); $long->setOccurances( $long->getOccurances() + 1 ); } } }` [download] Your choice ;) Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^2: Hierarchial data structures by Zaxo (Archbishop) on May 24, 2006 at 16:34 UTC
I think that year, latitude, and longitude would wind up being parameters of the local coordinate system, hence either free variables or getters/setters. ;-) After Compline, Zaxo	[reply]
Re^3: Hierarchial data structures by BrowserUk (Patriarch) on May 24, 2006 at 16:52 UTC
My comments were somewhat tongue in cheek , but I would love to see an example of what you mean? There would still need to be storage to hold whatever value are associated with each time/lat/long combination, and Perl's object system seems particularly inadaquate for dealing with this kind of aggregate data. Both Javascript and VB (from what little I know of it), seem to have a consistant syntax for this kind of thing, but perl's is particularly clumsy. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re: Hierarchial data structures by NetWallah (Canon) on May 24, 2006 at 04:54 UTC
To me, your requirements read like an organized structure holding massive amounts of data, that should e easy to access via multiple axes .. a Database ! A relational DB with an arbitrary number of linked tables, with multiple indices should meet your needs - thery are plenty of free high-performance DB's, and you end up with an easily extensible, robust solution, which just happens to be perl-version independent. On top of the database, it is possible to build objects, using modules like Class::DBI, and work towards incorporating Zaxo's(++) OO ideas on top of DB storage. "For every complex problem, there is a simple answer ... and it is wrong." --H.L. Mencken	[reply]
Re: Hierarchial data structures by EdwardG (Vicar) on May 24, 2006 at 08:02 UTC
...how to best design a data structure which can match the requirements necessary and be future-proofed enough Excellent question! Fortunately, there is also an excellent answer! The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3, 4] presently in vogue for non-inferential systems. It provides a means of describing data with its natural structure only -- that is, without superimposing any additional structure for machine representation poses. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other. Further reading	[reply]