PerlMonks  

Help with Hash of hashes, is there a better way?

by TeraMarv (Beadle)
on Jun 01, 2006 at 07:28 UTC (#552989=perlquestion)
TeraMarv has asked for the wisdom of the Perl Monks concerning the following question:

G'day Monks,

I have been writing a script to monitor the capacities of various disks and databases in my work environment. During this process I have found myself working with the following hash of hashes:

    # each $value stands for a number
    my $stuff = {
        production => {
            windows => {
                Hostname => {
                    targetname1 => { total_capacity => $value, free_capacity => $value },
                    targetname2 => { total_capacity => $value, free_capacity => $value },
                },
            },
            unix => {
                Hostname => {
                    targetname1 => { total_capacity => $value, free_capacity => $value },
                    targetname2 => { total_capacity => $value, free_capacity => $value },
                },
            },
            database => {
                Hostname => {
                    targetname1 => { total_capacity => $value, free_capacity => $value },
                    targetname2 => { total_capacity => $value, free_capacity => $value },
                },
            },
        },
        development => {
            windows  => {},    # etc. etc.
            unix     => {},    # etc. etc.
            database => {},    # etc. etc.
        },
    };

While this perfectly describes the information I'm working with, it becomes a bit unwieldy to work with. I have had to resort to a bunch of nested 'while' loops:

    while ( my ( $env, $platform_href ) = each %{$stuff} ) {
        while ( my ( $platform, $host_href ) = each %{$platform_href} ) {
            while ( my ( $host, $target_href ) = each %{$host_href} ) {
                while ( my ( $target, $capacity_href ) = each %{$target_href} ) {
                    # Do stuff with
                    #   $env
                    #   $platform
                    #   $host
                    #   $target
                    #   $capacity_href->{'total_capacity'};
                    #   $capacity_href->{'free_capacity'};
                }
            }
        }
    }
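A hedged aside on `each`: every hash has a single internal iterator, so if a loop like this is left early with `last`, the next `each` on that hash resumes mid-way instead of starting over, which matters when the structure is walked several times. Looping over `sort keys` avoids the shared iterator and also gives a repeatable order. The sketch below assumes invented sample data in the same shape as `$stuff`.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same four-level shape as $stuff.
my $stuff = {
    production => {
        unix => {
            hostA => {
                '/var' => { total_capacity => 100, free_capacity => 40 },
                '/opt' => { total_capacity => 200, free_capacity => 10 },
            },
        },
    },
    development => {
        windows => {
            hostB => {
                'C:' => { total_capacity => 50, free_capacity => 25 },
            },
        },
    },
};

# Same walk with sorted keys: no each() iterator left half-consumed,
# and the output order is the same every run.
my @lines;
for my $env ( sort keys %$stuff ) {
    my $platform_href = $stuff->{$env};
    for my $platform ( sort keys %$platform_href ) {
        my $host_href = $platform_href->{$platform};
        for my $host ( sort keys %$host_href ) {
            my $target_href = $host_href->{$host};
            for my $target ( sort keys %$target_href ) {
                my $cap = $target_href->{$target};
                push @lines, join "\t", $env, $platform, $host, $target,
                    $cap->{total_capacity}, $cap->{free_capacity};
            }
        }
    }
}
print "$_\n" for @lines;
```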

Nasty eh?

Is there a better/prettier way of traversing the data structure or even a better way of representing the data with a different structure?

Many Thanks,

TeraMarv

Re: Help with Hash of hashes, is there a better way?
by wfsp (Abbot) on Jun 01, 2006 at 07:59 UTC
    Anything that:
    ...perfectly describes the information i'm working with...
    is not
    Nasty
    :-)

    Also, I wouldn't call four while loops to traverse a complex data structure unwieldy, but YMMV.

    You have chosen, imo, good meaningful field/var names and I think this produces readable code. It's very easy to see what you are doing.

    If the amount of data is large and you need to build/load/store this structure frequently, I might consider a relational DB, say, MySQL. You would end up with a nice collection of tables that you could LEFT JOIN together.

    Perl is often lauded for its regexes. I would argue that the ability to easily handle these types of complex data structures comes in a close second.

    update:
    Fixed some typos/grammar

      wfsp,

      I suppose I'm not happy with the nested loops, as I have to traverse the structure several times in the code, and that requires using the same bit of code again and again. Any bit of code that gets used more than once is crying out for a function to be written. Unfortunately, my skills, imagination or experience aren't quite up to it.

      Any ideas?

        Some thoughts.

        • I could live with 'several' and not worry too much. :-)
        • You could do everything in one pass.
        • You could consider putting your traversing loops into a sub and pass a sub ref for the particular work that needs to be done.

        HTH

        that requires using the same bit of code again and again.

        Whenever I see this phrase, deafening alarm bells and sirens go off in my head. In the vast majority of cases, any time you're using the same (or similar) chunk of code over and over again, you can probably do better with a subroutine. In the case of a sub for traversing a complex data structure, you may want to use a callback hook to provide the specific functionality.

        Option 1)

        You could build an iterator, separating the traversal from the client, as demonstrated by the following:

            {
                my $iter = get_target_iter($stuff);
                while (
                    my ( $env_name, $platform_name, $host_name, $target_name,
                        $total_capacity, $free_capacity, ) = $iter->()
                ) {
                    local $, = "\t";
                    local $\ = "\n";
                    print $env_name, $platform_name, $host_name, $target_name,
                        $total_capacity, $free_capacity;
                }
            }

        Tested.
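        The `get_target_iter` definition itself isn't shown above. What follows is one possible implementation, a sketch rather than the poster's code: a closure keeps a stack of partial key paths and returns the next `(env, platform, host, target, total, free)` list on each call, or an empty list once everything has been visited, which ends the client's `while` loop. The sample data is invented.

```perl
use strict;
use warnings;

sub get_target_iter {
    my ($stuff) = @_;
    # Stack of partial key paths still to explore; reversed so that
    # pop() hands them out in sorted order.
    my @stack = map { [$_] } reverse sort keys %$stuff;
    return sub {
        while ( my $path = pop @stack ) {
            # Follow the path down to the hash it names.
            my $node = $stuff;
            $node = $node->{$_} for @$path;
            if ( @$path < 4 ) {
                # Not at a target yet: queue the next level down.
                push @stack, map { [ @$path, $_ ] } reverse sort keys %$node;
                next;
            }
            # Four keys deep: $node is the capacity hash.
            return ( @$path, $node->{total_capacity}, $node->{free_capacity} );
        }
        return;    # exhausted
    };
}

# Hypothetical sample data.
my $stuff = {
    production => { unix => { hostA => {
        '/var' => { total_capacity => 100, free_capacity => 40 },
        '/opt' => { total_capacity => 200, free_capacity => 10 },
    } } },
    development => { windows => { hostB => {
        'C:' => { total_capacity => 50, free_capacity => 25 },
    } } },
};

my $iter = get_target_iter($stuff);
my @rows;
while ( my @row = $iter->() ) {
    push @rows, join ',', @row;
}
print "$_\n" for @rows;
```

        Because the state lives in the closure, several iterators over the same structure can be active at once without interfering, which nested `each` loops cannot offer.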

        Update: Below is an alternative iterator. It's a drop-in replacement for the above function. This version is much smaller thanks to Algorithm::Loops's NestedLoops.

        Tested.

        Option 2)

        A callback would be simpler:

            {
                iterate {
                    my ( $env_name, $platform_name, $host_name, $target_name,
                        $total_capacity, $free_capacity, ) = @_;
                    local $, = "\t";
                    local $\ = "\n";
                    print $env_name, $platform_name, $host_name, $target_name,
                        $total_capacity, $free_capacity;
                } $stuff;
            }

        Tested.
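        The definition of `iterate` isn't shown above either. A minimal sketch of one way to write it, assuming the same four-level layout: the `(&$)` prototype is what allows a bare block as the first argument, and the sub must be compiled before any call that uses that syntax. The sample data is invented.

```perl
use strict;
use warnings;

# (&$): first argument is a block/code ref, second is the data.
sub iterate (&$) {
    my ( $code, $stuff ) = @_;
    for my $env ( sort keys %$stuff ) {
        my $platforms = $stuff->{$env};
        for my $platform ( sort keys %$platforms ) {
            my $hosts = $platforms->{$platform};
            for my $host ( sort keys %$hosts ) {
                my $targets = $hosts->{$host};
                for my $target ( sort keys %$targets ) {
                    my $cap = $targets->{$target};
                    # Hand the six values to the caller's block in @_.
                    $code->( $env, $platform, $host, $target,
                        @{$cap}{qw(total_capacity free_capacity)} );
                }
            }
        }
    }
    return;
}

# Hypothetical sample data.
my $stuff = {
    production => { unix => { hostA => {
        '/var' => { total_capacity => 100, free_capacity => 40 },
        '/opt' => { total_capacity => 200, free_capacity => 10 },
    } } },
};

my @seen;
iterate {
    my ( $env, $platform, $host, $target, $total, $free ) = @_;
    push @seen, "$env/$platform/$host/$target=$free";
} $stuff;
print "$_\n" for @seen;
```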

Re: Help with Hash of hashes, is there a better way?
by BrowserUk (Pope) on Jun 01, 2006 at 09:15 UTC

    There is an example in the pod for Data::Rmap that shows how to use its rmap_to() function to traverse a structure while maintaining local state (the path through the hashes). It could be used as a starting point to create a custom iterator function.

        # Traverse a tree using localize state
        $tree = [
            one =>
            two =>
            [
                three_one =>
                three_two =>
                [
                    three_three_one =>
                ],
                three_four =>
            ],
            four =>
            [
                [
                    five_one_one =>
                ],
            ],
        ];

        @path = ('q');
        rmap_to {
            if ( ref $_ ) {
                local(@path) = ( @path, 1 );   # ARRAY adds a new level to the path
                $_[0]->recurse();              # does stuff within local(@path)'s scope
            } else {
                print join( '.', @path ), " = $_ \n";   # show the scalar's path
            }
            $path[-1]++;   # bump last element (even when it was an aref)
        } ARRAY|VALUE, $tree;

        # OUTPUT
        # q.1 = one
        # q.2 = two
        # q.3.1 = three_one
        # q.3.2 = three_two
        # q.3.3.1 = three_three_one
        # q.3.4 = three_four
        # q.4 = four
        # q.5.1.1 = five_one_one

    This would involve writing your own function that wraps code similar to the above, but localises the keys on the fly. You would pass a callback (function or block) to the wrapper function, and it would call your code with the appropriate variables ($env, $platform, $host, $target etc.) set and localised. You write this once, then call it with a different callback each time you need to iterate the structure. If this idea interests you but you need a bit more info on implementing it, /msg me.
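    A minimal sketch of the localised-variables idea, without Data::Rmap: the wrapper (the name `for_each_target` is hypothetical) sets package variables with `local` at each level and runs the callback, so the block reads `$host`, `$target` and friends directly, and the variables revert once the walk finishes. The variable names and sample data are illustrative assumptions.

```perl
use strict;
use warnings;

# Package variables the callback can read (names are illustrative).
our ( $env, $platform, $host, $target, $total, $free );

# Hypothetical wrapper: localise the variables at each level, then call the block.
sub for_each_target (&$) {
    my ( $code, $stuff ) = @_;
    for my $e ( sort keys %$stuff ) {
        local $env = $e;
        for my $p ( sort keys %{ $stuff->{$e} } ) {
            local $platform = $p;
            for my $h ( sort keys %{ $stuff->{$e}{$p} } ) {
                local $host = $h;
                my $targets = $stuff->{$e}{$p}{$h};
                for my $t ( sort keys %$targets ) {
                    local $target = $t;
                    local $total  = $targets->{$t}{total_capacity};
                    local $free   = $targets->{$t}{free_capacity};
                    $code->();
                }
            }
        }
    }
    return;
}

# Hypothetical sample data.
my $stuff = {
    production => { unix => { hostA => {
        '/var' => { total_capacity => 100, free_capacity => 40 },
        '/opt' => { total_capacity => 200, free_capacity => 10 },
    } } },
};

my @report;
for_each_target {
    push @report, sprintf '%s %s %.0f%% free', $host, $target, 100 * $free / $total;
} $stuff;
print "$_\n" for @report;
print defined $host ? "still set\n" : "restored\n";
```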

    The other thought that crossed my mind was if you only ever access this structure through iteration, rather than individual direct accesses, then a HoH is probably the wrong structure. An AoH would be easier to use in that case. It might look something like this (pseudo-code):

        my @servers = (
            {
                type     => production | development,
                env      => Windows | unix | Database,
                hostname => the hostname,
                targets  => [
                    name1 => [ #total, #free ],
                    name2 => [ #total, #free ],
                    ...
                ],
            },
            {
                type     => production | development,
                env      => Windows | unix | Database,
                hostname => the hostname,
                targets  => [
                    name1 => [ #total, #free ],
                    name2 => [ #total, #free ],
                    ...
                ],
            },
            ...
        );

    And to iterate it:

        use constant { TOTAL => 0, FREE => 1 };

        for my $server ( @servers ) {
            printf "Server: %s Type: %s Env: %s\n",
                $server->{ hostname }, $server->{ type }, $server->{ env };

            # targets holds name => [ total, free ] pairs, so take two at a time
            my @targets = @{ $server->{ targets } };
            while ( my ( $name, $target ) = splice @targets, 0, 2 ) {
                printf "\t%s Total: %d Free: %d\n", $name, @{ $target }[ TOTAL, FREE ];
            }
        }

Re: Help with Hash of hashes, is there a better way?
by neniro (Priest) on Jun 01, 2006 at 09:17 UTC
    You could use objects instead of plain hashes. A few weeks ago I wrote an example for someone with a similar problem. It uses Moose for OO, and you can see how it simplifies the stringification of a nested structure:
        #!/usr/bin/perl
        use strict;
        use warnings;

        package SingleState;
        use Moose;
        has 'state'    => ( isa => 'Value', is => 'rw' );
        has 'value'    => ( isa => 'Value', is => 'rw' );
        has 'position' => ( is => 'rw' );

        sub to_s {
            my $self = shift;
            return $self->state . "\t=>\t" . $self->value . "\n";
        }

        package Stat;
        use Moose;
        # default => sub { [] } so to_s works even before any add_state call
        has 'states'   => ( isa => 'ArrayRef', is => 'rw', default => sub { [] } );
        has 'title'    => ( is => 'rw' );
        has 'position' => ( is => 'rw' );

        sub add_state {
            my $self = shift;
            push @{ $self->states }, @_;
            return;
        }

        sub to_s {
            my $self = shift;
            my $o = $self->title ? "== " . $self->title . " ==\n" : '';
            $o .= $_->to_s
                for sort { $a->position <=> $b->position } @{ $self->states };
            return $o;
        }

        package StatSet;
        use Moose;
        has 'stats' => ( isa => 'ArrayRef', is => 'rw', default => sub { [] } );

        sub add_stat {
            my $self = shift;
            push @{ $self->stats }, @_;
            return;
        }

        sub report {
            my $self = shift;
            my $s = '';
            $s .= $_->to_s . "\n"
                for sort { $a->position <=> $b->position } @{ $self->stats };
            return $s;
        }

        package main;
        my $set    = StatSet->new;
        my $stat1  = Stat->new( title => 'CPU' );
        my $stat2  = Stat->new( title => 'Memory' );
        my $state1 = SingleState->new( state => 'Idle',   value => '92',     position => 1 );
        my $state2 = SingleState->new( state => 'User',   value => '0',      position => 2 );
        my $state3 = SingleState->new( state => 'Cached', value => '531976', position => 1 );

        $stat1->add_state( $state1, $state2 );
        $stat1->position(1);
        $stat2->add_state($state3);
        $stat2->position(2);
        $set->add_stat($stat1);
        $set->add_stat($stat2);
        print $set->report;
Re: Help with Hash of hashes, is there a better way?
by roboticus (Canon) on Jun 01, 2006 at 11:02 UTC
    TeraMarv:

    Since you're concerned about the effort of traversing the data structure and asked for alternative data structures, I have a couple of thoughts.

    While it's nice to have a perfect representation of your system, it may be that your code doesn't really do anything fancy with $env, $platform, etc., and only needs them in a couple of edge cases. In that case, you may want to flatten your data structure by combining the keys, allowing you to split out the parts you want if/when you need them. Something like the following (contrived) example, where we need a disk usage warning report:

        # Assumes the top-level hash key has the form
        #     hostname:platform:env
        # and the second-level hash key is the target. Since the lowest
        # level of your example always has the same structure, the values
        # go in an array:
        #     [0] = total_capacity
        #     [1] = current_free_space
        my $warn_pct = 0.10;    # example threshold: warn below 10% free

        while ( my ( $hostkey, $targets_href ) = each %{$stuff} ) {
            # *This* report doesn't care about prod/dev or OS; it's just
            # to warn about disk space problems, so it only needs $host.
            # Other reports may want other bits.
            my ($host) = split /:/, $hostkey;
            print "\n*****\n* $host\n*****\n\n"
                . "FreeSpc  %Free Target\n"
                . "-------- ----- -----------------\n";
            while ( my ( $target, $cap_aref ) = each %{$targets_href} ) {
                my $pct_free = $cap_aref->[1] * 100.0 / $cap_aref->[0];
                if ( $cap_aref->[1] < $warn_pct * $cap_aref->[0] ) {
                    printf "%8u %5.3f %s\n", $cap_aref->[1], $pct_free, $target;
                }
            }
        }
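    If the data arrives in the original four-level shape, converting it to the flattened form assumed above could look like the following sketch. The sample data is invented, and the `host:platform:env` key only splits cleanly if none of the names contain a colon.

```perl
use strict;
use warnings;

# Hypothetical sample in the original four-level shape.
my $stuff = {
    production => { unix => { hostA => {
        '/var' => { total_capacity => 100, free_capacity => 40 },
        '/opt' => { total_capacity => 200, free_capacity => 10 },
    } } },
    development => { windows => { hostB => {
        'C:' => { total_capacity => 50, free_capacity => 25 },
    } } },
};

# Flatten to  "host:platform:env" => { target => [ total, free ] }
my %flat;
for my $env ( keys %$stuff ) {
    for my $platform ( keys %{ $stuff->{$env} } ) {
        my $hosts = $stuff->{$env}{$platform};
        for my $host ( keys %$hosts ) {
            my $targets = $hosts->{$host};
            for my $target ( keys %$targets ) {
                my $cap = $targets->{$target};
                $flat{"$host:$platform:$env"}{$target} =
                    [ $cap->{total_capacity}, $cap->{free_capacity} ];
            }
        }
    }
}

# Reports then need only two loops, splitting the key when required.
for my $hostkey ( sort keys %flat ) {
    my ($host) = split /:/, $hostkey;
    for my $target ( sort keys %{ $flat{$hostkey} } ) {
        my ( $total, $free ) = @{ $flat{$hostkey}{$target} };
        printf "%-6s %-5s %3d/%d free\n", $host, $target, $free, $total;
    }
}
```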
    --roboticus
Re: Help with Hash of hashes, is there a better way?
by davidrw (Prior) on Jun 01, 2006 at 12:43 UTC
    This is kind of similar to roboticus's post, and also similar in nature to wfsp's SQL suggestion (because this is how the data would end up coming out of it).

    You could do the nested loop-age just once, and build an array of hashrefs that each have all the information. Then just loop through that whenever needed.
        my @targets;
        while ( my ( $env, $platform_href ) = each %{$stuff} ) {
            while ( my ( $platform, $host_href ) = each %{$platform_href} ) {
                while ( my ( $host, $target_href ) = each %{$host_href} ) {
                    while ( my ( $target, $capacity_href ) = each %{$target_href} ) {
                        push @targets, {
                            env            => $env,
                            platform       => $platform,
                            host           => $host,
                            target         => $target,
                            total_capacity => $capacity_href->{'total_capacity'},
                            free_capacity  => $capacity_href->{'free_capacity'},
                        };
                    }
                }
            }
        }

        # Now, wherever you previously had the 4 nested loops, just do:
        foreach my $target (@targets) {
            # Do stuff with these keys of %$target:
            #   env, platform, host, target, total_capacity, free_capacity
        }
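    As an illustration of "just loop through that", a hedged sketch of two queries over the flat list: a per-environment sum and a low-space filter, each a single non-nested pass. The rows are invented sample data in the shape the loop above produces, and the 10% threshold is an arbitrary example.

```perl
use strict;
use warnings;

# Hypothetical rows in the shape produced by the flattening loop.
my @targets = (
    { env => 'production',  platform => 'unix',    host => 'hostA',
      target => '/var', total_capacity => 100, free_capacity => 40 },
    { env => 'production',  platform => 'unix',    host => 'hostA',
      target => '/opt', total_capacity => 200, free_capacity => 10 },
    { env => 'development', platform => 'windows', host => 'hostB',
      target => 'C:',   total_capacity => 50,  free_capacity => 25 },
);

# Free space per environment: one pass, no nesting.
my %free_by_env;
$free_by_env{ $_->{env} } += $_->{free_capacity} for @targets;

# Targets under 10% free, lowest fraction first.
my @low =
    sort { $a->{free_capacity} / $a->{total_capacity}
           <=> $b->{free_capacity} / $b->{total_capacity} }
    grep { $_->{free_capacity} < 0.10 * $_->{total_capacity} } @targets;

printf "%s: %d free\n", $_, $free_by_env{$_} for sort keys %free_by_env;
printf "LOW %s:%s\n", @{$_}{qw(host target)} for @low;
```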
Re: Help with Hash of hashes, is there a better way?
by bsb (Priest) on Jun 02, 2006 at 06:41 UTC
    DBD::SQLite is worth considering.

    The querying is done using SQL, so it is very flexible, and you can use other DBI and SQL tools to inspect the data. SQLite is also very easy to use, so it cuts out the admin overhead of a full DBMS.

    Brad

Re: Help with Hash of hashes, is there a better way?
by dimar (Curate) on Jun 02, 2006 at 15:21 UTC

    As a general rule, I always use a flattened simple "Table" structure (AoH ref or alternatively an AoA ref) with denormalized data. Nested structures like the one you depicted are usually overkill unless there is some elaborate OOP going on behind the scenes. Sometimes I will use an (HoAoH ref) and call that a "Workbook", but rarely does the need to re-invent a data structure go beyond that unless it is intrinsic to code optimization or other ancillary design constraints.

    (Note: subject to personal preference, ymmv, the following are instructive guidelines, but not laws.)

    Rationale:

    • Everyone knows what a 'table' is, it's the "Hello World" of information architecture.
    • Just as every recursive routine can be expressed iteratively, so too can any nested data structure be mapped to zero or more related tables. Therefore, it is ontologically complete.
    • If you had "fun" devising 'just the right datastructure' that's a red flag that you over-engineered it.
    • When the inevitable change happens to your 'just right' data structure, adding or removing elements will be a major hassle, and will introduce breaking changes, unless the data structure is orthogonal to its semantic content.
    • You automatically get easy transfer between various reporting and querying apps, should the need arise (e.g., DBD::SQLite, DBD::AnyData, Template::Toolkit).
    • People can figure out what your code is doing just by looking at it.
    • The best datastructure is usually one that you don't have to think about much when you use it, just like you shouldn't have to use a special kind of glass to drink water instead of milk or juice.

    Using this approach, any new 'data model' I have to work with usually consists of only four simple, reliable and repeatable steps: (define, populate, filter and process).

        ### define
        my $oTable000 = [];
        my $oRow = {
            'typeof'         => 'production',
            'platform'       => 'windows',
            'foohost'        => 'hostname',
            'footarget'      => 'targetname1',
            'total_capacity' => 0,
            'free_capacity'  => 0,
        };

        ### populate (with fake data)
        for ( 0 .. 12 ) {
            $oRow->{platform} = ${ [qw(windows unix database)] }[ int( rand(3) ) ];
            $oRow->{typeof}   = ${ [qw(production development)] }[ int( rand(2) ) ];
            my %hCurrent = %{$oRow};
            $oTable000->[$_] = \%hCurrent;
        }

        ### filter (do whatever querying or grouping you want here)
        @{$oTable000} =
            ### SORT BY
            sort { $a->{platform} cmp $b->{platform} }
            ### WHERE typeof = 'development'
            grep { $_->{typeof} eq 'development' } @{$oTable000};

        ### process
        ### (send it off to your template engine, number crunch, whatever)
        foreach my $oRec ( @{$oTable000} ) {
            DoStuff($oRec);
        }

Node Type: perlquestion [id://552989]
Approved by wfsp
Front-paged by planetscape