packetstormer has asked for the wisdom of the Perl Monks concerning the following question:


Can anyone tell me if there is an easy way to access (drill down to) data contained in hashes of hashes and/or hashes or arrays of hashes etc.? I have been using Data::Dumper, which is great, but it is still difficult to get the syntax right to pull out the elements I need.

Is there a handy perl trick I am missing? It is taking me ages to get the simplest string!

Re: Access Hashes of Hashes etc.
by DrHyde (Prior) on Sep 14, 2011 at 09:32 UTC
    $thingy->{foo}->[3]->{wibble}->{wobble}->[4]->()

    executes the coderef stored at element four in the arrayref stored under key 'wobble' in the hashref stored under key 'wibble' in the hashref stored at element three in the arrayref under key 'foo' in the hashref in $thingy.
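    A self-contained sketch of a structure that $thingy->{foo}->[3]->{wibble}->{wobble}->[4]->() would walk. The contents of $thingy are invented for illustration; only the access path comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical data: only the shape matters.
my $thingy = {
    foo => [
        'zero', 'one', 'two',
        {                                   # element 3 of the 'foo' arrayref
            wibble => {                     # hashref under key 'wibble'
                wobble => [                 # arrayref under key 'wobble'
                    undef, undef, undef, undef,
                    sub { return "called!" },   # element 4: a coderef
                ],
            },
        },
    ],
};

# Walk the path one level at a time and call the coderef at the end.
my $result = $thingy->{foo}->[3]->{wibble}->{wobble}->[4]->();
print "$result\n";    # prints "called!"
```

    Reading the expression left to right, each arrow steps one level deeper, and each subscript type ({} or []) tells you whether that level is a hash or an array.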
Re: Access Hashes of Hashes etc.
by zentara (Archbishop) on Sep 14, 2011 at 09:54 UTC
Re: Access Hashes of Hashes etc.
by moritz (Cardinal) on Sep 14, 2011 at 10:30 UTC
      Thanks all - more than enough information there for me to get lost in!!
Re: Access Hashes of Hashes etc.
by chrestomanci (Priest) on Sep 14, 2011 at 10:51 UTC

    One trick you can use is to test things interactively with the perl debugger. When faced with a complex and potentially confusing deeply-nested data structure (especially one provided by a third party library), I often run the code in a debugger, and set a breakpoint on a line where the data structure is in scope. I then interactively attempt to read it until I have found the correct syntax to get down to the part I want. I then paste that bit of syntax into my editor.

    That way, if I get confused and write $thingy->{foo}->[3]->{wobble}->[4]->() when I should have written $thingy->{foo}->[3]->{wibble}->{wobble}->[4]->(), the Perl debugger will tell me, and I can adjust things until I get it right. This is much better than having the program crash and re-running it with different syntaxes until it is correct.

    I find this technique especially useful with third-party libraries that I did not write, which return objects with accessor methods to get things out. Often, if you can't remember the name of the accessor method you need, you can guess it, as it will usually have the same name as a likely-looking key in the blessed hash you get back.
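    A sketch of that guessing game done programmatically, assuming the object is a blessed hashref whose accessor methods happen to share names with its keys. My::Widget and its colour accessor are hypothetical; real classes may store their data differently, so treat a match as a hint rather than a guarantee.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# A hypothetical class whose accessor shares its name with a hash key.
package My::Widget;
sub new    { my ($class, %args) = @_; return bless {%args}, $class }
sub colour { return $_[0]->{colour} }

package main;

my $obj = My::Widget->new(colour => 'blue', size => 3);

# For each key in the underlying hash, check whether a same-named method exists.
my @accessors;
if (blessed($obj) && reftype($obj) eq 'HASH') {
    for my $key (sort keys %$obj) {
        if ($obj->can($key)) {
            push @accessors, $key;
            print "$key: accessor found, returns ", $obj->$key, "\n";
        }
        else {
            print "$key: no method of that name\n";
        }
    }
}
```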

    One trick to remember when examining large, deeply nested data structures, especially anything that contains doubly linked lists or trees, is to limit the depth when you use x $objRef in the debugger. For example, when working with DBIx::Class result objects, I usually do x 3 $rowObj and get about 20 lines of output. Without the depth limit, I get about 1000 lines of output filling my screen.
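    Outside the debugger, Data::Dumper offers a similar depth limit through $Data::Dumper::Maxdepth. A minimal sketch with an invented structure; the setting here plays roughly the role of x 2 in the debugger.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical deeply nested structure.
my $deep = { level1 => { level2 => { level3 => { level4 => 'bottom' } } } };

local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Maxdepth = 2;    # don't descend more than two levels
my $shallow = Dumper($deep);
print $shallow;    # refs below the limit are shown as 'HASH(0x...)' strings
```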

Re: Access Hashes of Hashes etc.
by Utilitarian (Vicar) on Sep 14, 2011 at 09:39 UTC
    Is there a handy perl trick I am missing?
    'Fraid not; it's just a matter of practice. Given time, you'll be able to read and write accessors to deeply nested variables easily. In the interim, run your finger along the screen, writing down what you've seen with the other hand ;)

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
      I don't know if this is exactly what the OP is looking for, but if it is: as a newbie, I find it easier to work out how to dereference these structures once I literally draw myself a map on paper, writing down what kind of information goes where in the structure.

      Hope it helps.
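      The map can also be generated. Below is a small hypothetical helper, collect_paths, that walks a structure recursively and returns the full access path to every leaf, so the lines can be pasted into code (for simple keys that need no quoting). It only understands plain hash and array references; objects, code refs and circular structures are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "path = value" strings for every leaf.
sub collect_paths {
    my ($data, $path, $out) = @_;
    $path //= '$data';    # literal text used as the root of each path
    $out  //= [];
    if (ref $data eq 'HASH') {
        collect_paths($data->{$_}, $path . "->{$_}", $out) for sort keys %$data;
    }
    elsif (ref $data eq 'ARRAY') {
        collect_paths($data->[$_], $path . "->[$_]", $out) for 0 .. $#$data;
    }
    else {
        push @$out, "$path = " . (defined $data ? $data : 'undef');
    }
    return $out;
}

my $data = { foo => [ 1, { bar => 'baz' } ] };
print "$_\n" for @{ collect_paths($data) };
```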
Re: Access Hashes of Hashes etc.
by Anonymous Monk on Sep 14, 2011 at 09:53 UTC
Re: Access Hashes of Hashes etc.
by muba (Priest) on Sep 14, 2011 at 13:30 UTC

    Usually, when working with such data structures, you're only interested in one node at a time. When that's the case, I find it pretty useful to keep that node in its own variable, so you don't have to bother with the whole structure all the time.

    For example, suppose I keep track of the way visitors use my web app from day to day. To do this, I keep user statistics in an array, where each element represents a day. The last element in the array is yesterday, element -2 is the day before yesterday, et cetera.

    Each day is itself an arrayref of hashrefs, one per user, each with at least the key 'username'. Another key is 'pages', whose value is an AoH (array of hashes) with information about the pages this particular user visited, such as the page title and an array of the other users who also visited that page on that day.
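    Written out as a Perl literal, a structure of that shape might look like the sketch below. The user names, page titles and the 'pages'/'others' key names are invented for illustration; only the overall shape (days holding per-user hashrefs, which hold an AoH of pages) follows the description.

```perl
use strict;
use warnings;

# Hypothetical sample: one arrayref per day, oldest first, yesterday last.
my $webapp_usage_statistics = [
    [    # a day: one hashref per user
        {
            username => 'John Doe',
            pages    => [
                {
                    title  => 'Home',
                    others => [ { username => 'Jane Roe' } ],
                },
            ],
        },
    ],
];

# Reading one value means naming every level on the way down.
my $first_title = $webapp_usage_statistics->[-1][0]{pages}[0]{title};
print "$first_title\n";    # prints "Home"
```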

    Suppose I'm tasked to see what pages a certain user, named John Doe, had visited a week ago, and who else has visited the same pages on that day.

    my $webapp_usage_statistics = pull_this_data_from_somewhere();    # hypothetical data source
    my ($johndoe) = grep { $_->{username} eq "John Doe" } @{ $webapp_usage_statistics->[-7] };    # list assignment: get the element, not a count
    for my $page (@{ $johndoe->{pages} }) {
        print "$johndoe->{username} visited $page->{title}\n";
        my $others = join(", ", map { $_->{username} } @{ $page->{others} });
        print "  as did $others.\n\n";
    }

    You see what I did there. Already in line 2, I extract the node I care about from $webapp_usage_statistics and never bother with $webapp_usage_statistics again, because I know I'm only interested in that one node from seven days ago where John Doe is the user name.

    And I go on like that: the moment I begin looping over John's visited pages of a week ago, I've made a variable that holds a reference to one page at a time, so I can pull the information I want from it without constantly saying $johndoe->{pages}->[$idx]->{title}. Nope, just $page->{title}. When it comes to showing the other users who visited that page that day, I do the same trick: I make map loop over the dereferenced array of users, so that I can just say $_->{username} from within the BLOCK. That really beats $webapp_usage_statistics->[-7]->[$some_index]->{pages}->[$some_other_index]->{others}->[$yet_another_index]->{username}, which is what it really boils down to.
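    One detail worth knowing about this trick: the node variable holds a reference, not a copy. Reading through it is equivalent to spelling out the full path, and writing through it changes the original structure. A small sketch with made-up data:

```perl
use strict;
use warnings;

# Hypothetical structure for the demonstration.
my $stats = { days => [ { pages => [ { title => 'Home', hits => 1 } ] } ] };

# Grab the node once...
my $page = $stats->{days}[0]{pages}[0];

# ...reading through it matches the full path,
print "same\n" if $page->{title} eq $stats->{days}[0]{pages}[0]{title};

# ...and writing through it updates the original structure.
$page->{hits}++;
print "hits now $stats->{days}[0]{pages}[0]{hits}\n";    # prints "hits now 2"
```

    If you want to modify a node without touching the original, you need an explicit copy instead of the reference.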

    That code's just there to illustrate a point. It might contain typos, bugs, or oddities. Also, I don't think that writing a webapp usage tracker in this fashion is really a good idea because the data structure is... awkward at best.

      I often find it convenient to have a routine which claims to process all the records, but which actually goes through the top-level keys and invokes another routine to process the corresponding sub-nodes.

      I prefer to have short subroutines in any case, under 20 lines if at all possible, so this makes it easier to understand what each routine does. It's not like the 80s, when invoking a subroutine was drastically slower than doing things inline.

      It does create the problem of coming up with names for several levels of subroutines: process_all_names(), process_one_name(), process_first_name(), process_last_name(), ...
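      A minimal sketch of that layering, reusing the routine names from the post; the %people data and what each routine does with it are assumptions made for the example. Each routine deals with exactly one level of the structure and hands the next level down to another routine.

```perl
use strict;
use warnings;

# Hypothetical data: id => { first => ..., last => ... }
my %people = (
    101 => { first => 'John', last => 'Doe' },
    102 => { first => 'Jane', last => 'Roe' },
);

# Top level: walk the ids and delegate each record.
sub process_all_names {
    my ($people) = @_;
    return map { process_one_name($_, $people->{$_}) } sort keys %$people;
}

# One record: only ever sees a single { first, last } hashref.
sub process_one_name {
    my ($id, $name) = @_;
    my $full = join ' ', process_first_name($name->{first}),
                         process_last_name($name->{last});
    return "$id: $full";
}

# Leaf level: plain strings, no references at all.
sub process_first_name { my ($first) = @_; return ucfirst lc $first }
sub process_last_name  { my ($last)  = @_; return uc $last }

print "$_\n" for process_all_names(\%people);
```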

      As Occam said: Entia non sunt multiplicanda praeter necessitatem ("entities must not be multiplied beyond necessity").

        I know what you mean. I'm currently in the middle of writing a JSON parser/generator, and I have these subroutines with lovely names such as _parser__handle_map, _parser__handle_array, _parser__handle_string, _parser__next_token_type, _parser__next_value, _to_json__format_hash, _to_json__format_array, and _to_json__format_string.

        Pretty straightforward names, and it positively beats those unruly if/elsif/.../else constructions.

        Basically a strategy like that does the same thing as I suggested but even takes it a step further. Instead of pulling data from the greater structure into a variable until you're at the deepest level you care about, you pull the data from the structure into a subroutine until you've got what you really wanted.
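        Here is a small sketch of replacing an if/elsif chain with one handler per data type, in the spirit of the _to_json__format_* routines mentioned above. It is not the JSON generator from the post: the bodies, the _format dispatcher and the %formatter table are assumptions, and the output is only JSON-like (no string escaping, no distinction between numbers and strings).

```perl
use strict;
use warnings;

my %formatter;    # ref type => handler; declared first so handlers can recurse

# Dispatcher: look up the handler by ref type instead of an if/elsif chain.
sub _format {
    my ($value) = @_;
    my $type = ref($value) || 'SCALAR_VALUE';
    my $handler = $formatter{$type} or die "No formatter for $type\n";
    return $handler->($value);
}

sub _to_json__format_hash {
    my ($h) = @_;
    return '{' . join(',', map { qq("$_":) . _format($h->{$_}) } sort keys %$h) . '}';
}

sub _to_json__format_array {
    my ($arr) = @_;
    return '[' . join(',', map { _format($_) } @$arr) . ']';
}

# Toy version: no escaping of quotes or control characters.
sub _to_json__format_string { my ($s) = @_; return qq("$s") }

%formatter = (
    HASH         => \&_to_json__format_hash,
    ARRAY        => \&_to_json__format_array,
    SCALAR_VALUE => \&_to_json__format_string,
);

my $out = _format({ name => 'x', list => [ 'a', { b => 'c' } ] });
print "$out\n";
```

        Each handler only knows about its own level and passes anything nested back to the dispatcher, so supporting a new type means adding one sub and one table entry.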

Re: Access Hashes of Hashes etc.
by 1arryb (Acolyte) on Sep 14, 2011 at 13:43 UTC


    DrHyde's answer is correct, if elliptical. The key concept is that hashes can only contain scalar values. That means the only way to have a hash inside a hash is to use a reference for the inner hash. And as long as you're doing that, you might as well use a reference for the outer hash itself to make the syntax more regular ($thingy->{foo}->[3] vs. $thingy{foo}->[3]).
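    A short sketch of that point, with invented keys: the inner hash is stored as a reference (a scalar), and the two spellings at the end differ only in whether the outermost container is a named hash or a reference to one.

```perl
use strict;
use warnings;

# Values in a hash are scalars, so the inner hash is stored as a reference.
my %thingy = ( inner => { colour => 'red' }, foo => [ 0, 1, 2, 'three' ] );
print ref $thingy{inner}, "\n";    # prints "HASH"

# Same data, but with the outer hash held in a reference too.
my $thingy = \%thingy;

# Named outer hash vs. reference to the outer hash:
my $via_hash = $thingy{foo}->[3];
my $via_ref  = $thingy->{foo}->[3];
print "$via_hash $via_ref\n";      # prints "three three"
```

    Note that %thingy and $thingy are two different variables that can coexist; which one you are using decides whether the first subscript needs an arrow.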

      And as long as you're doing that, you might as well use a reference for the outer hash itself to make the syntax more regular ($thingy->{foo}->[3] vs. $thingy{foo}->[3]).

      Well, except the internal ->'s aren't necessary anyway, so $thingy{foo}[3] works just fine.
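      To illustrate: Perl lets you drop the arrow between adjacent subscripts, but the first arrow after a scalar holding a reference is still required (or replaced by block dereference). The data is made up.

```perl
use strict;
use warnings;

my %thingy = ( foo => [ 0, 1, 2, 'three' ] );
my $ref    = \%thingy;

# These all reach the same element.
my @same = (
    $thingy{foo}->[3],    # explicit arrow between subscripts
    $thingy{foo}[3],      # arrow between subscripts omitted
    $ref->{foo}[3],       # first arrow needed: $ref is a reference
    ${$ref}{foo}[3],      # block dereference instead of the first arrow
);
print "@same\n";    # prints "three three three three"
```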

        True, of course, but in my opinion using the dereference operator -> explicitly anyway makes it clearer what your intentions are. $thingy{foo}[3] might work when %thingy is your root hash, so to speak. But once you're deeper into the structure, where you've loaded stuff into variables such as $thingies_a_thingy_does_in_the_weekends, which would be an arrayref or hashref, you're bound either to use -> or to dereference the whole thing anyway.

        That, and there's nothing wrong with littering your code with ->.

        Sometimes being explicit makes things all the more readable. For example, let's assume a function superFunc that returns different things depending on whether it's called in scalar or list context. In my $superScalar = superFunc() it's already clear from the my $superScalar part that you want the scalar context behaviour. But by calling it as scalar(superFunc()) you say: "yes, I know that superFunc is context sensitive, and yes, I really want its scalar context behaviour."
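        A sketch of such a context-sensitive function, using wantarray; superFunc and its return values are hypothetical. It also shows the classic trap where parentheses around the variable silently switch to list context.

```perl
use strict;
use warnings;

# Hypothetical context-sensitive function.
sub superFunc {
    return wantarray ? ( 'a', 'b', 'c' ) : 'scalar result';
}

my @list        = superFunc();            # list context
my $superScalar = superFunc();            # scalar context, implied by the assignment
my $explicit    = scalar(superFunc());    # scalar context, stated explicitly
my ($first)     = superFunc();            # list context: gets the first element

print "@list | $superScalar | $explicit | $first\n";
# prints "a b c | scalar result | scalar result | a"
```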

        But I digress.