http://www.perlmonks.org?node_id=570409

gryphon has asked for the wisdom of the Perl Monks concerning the following question:

Greetings fellow monks,

After looking over a recent project, I noticed I had written similar code in several sections that effectively did the same thing: Grab data from a database (single query) and build a tree structure from that data. Each time, I was building a structure according to similar rules, so I decided to make a common function.

What I've come up with works, but it has some repeated code. But what concerns me the most is that it's more or less a foreach loop that pushes to an array... which makes me want to rewrite it as a map. Anyway, I've been looking at this for a while and am stuck. I'd like some pointers as to how I could improve the function.

sub tree_data { my ( $old_data, $key_name, $next_group, @remaining_keys ) = @_; my ( @new_data, $last_key, $build_box ); foreach ( @{ $old_data } ) { if ( $last_key and ( $last_key ne $_->{ $key_name } ) ) { push @new_data, { $key_name => $last_key, $next_group => ( @remaining_keys ) ? tree_data( $build_box, @remaining_keys ) : $build_box, }; undef $build_box; } $last_key = delete $_->{ $key_name }; push @{ $build_box }, $_; } push @new_data, { $key_name => $last_key, $next_group => ( @remaining_keys ) ? tree_data( $build_box, @remaining_keys ) : $build_box, }; return \@new_data; }

What I'm trying to accomplish here is take data from a database call and build a tree from it based on key names. Here's some sample input:

my $input = [ { 'team' => 'A-Team', 'employee' => 'Face', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'team' => 'A-Team', 'employee' => 'Murdock', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'team' => 'Military', 'employee' => 'Decker', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, ];

Here's some sample output:

my $output = [ { 'team' => 'A-Team', 'employees' => [ { 'employee' => 'Face', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'employee' => 'Murdock', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, ], }, { 'team' => 'Military', 'employees' => [ { 'employee' => 'Decker', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, ], }, ];

To call the function, I just pass in the data and the key name and sub-group name pairs I want, like this:

# ...to generate the output example from above... my $output = tree_data( $input, 'team', 'employees' ); # ...to go a little deeper... my $output = tree_data( $input, 'team', 'employees', 'employee', 'work_days' );

Going forward, I'm probably going to want to add the ability to pull multiple keys into a particular level, not just a single key; but that's really out of scope for what I'm trying to accomplish right now.

What sort of things should I do to make this function better? Thanks in advance for your suggestions.

gryphon
Whitepages.com Development Manager (WDDC)
code('Perl') || die;

Replies are listed 'Best First'.
Re: Build tree data structure from DB (flat) data; function golf
by halley (Prior) on Aug 30, 2006 at 19:56 UTC
    Of the notable virtues of all good programmers, cleverness isn't one.

    You're refactoring a common routine out so that it's more generalized and useful. You want to standardize to avoid introducing cut-n-paste bugs on the next opportunity to make a similar tree structure. Thus, it's more important to be clear and obvious than to be compact.

    If there's ever a specific reason to adopt a more compact and clever form of an algorithm, it should be to take advantage of some efficiency trick, turning a very slow O(n) into a very fast O(n). And explain what you're leveraging clearly. I see no particular efficiency worries here.

    --
    [ e d @ h a l l e y . c c ]

Re: Build tree data structure from DB (flat) data; function golf
by stvn (Monsignor) on Aug 30, 2006 at 19:25 UTC

    Well I didn't convert the foreach to a map, but I did remove some of your repeated code (closures++), and got rid of some checks and the a tiernary.

    One note, though, this is a destructive function. When I tested the deeper case, it did not work because the original data structure ($old_data) is destroyed during the building of the tree. This is kind of a gotcha.

    sub tree_data { my ( $old_data, $key_name, $next_group, @remaining_keys ) = @_; # guard against bad input and serve as recursion end case return $old_data unless defined $key_name; my @new_data; my $last_key = ''; # if this is initialized, you dont have to check + below my $build_box = []; # initalize this explicity (more self documentin +g) # encapsulate code into inner sub # fortunately for us, all variables # are closed over too :) my $push_new_data = sub { return unless $last_key; push @new_data, { $key_name => $last_key, $next_group => tree_data( $build_box, @remaining_keys ) }; # re-init this $build_box = []; }; foreach ( @{ $old_data } ) { if ( $last_key ne $_->{ $key_name } ) { $push_new_data->(); } $last_key = delete $_->{ $key_name }; push @{ $build_box }, $_; } $push_new_data->(); return \@new_data; }

    Here is a complete test script which compares against the old one.

    -stvn

      Greetings stvn,

      This is awesome. Thanks! I need to reprogram my brain so I'll think to use closures more readily. Maybe I need to re-read HOP. You make a great point about this being a destructive function. I think it's easy to fix, though:

      $last_key = delete $_->{ $key_name }; push @{ $build_box }, $_;

      ...becomes...

      push @{ $build_box }, $_; $last_key = delete $build_box->[-1]{$key_name};

      UPDATE: I'm a crazy idiot. This doesn't do what I claim it should do. Bad programmer. No cookie. All I'm doing is moving the reference. I need to deep copy instead.

      gryphon
      Whitepages.com Development Manager (WDDC)
      code('Perl') || die;

Re: Build tree data structure from DB (flat) data; function golf
by NamfFohyr (Scribe) on Aug 30, 2006 at 17:50 UTC
    Disclaimer: noobish advice

    I don't understand why it concerns you that the foreach pushes to an array. It's scoped within the subroutine so I don't think it's particularly inefficient.

    Because you push twice to the output array, a map is difficult to pull off while preserving the overall flow of information. This probably occcurred to you.

    The remaining duplicated code seems natural to me to leave as-is. Wrapping it up in a subroutine seems like overkill; you weren't considering THAT, were you?

    I think the function is stunning, BTW. I'm still wrapping my head around it. One may be able to make its workings more obvious, but I can't help you, as I barely follow how it works. Perhaps my self-declared confusion will prompt some of the Enlightened to elaborate.

    ry

    "I'm not afraid of Al Quaeda. I'm afraid of Al Cracker." -Chris Rock
Re: Build tree data structure from DB (flat) data; function golf
by jdporter (Paladin) on Aug 31, 2006 at 19:07 UTC

    I find it most convenient to convert to an intermediate format: a hash, keyed by team name.

    my %ident_key = ( from => 'team', to => 'team' ); my %recs_key = ( from => 'employee', to => 'employees' ); my %h; # intermediate hash for ( @$input ) { my %r = %$_; my $k = delete $r{$ident_key{'from'}}; push @{ $h{$k} }, \%r; } # convert to your desired structure: my @output = map { { $ident_key{'to'} => $_, $recs_key{'to'} => $h{$_}, } } keys %h;

    If you need the input order to be preserved, you could use something like Tie-IxHash.

    We're building the house of the future together.
      I was going to suggest a change in data structure as well.
      my $input = [ { 'team' => 'A-Team', 'employee' => 'Face', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'team' => 'A-Team', 'employee' => 'Murdock', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'team' => 'Military', 'employee' => 'Decker', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, ];
      To:
      my %input = ( 'A-Team' => { { 'employee' => 'Face', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'employee' => 'Murdock', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, }, 'Military' => { { 'employee' => 'Decker', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, }, );
        The structure's ideal format depends on how it will be used

        gryphon's output seems like something intended for HTML::Template, which only accepts scalars or arrays of hashrefs unfortunately. It wouldn't be able to iterate over your proposed structure.

Re: Build tree data structure from DB (flat) data; function golf
by imp (Priest) on Sep 02, 2006 at 05:33 UTC
    I like the closure approach suggested by stvn. It comes with a small performance penalty because of all the function calls (7-20% depending on the dataset and spread), but it makes the code much easier to read/maintain.

    On a related note - it would be good to add some documentation that describes your strategy. It isn't immediately apparent, and I had to re-read the code a few times before it was clear.

    Found one problem: the following line should test whether $last_key is defined, to avoid problems when your data contains an empty string or '0'.

    if ( $last_key and ( $last_key ne $_->{ $key_name } ) ) {