Beefy Boxes and Bandwidth Generously Provided by pair Networks Cowboy Neal with Hat
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Build tree data structure from DB (flat) data; function golf

by gryphon (Abbot)
on Aug 30, 2006 at 15:32 UTC ( #570409=perlquestion: print w/ replies, xml ) Need Help??
gryphon has asked for the wisdom of the Perl Monks concerning the following question:

Greetings fellow monks,

After looking over a recent project, I noticed I had written similar code in several sections that effectively did the same thing: Grab data from a database (single query) and build a tree structure from that data. Each time, I was building a structure according to similar rules, so I decided to make a common function.

What I've come up with works, but it has some repeated code. But what concerns me the most is that it's more or less a foreach loop that pushes to an array... which makes me want to rewrite it as a map. Anyway, I've been looking at this for a while and am stuck. I'd like some pointers as to how I could improve the function.

sub tree_data { my ( $old_data, $key_name, $next_group, @remaining_keys ) = @_; my ( @new_data, $last_key, $build_box ); foreach ( @{ $old_data } ) { if ( $last_key and ( $last_key ne $_->{ $key_name } ) ) { push @new_data, { $key_name => $last_key, $next_group => ( @remaining_keys ) ? tree_data( $build_box, @remaining_keys ) : $build_box, }; undef $build_box; } $last_key = delete $_->{ $key_name }; push @{ $build_box }, $_; } push @new_data, { $key_name => $last_key, $next_group => ( @remaining_keys ) ? tree_data( $build_box, @remaining_keys ) : $build_box, }; return \@new_data; }

What I'm trying to accomplish here is take data from a database call and build a tree from it based on key names. Here's some sample input:

my $input = [ { 'team' => 'A-Team', 'employee' => 'Face', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'team' => 'A-Team', 'employee' => 'Murdock', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'team' => 'Military', 'employee' => 'Decker', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, ];

Here's some sample output:

my $output = [ { 'team' => 'A-Team', 'employees' => [ { 'employee' => 'Face', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'employee' => 'Murdock', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, ], }, { 'team' => 'Military', 'employees' => [ { 'employee' => 'Decker', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, ], }, ];

To call the function, I just pass in the data and the key name and sub-group name pairs I want, like this:

# ...to generate the output example from above... my $output = tree_data( $input, 'team', 'employees' ); # ...to go a little deeper... my $output = tree_data( $input, 'team', 'employees', 'employee', 'work_days' );

Going forward, I'm probably going to want to add the ability to pull multiple keys into a particular level, not just a single key; but that's really out of scope for what I'm trying to accomplish right now.

What sort of things should I do to make this function better? Thanks in advance for your suggestions.

gryphon
Whitepages.com Development Manager (WDDC)
code('Perl') || die;

Comment on Build tree data structure from DB (flat) data; function golf
Select or Download Code
Re: Build tree data structure from DB (flat) data; function golf
by NamfFohyr (Scribe) on Aug 30, 2006 at 17:50 UTC
    Disclaimer: noobish advice

    I don't understand why it concerns you that the foreach pushes to an array. It's scoped within the subroutine so I don't think it's particularly inefficient.

    Because you push twice to the output array, a map is difficult to pull off while preserving the overall flow of information. This probably occcurred to you.

    The remaining duplicated code seems natural to me to leave as-is. Wrapping it up in a subroutine seems like overkill; you weren't considering THAT, were you?

    I think the function is stunning, BTW. I'm still wrapping my head around it. One may be able to make its workings more obvious, but I can't help you, as I barely follow how it works. Perhaps my self-declared confusion will prompt some of the Enlightened to elaborate.

    ry

    "I'm not afraid of Al Quaeda. I'm afraid of Al Cracker." -Chris Rock
Re: Build tree data structure from DB (flat) data; function golf
by stvn (Monsignor) on Aug 30, 2006 at 19:25 UTC

    Well I didn't convert the foreach to a map, but I did remove some of your repeated code (closures++), and got rid of some checks and the a tiernary.

    One note, though, this is a destructive function. When I tested the deeper case, it did not work because the original data structure ($old_data) is destroyed during the building of the tree. This is kind of a gotcha.

    sub tree_data { my ( $old_data, $key_name, $next_group, @remaining_keys ) = @_; # guard against bad input and serve as recursion end case return $old_data unless defined $key_name; my @new_data; my $last_key = ''; # if this is initialized, you dont have to check + below my $build_box = []; # initalize this explicity (more self documentin +g) # encapsulate code into inner sub # fortunately for us, all variables # are closed over too :) my $push_new_data = sub { return unless $last_key; push @new_data, { $key_name => $last_key, $next_group => tree_data( $build_box, @remaining_keys ) }; # re-init this $build_box = []; }; foreach ( @{ $old_data } ) { if ( $last_key ne $_->{ $key_name } ) { $push_new_data->(); } $last_key = delete $_->{ $key_name }; push @{ $build_box }, $_; } $push_new_data->(); return \@new_data; }

    Here is a complete test script which compares against the old one.

    -stvn

      Greetings stvn,

      This is awesome. Thanks! I need to reprogram my brain so I'll think to use closures more readily. Maybe I need to re-read HOP. You make a great point about this being a destructive function. I think it's easy to fix, though:

      $last_key = delete $_->{ $key_name }; push @{ $build_box }, $_;

      ...becomes...

      push @{ $build_box }, $_; $last_key = delete $build_box->[-1]{$key_name};

      UPDATE: I'm a crazy idiot. This doesn't do what I claim it should do. Bad programmer. No cookie. All I'm doing is moving the reference. I need to deep copy instead.

      gryphon
      Whitepages.com Development Manager (WDDC)
      code('Perl') || die;

Re: Build tree data structure from DB (flat) data; function golf
by halley (Prior) on Aug 30, 2006 at 19:56 UTC
    Of the notable virtues of all good programmers, cleverness isn't one.

    You're refactoring a common routine out so that it's more generalized and useful. You want to standardize to avoid introducing cut-n-paste bugs on the next opportunity to make a similar tree structure. Thus, it's more important to be clear and obvious than to be compact.

    If there's ever a specific reason to adopt a more compact and clever form of an algorithm, it should be to take advantage of some efficiency trick, turning a very slow O(n) into a very fast O(n). And explain what you're leveraging clearly. I see no particular efficiency worries here.

    --
    [ e d @ h a l l e y . c c ]

Re: Build tree data structure from DB (flat) data; function golf
by jdporter (Canon) on Aug 31, 2006 at 19:07 UTC

    I find it most convenient to convert to an intermediate format: a hash, keyed by team name.

    my %ident_key = ( from => 'team', to => 'team' ); my %recs_key = ( from => 'employee', to => 'employees' ); my %h; # intermediate hash for ( @$input ) { my %r = %$_; my $k = delete $r{$ident_key{'from'}}; push @{ $h{$k} }, \%r; } # convert to your desired structure: my @output = map { { $ident_key{'to'} => $_, $recs_key{'to'} => $h{$_}, } } keys %h;

    If you need the input order to be preserved, you could use something like Tie-IxHash.

    We're building the house of the future together.
      I was going to suggest a change in data structure as well.
      my $input = [ { 'team' => 'A-Team', 'employee' => 'Face', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'team' => 'A-Team', 'employee' => 'Murdock', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'team' => 'Military', 'employee' => 'Decker', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, ];
      To:
      my %input = ( 'A-Team' => { { 'employee' => 'Face', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, { 'employee' => 'Murdock', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, }, 'Military' => { { 'employee' => 'Decker', 'work_day' => '2006-08-28', 'other_data' => '123456789', }, }, );
        The structure's ideal format depends on how it will be used

        gryphon's output seems like something intended for HTML::Template, which only accepts scalars or arrays of hashrefs unfortunately. It wouldn't be able to iterate over your proposed structure.

Re: Build tree data structure from DB (flat) data; function golf
by imp (Priest) on Sep 02, 2006 at 05:33 UTC
    I like the closure approach suggested by stvn. It comes with a small performance penalty because of all the function calls (7-20% depending on the dataset and spread), but it makes the code much easier to read/maintain.

    On a related note - it would be good to add some documentation that describes your strategy. It isn't immediately apparent, and I had to re-read the code a few times before it was clear.

    Found one problem: the following line should test whether $last_key is defined, to avoid problems when your data contains an empty string or '0'.

    if ( $last_key and ( $last_key ne $_->{ $key_name } ) ) {

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://570409]
Approved by McDarren
Front-paged by shmem
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (11)
As of 2014-04-21 13:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (495 votes), past polls