gryphon has asked for the wisdom of the Perl Monks concerning the following question:
Greetings fellow monks,
After looking over a recent project, I noticed I had written similar code in several sections that effectively did the same thing: Grab data from a database (single query) and build a tree structure from that data. Each time, I was building a structure according to similar rules, so I decided to make a common function.
What I've come up with works, but it has some repeated code. But what concerns me the most is that it's more or less a foreach loop that pushes to an array... which makes me want to rewrite it as a map. Anyway, I've been looking at this for a while and am stuck. I'd like some pointers as to how I could improve the function.
sub tree_data {
my ( $old_data, $key_name, $next_group, @remaining_keys ) = @_;
my ( @new_data, $last_key, $build_box );
foreach ( @{ $old_data } ) {
if ( $last_key and ( $last_key ne $_->{ $key_name } ) ) {
push @new_data, {
$key_name => $last_key,
$next_group => ( @remaining_keys )
? tree_data( $build_box, @remaining_keys )
: $build_box,
};
undef $build_box;
}
$last_key = delete $_->{ $key_name };
push @{ $build_box }, $_;
}
push @new_data, {
$key_name => $last_key,
$next_group => ( @remaining_keys )
? tree_data( $build_box, @remaining_keys )
: $build_box,
};
return \@new_data;
}
What I'm trying to accomplish here is take data from a database call and build a tree from it based on key names. Here's some sample input:
my $input = [
{
'team' => 'A-Team',
'employee' => 'Face',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
{
'team' => 'A-Team',
'employee' => 'Murdock',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
{
'team' => 'Military',
'employee' => 'Decker',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
];
Here's some sample output:
my $output = [
{
'team' => 'A-Team',
'employees' => [
{
'employee' => 'Face',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
{
'employee' => 'Murdock',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
],
},
{
'team' => 'Military',
'employees' => [
{
'employee' => 'Decker',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
],
},
];
To call the function, I just pass in the data and the key name and sub-group name pairs I want, like this:
# ...to generate the output example from above...
my $output = tree_data( $input, 'team', 'employees' );
# ...to go a little deeper...
my $output = tree_data(
$input, 'team', 'employees', 'employee', 'work_days'
);
Going forward, I'm probably going to want to add the ability to pull multiple keys into a particular level, not just a single key; but that's really out of scope for what I'm trying to accomplish right now.
What sort of things should I do to make this function better? Thanks in advance for your suggestions.
gryphon
Whitepages.com Development Manager (WDDC)
code('Perl') || die;
Re: Build tree data structure from DB (flat) data; function golf
by halley (Prior) on Aug 30, 2006 at 19:56 UTC
|
Of the notable virtues of all good programmers, cleverness isn't one.
You're refactoring a common routine out so that it's more generalized and useful. You want to standardize to avoid introducing cut-n-paste bugs on the next opportunity to make a similar tree structure. Thus, it's more important to be clear and obvious than to be compact.
If there's ever a specific reason to adopt a more compact and clever form of an algorithm, it should be to take advantage of some efficiency trick, turning a very slow O(n) into a very fast O(n). And explain what you're leveraging clearly. I see no particular efficiency worries here.
-- [ e d @ h a l l e y . c c ]
| [reply] |
Re: Build tree data structure from DB (flat) data; function golf
by stvn (Monsignor) on Aug 30, 2006 at 19:25 UTC
|
Well I didn't convert the foreach to a map, but I did remove some of your repeated code (closures++), and got rid of some checks and the a tiernary.
One note, though, this is a destructive function. When I tested the deeper case, it did not work because the original data structure ($old_data) is destroyed during the building of the tree. This is kind of a gotcha.
sub tree_data {
my ( $old_data, $key_name, $next_group, @remaining_keys ) = @_;
# guard against bad input and serve as recursion end case
return $old_data unless defined $key_name;
my @new_data;
my $last_key = ''; # if this is initialized, you dont have to check
+ below
my $build_box = []; # initalize this explicity (more self documentin
+g)
# encapsulate code into inner sub
# fortunately for us, all variables
# are closed over too :)
my $push_new_data = sub {
return unless $last_key;
push @new_data, {
$key_name => $last_key,
$next_group => tree_data( $build_box, @remaining_keys )
};
# re-init this
$build_box = [];
};
foreach ( @{ $old_data } ) {
if ( $last_key ne $_->{ $key_name } ) {
$push_new_data->();
}
$last_key = delete $_->{ $key_name };
push @{ $build_box }, $_;
}
$push_new_data->();
return \@new_data;
}
Here is a complete test script which compares against the old one.
| [reply] [d/l] [select] |
|
$last_key = delete $_->{ $key_name };
push @{ $build_box }, $_;
...becomes...
push @{ $build_box }, $_;
$last_key = delete $build_box->[-1]{$key_name};
UPDATE: I'm a crazy idiot. This doesn't do what I claim it should do. Bad programmer. No cookie. All I'm doing is moving the reference. I need to deep copy instead.
gryphon
Whitepages.com Development Manager (WDDC)
code('Perl') || die;
| [reply] [d/l] [select] |
Re: Build tree data structure from DB (flat) data; function golf
by NamfFohyr (Scribe) on Aug 30, 2006 at 17:50 UTC
|
Disclaimer: noobish advice
I don't understand why it concerns you that the foreach pushes to an array. It's scoped within the subroutine so I don't think it's particularly inefficient.
Because you push twice to the output array, a map is difficult to pull off while preserving the overall flow of information. This probably occcurred to you.
The remaining duplicated code seems natural to me to leave as-is. Wrapping it up in a subroutine seems like overkill; you weren't considering THAT, were you?
I think the function is stunning, BTW. I'm still wrapping my head around it. One may be able to make its workings more obvious, but I can't help you, as I barely follow how it works. Perhaps my self-declared confusion will prompt some of the Enlightened to elaborate.
ry
"I'm not afraid of Al Quaeda. I'm afraid of Al Cracker."
-Chris Rock
| [reply] |
Re: Build tree data structure from DB (flat) data; function golf
by jdporter (Chancellor) on Aug 31, 2006 at 19:07 UTC
|
I find it most convenient to convert to an intermediate format: a hash, keyed by team name.
my %ident_key = ( from => 'team', to => 'team' );
my %recs_key = ( from => 'employee', to => 'employees' );
my %h; # intermediate hash
for ( @$input ) {
my %r = %$_;
my $k = delete $r{$ident_key{'from'}};
push @{ $h{$k} }, \%r;
}
# convert to your desired structure:
my @output = map {
{
$ident_key{'to'} => $_,
$recs_key{'to'} => $h{$_},
}
} keys %h;
If you need the input order to be preserved, you could use something like Tie-IxHash.
We're building the house of the future together.
| [reply] [d/l] |
|
I was going to suggest a change in data structure as well.
my $input = [
{
'team' => 'A-Team',
'employee' => 'Face',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
{
'team' => 'A-Team',
'employee' => 'Murdock',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
{
'team' => 'Military',
'employee' => 'Decker',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
];
To:
my %input = (
'A-Team' => {
{
'employee' => 'Face',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
{
'employee' => 'Murdock',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
},
'Military' => {
{
'employee' => 'Decker',
'work_day' => '2006-08-28',
'other_data' => '123456789',
},
},
);
| [reply] [d/l] [select] |
|
| [reply] |
Re: Build tree data structure from DB (flat) data; function golf
by imp (Priest) on Sep 02, 2006 at 05:33 UTC
|
I like the closure approach suggested by stvn. It comes with a small performance penalty because of all the function calls (7-20% depending on the dataset and spread), but it makes the code much easier to read/maintain.
On a related note - it would be good to add some documentation that describes your strategy. It isn't immediately apparent, and I had to re-read the code a few times before it was clear.
Found one problem: the following line should test whether $last_key is defined, to avoid problems when your data contains an empty string or '0'.
if ( $last_key and ( $last_key ne $_->{ $key_name } ) ) {
| [reply] [d/l] |
|
|