http://www.perlmonks.org?node_id=183320

aufrank has asked for the wisdom of the Perl Monks concerning the following question:

I feel like I should go to confession to say this, but here goes: I'm not very good with map, and terrible with sort and grep. I am, however, getting to the point that I can tell when I should be using one of these functions, and I think I've found just such a case. Sadly, I don't know what to do now, as I certainly don't want to go on write more unnecessary and inefficient lines of code, but don't have the skills I need to do it in better way.

I need to go through an array of hashes where the hashes all have the keys instiution, race, age. I need to know how many men in various age ranges are enrolled in each institution, and how many men of those age ranges are of each race. I started to do something like this:

# there are actually lots more institutions than this, # but this gets the point across if (!defined $institution) { $institution_table{"unaffiliated"} ++; } elsif ($institution =~ /^$hospital1$/i) { $institution_table{$hospital1} ++; } elsif ($institution =~ /^$hospital2$/i) { $institution_table{$hospital2} ++; } else { $institution_table{"other"} ++; } foreach (sort keys %institution_table) { $institution_table{$_}{"fortyfive_fifty"} ++ if (45 < $age <= 50); $institution_table{$_}{"fifty_fiftyfive"} ++ if (50 < $age <= 55); $institution_table{$_}{"fiftyfive_sixty"} ++ if (55 < $age <= 60); #...and so on up through the 85-90 age bracket }
aside from being really ugly, I'm pretty sure it's not even close to being the best way to do this. I've got a strong intuition that the answers lie in doing sexy things with grep, map, and/or sort, but that's about all I've got (aside from a pile of ugly code).

So, any suggestions on how to find the number of men in each age range at each hospital?

thanks for any help,
--au

Replies are listed 'Best First'.
Re: an easier way with grep, map, and/or sort?
by tadman (Prior) on Jul 19, 2002 at 17:45 UTC
    For one, you can't use chained comparisons until Perl 6. Instead, you have to do this:
    if (55 < $age && $age <= 60) ...
    When dealing with ranges, if they are predictable intervals, you can do math and end up with something like this:
    foreach (keys %insitution_table) { $institution_table{$_}{ages}[$age/5]++; }
    This builds an array of 5-year grouped ages. Consider how difficult it would be to iterate through a listing which uses text names for ranges. "fifty" comes before "forty" in the alphabet, and "twenty" comes after. If you need to know who's in the 50-55 age group:
    my $sample = $insitution_table{$_}{ages}[60/5];
    As for your initial block, you're using regular expressions when you could be using something much simpler:
    my %valid_institutions = map { $_ => $_ } ( $hospital1, $hospital2, ); # ... my $key = $institution && ($valid_institution{lc($institution)} || "other") || "unaffiliated"; $institution_table{$key}{count}++;
    This sets $key to be the appropriate spot to insert. You can add new types to the list and no new code is required. If there's one thing that really irritating, it's these endless chained "if" statements that do very little but take up a ton of room. Sure, if you get paid per line of code, you might have a case, but still. It looks like it escaped from some long forgotten COBOL code if you ask me.

    Also, I've put 'count' at the end because you were using the hash in an invalid way, something you would notice with use strict; Note that you were putting a number into, for example, $institution_table{$foo} and then later using this number as a hash when you do this: $institution_table{$foo}{bar}++

    Update:
    By request, here's a quick explanation about what the map was doing. I've used it here to turn a list of variables into a hash, so that later, it is very easy to see if another variable is in this original list. Typically, this is done like this:
    my %foo = map { $_ => 1 } qw[ foo bar baz ]; my $foo = 'foo'; print "\$foo is in the list\n" if ($foo{$foo});
    However, since in this case the output value of the hash was going to be used for something else, I just assigned it to the same value. This way the code looks like this:
    ... $valid_institution{lc($institution)} || ...
    Instead of:
    ... $valid_institution{lc($institution)} && lc($institution) ||  ...
      my $key = $institution && ($valid_institution{lc($institution)} || "other") || "unaffiliated";
      When I'm doing code review for hire, I do not permit using testthing && truething || falsething in production code unless there's also a big comment in the margin that says literally:
      ## I'm promising that the true branch here can NEVER EVER EVER ## return a false value, which would have caused the ## false branch to be erroneously executed. If you're ## updating this code, please continue to maintain ## this precondition, which I have placed precariously ## and needlessly in this code. Have a nice day.
      If you don't want to add a big friggin' comment like that, then change the very dangerous broken and-or construct to the entirely safe testthing ? truething : falsething, as in:
      my $key = $institution ? ($valid_institution{lc($institution)} || "other") : "unaffiliated";

      -- Randal L. Schwartz, Perl hacker

      If there's one thing that really irritating, it's these endless chained "if" statements that do very little but take up a ton of room.

      I couldn't agree more-- that's pretty much what motivated this question!

      It looks like it escaped from some long forgotten COBOL code if you ask me.

      never had the pleasure of even seeing COBOL code, but as I was trying to write perl, I would far prefer it look like idiomatic perl than anything else ;)

      Thanks for your response. I'll try it out just as soon as I get my head all the way around how it works :D

      --au

Re: an easier way with grep, map, and/or sort?
by kvale (Monsignor) on Jul 19, 2002 at 17:55 UTC
    I don't have any clever map sort or grep approaches, but the code could be simplified. For the sequence of if-then statements, one could use
    my @institutions = qw|hospital1 hospital2|; my %seen; $seen{$_}++ foreach @institutions; if (!defined $institution) { $institution_table{"unaffiliated"} ++; } elsif (exists $seen{lc $institution}) { $institution_table{lc $institution} ++; } else { $institution_table{"other"} ++; }
    For the loop, here is a slightly more clever method:
    my $cat_age = int ($age/5) - 9; my @age_cats = qw|fortyfive_fifty fifty_fiftyfive fiftyfive_sixty|; foreach (keys %institution_table) { $institution_table{ $_ }{ $age_cats[$cat_age] }++; }
    One could use a map for the loop, but the foreach is faster.

    -Mark

Re: an easier way with grep, map, and/or sort?
by broquaint (Abbot) on Jul 19, 2002 at 17:51 UTC
    The initial if block can be simplified to something like this
    if(not defined $institution) { $institution_table{ unaffiliated }++; } else { $institution_table{ $_ }++ for grep { $institution eq $_ } ( $hospital1 $hospital2 ); }
    But I think that could be cleaned up more by using less singular variables. As for the for loop you're trying to use perl6 style number comparisons and the comparisons don't seem to match the age bracket. Maybe you mean something like this
    $institution_table{$_}->{"fortyfive_fifty"} ++ if 45 > $age and $age <= 50;
    And the sort() is superfluous in the for loop (well at least from the code provided). Probably the best way to simplify your is to simplify your data structures (as one follows the other). </code>
    HTH

    _________
    broquaint

Re: an easier way with grep, map, and/or sort?
by aufrank (Pilgrim) on Jul 19, 2002 at 17:43 UTC
    I recognize that the code should be:
    foreach (sort keys %institution_table) { $institution_table{$_}{"fortyfive_fifty"} ++ if (45 < $age and $age <= 50); $institution_table{$_}{"fifty_fiftyfive"} ++ if (50 < $age and $age <= 55); $institution_table{$_}{"fiftyfive_sixty"} ++ if (55 < $age and $age <= 60); #...and so on up through the 85-90 age bracket }
    but am a big tool, and so did not check carefully enough before I posted. Apologies :\

    Thanks again,
    --au