Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Do as I say. Not as I do

by davorg (Chancellor)
on Dec 17, 2001 at 15:47 UTC ( #132506=perlmeditation: print w/replies, xml ) Need Help??

When I'm teaching Perl, one of the principles I always try to drive home is the importance of optimising your code for readability. You should always assume that the next person to look at your code a) won't be you and b) won't be as clever as you. Don't put unnecessarily complex code into your programs. Good advice (I think). I always try to code with those rules in mind.

Then, whilst showing someone some of the code I've written at work, I came across this:

foreach (@in) { my ($count, $file) = split; $file =~ s/^$in_prefix//; my ($campaign, $month, $fname) = split /\//, $file; $_ = { campaign => $campaign, month => $month, file => $fname, count => $count, sort => "$campaign:$month:$file" }; }

@in starts off with each element containing a string consisting of the number of lines in a file and the full pathname (yes, it's the output from wc -l). After the code has run, each element contains a reference to a hash containing data about the file.

But's really not very clear what it's doing or (worse) how it does it. The code should really be rewritten to something more like this:

my @files = map { my ($count, $file) = split; $file =~ s/^$in_prefix//; my ($campaign, $month, $fname) = split /\//, $file; { campaign => $campaign, month => $month, file => $fname, count => $count, sort => "$campaign:$month:$file" }; } @in;

Creating a new array has got to be better than mysteriously transforming one.

The worse thing is that this was code I wrote in the last six months. I don't even have the excuse that I was young and didn't know any better :)


"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg

Replies are listed 'Best First'.
Re: Do as I say. Not as I do
by George_Sherston (Vicar) on Dec 17, 2001 at 18:50 UTC
    I'd never thought it through particularly, since in my case it is much less likely anyone else will have to read my code, but readability does seem to be quite a personal thing. I think there are some styles that are going to be less readable for almost everyone, but a lot of cases where it's totally dependent on the reader. I probably betray my own sloppy habits when I say I found your version one easier to understand than version two.

    My {pre-decimalisation pound} / 120 is that the map version wd be more readable like this:
    my @files = map {&makefiles} @in; sub makefiles { my ($count, $file) = split; $file =~ s/^$in_prefix//; my ($campaign, $month, $fname) = split /\//, $file; return { # not strictly necessary, but *readable* campaign => $campaign, month => $month, file => $fname, count => $count, sort => "$campaign:$month:$file" }; }

    George Sherston
Re: Do as I say. Not as I do
by trantor (Chaplain) on Dec 17, 2001 at 16:44 UTC

    Still, I think that the second snippet is not optimised for readability: functions like map and grep are sometimes obscure even for programmers with decent experience.

    Moreover, the block that map evaluates returns a hash reference, which could easily taken as a code block.

    Why not simply using the first block, pushing the hash reference into another array, not @in, like this:

    push @files, { campaign => $campaign, month => $month, file => $fname, count => $count, sort => "$campaign:$month:$file" };

    Just my €0.02


      I don't agree that grep and map should be avoided. In a Unix environment grep is a tried, tested, and well known tool. Grep in Perl is essentially the same thing. Map is a gloried foreach loop simplified to assign a list to an lvalue returned from the associated code block. In fact, the word "map" is straight out of beginning algebra where we talk about functions mapping inputs to outputs.

      Which brings me to the root of this thread-- and the fact that I agree and disagree with davorg. I agree that he might refactor (and comment briefly) this routine. And I agree that map is the way to go. But why not take the code block out from inside the loops and make this an actual sub? Then we can give the block of code a name that offers a clue as to its function (alleviating some of the need for comments along those lines), and we can access the code from outside our map statement if we want/need to. And we can still call it from a map at that point:
      my @files = map {func($_)} @in; sub func{ ... } #or if we really hate the idea of a whole sub for this #function that keeps the block hanging around when it's #finished being useful my $func = sub { ... } my @files = map {$func->($_)} @in;

      Which keeps our map from getting obfuscated due to the length of the block it surrounds-- and probably the first example I give makes it easier to introduce newer programmers to this powerful feature of Perl. It makes it very plain that what's going on here is that a list of inputs is being passed into func and mapped onto a list of outputs.

      Using a sub we can throw this function after our main block and keep its internals out of sight and out of mind when we are reading the rest of the enclosing block. That way we don't obscure that layer of code with details about how we transformed the lines of the wc output into data for our program. Much more readable, imho.
Re: Do as I say. Not as I do
by VSarkiss (Monsignor) on Dec 17, 2001 at 21:02 UTC

    Hmm. Not sure I agree the second one is clearer than the first. A multi-line map can be hard to follow, and it's not clear to me that "Creating a new array has got to be better than mysteriously transforming one."

    To a great extent, understandability is in the eye of the beholder. Nonetheless, I think a couple of very minor changes or additions to your first code would have made your intent clearer, more so than using map to cons up a new array. Here's what I'd suggest:

    1. Add a comment just above the loop saying # Modify input array in-place.
      Now your intent is very clear.
    2. Name the array element instead of using $_. I personally love and use $_ a lot, but if I have to refer to it explicitly, I stop and ask whether a named variable would be better. In this case, I don't know what's in @in, so I'd have to settle for $elem, $node, or something.
    3. A very minor point, but I really dislike leaning-toothpick regular expressions. I'd change the second split to use delimiters other than /.
    Otherwise, I think it's very clear what the first loop is doing. Here's my take on it:
    # Modify the input array in-place foreach my $elem (@in) { my ($count, $file) = split /\s+/, $elem; $file =~ s/^$in_prefix//; my ($campaign, $month, $fname) = split |/|, $file; $elem = { campaign => $campaign, month => $month, file => $fname, count => $count, sort => "$campaign:$month:$file" }; }

    Changed split to split $elem, not $_. Thanks, runrig! (But let the goof caught by Hofmator stand. ;-)
    Fixed split again. This is not my day....

      I'd change the second split to use delimiters other than /.

      Good point, VSarkiss, but when using a different delimiter you have to write

      split m|/|; # instead of split |/|;

      I agree fully with your suggestions and like especially your point about making the comment. This solution was not mentioned elsewhere. A perl programmer should know that $_ (or $elem) is an alias for the actual element. Nevertheless overreading this is easy, so the comment is a good solution.

      -- Hofmator

        Good point, VSarkiss, but when using a different delimiter you have to write
        split m|/|; # instead of split |/|;
        Or you can say: split "/"; (though " " is, of course, a special case, but for other simple one character (non-metacharacter) splits I like to use quotes :) If you're splitting on literal ".", you can't use "." but you could use "\\." but then I'd just assume use /\./ ...Got all that?
Re: Do as I say. Not as I do
by talexb (Canon) on Dec 17, 2001 at 19:20 UTC
    I'm wondering if you couldn't rearrange the split and the =~ statement (I don't know if the input array would allow this) .. so you'd do
    # Delete (something) from the input stream, separate the # count from the file name. s/$in_prefix//; my ( $count, $file ) = split;


    As a matter of coding style I use brackets when calling something like split; Since the delimiter is a slash, I'd also avoid the leaning toothpicks by using ' delimiters. So I would write

    # Decompose directory/sub-directory/file into campaign, # month and file. my ( $campaign, $month, $fname ) = split ( '/', $file );

    Finally, I'd probably add the comment that you mentioned already at the top.

    # Build an array of hash references from the output from # the output wc -l command containing campaign, month and # count data. my @files = map {

    However, this looks like a situation where a hash of hashes might be more useful -- especially if you want to do subtotals across various months -- or campaigns.

    "Excellent. Release the hounds." -- Monty Burns.

Re: Do as I say. Not as I do
by jackdied (Monk) on Dec 17, 2001 at 22:55 UTC
    Everything is a matter of taste, and reading perl is a function of how you are used to seeing perl written, but I would go for more explicit var names and just a couple comments. The code itself is failry easy to read.

    foreach my $line (@data) { my ($linecount, $filename) = split(/\s+/, $line); $filename =~ s/^$in_prefix//; # strip base directory from filename my ($campaign, $month, $fname) = split('/', $filename); # info is st +ored in the directory nesting push @files, { campaign => $campaign, month => $month, file => $fname, count => $count, sort => "$campaign:$month:$file" }; }
    It is a bit more verbose, and may be a bit slower because it uses explicit temporaries (instead of $_). I personally hate $_ and avoid it like the plague. I know this isn't a widely shared view.

    But hey, you asked for opinions, right?


Re: Do as I say. Not as I do
by clintp (Curate) on Dec 18, 2001 at 04:29 UTC
    I didn't think that was obfuscatory at all, and actually I thought your first example was clearer than the second. On second glance I couldn't help but notice the split, substitute, split and thought of another way to make this a bit clearer still. Untested, I don't have the input data handy:
    foreach (@in) { next unless m!(.*?) # Count \s+ ($in_prefix)? (([^/]+)/ # campaign ([^/]+)/ # month ([^/]+))!x; # fname $_ = { campaign => $4, month => $5, file => $3, count => $1, sort => "$4:$5:$6" }; }
    I probably blew the regex but you get the idea. :)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://132506]
Approved by root
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2018-07-21 11:55 GMT
Find Nodes?
    Voting Booth?
    It has been suggested to rename Perl 6 in order to boost its marketing potential. Which name would you prefer?

    Results (448 votes). Check out past polls.