Coding style: truth of variable name

by perlancar (Hermit)
on Apr 19, 2020 at 00:26 UTC ( [id://11115759] : perlquestion )

perlancar has asked for the wisdom of the Perl Monks concerning the following question:

How anal are you regarding putting values that are consistent with variable names? Which one do you prefer?

# 1A. not very anal, but more concise
for my $dir (glob "*") {
    # for a brief moment, $dir might not hold a directory's name
    next unless -d $dir;
    ...
}

# 1B. more "correct", with additional related variable
for my $dir0 (glob "*") {
    # a "0" suffix is my convention for variables that contain an unprocessed value
    next unless -d $dir0;
    my $dir = $dir0;
    ...
}

# 1C. more "correct", with additional variable and concept
for my $entry (glob "*") {
    next unless -d $entry;
    my $dir = $entry;
    ...
}

Another example:

# 2A. not very anal
sub load {
    # actually, module can be in the form of "Foo::Bar" or "Foo/".
    # but will be canonicalized to "Foo/" form
    my $module = shift;
    if ($module =~ /\A\w+(::\w+)*\z/) {
        $module = join("/", split /::/, $module) . ".pm";
    }
    ...
}

# 2B.
sub load {
    my $arg = shift;
    my $module_pm;
    if ($arg =~ /\A\w+(::\w+)*\z/) {
        $module_pm = join("/", split /::/, $arg) . ".pm";
    } else {
        $module_pm = $arg;
    }
    ...
}

In general, if the content of a variable becomes more precise, or changes slightly, and the variable then gets mentioned again a few times, do you usually use a new variable? Input validation is the bread and butter of a programmer; do you usually put the prevalidated value into $thing, knowing that at some point it is not yet, or not certainly, that thing?

EDIT 1: Changed next if to next unless.

Replies are listed 'Best First'.
Re: Coding style: truth of variable name
by GrandFather (Saint) on Apr 19, 2020 at 00:44 UTC

    In general the cost in any sense of introducing a new variable is trivial or nothing so go wild. Using appropriate variable names is a large part of good coding technique. Having a variable change its stripes easily leads to hard to understand code. The issue is added "cognitive load" - the reader needs to remember more stuff to understand the code.

    Another thing to think about is how does changing the meaning of a variable affect debugging? If you introduce a new variable it means you have both versions available for inspection in a debugger at the same time so it can be much easier to see where unexpected results were introduced and why. For this reason I often break down complex expressions into multiple statements with appropriately named variables holding intermediate results. It makes writing, debugging and maintaining the code easier.
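
A minimal sketch of the point about intermediate results, using a hypothetical word-count task (the data and all variable names are illustrative assumptions, not taken from the post). Both forms count the same words; the stepwise form keeps each stage in a named variable that can be inspected in a debugger.

```perl
use strict;
use warnings;

# Hypothetical input, for illustration only.
my @lines = ("the cat sat", "the dog sat", "a cat ran");

# Dense form: one expression, nothing to inspect in between.
my %dense;
$dense{$_}++ for map { split ' ', $_ } @lines;

# Stepwise form: each intermediate result has its own name.
my @words = map { split ' ', $_ } @lines;
my %count_of;
$count_of{$_}++ for @words;
my @by_frequency =
    sort { $count_of{$b} <=> $count_of{$a} || $a cmp $b } keys %count_of;

print "$_: $count_of{$_}\n" for @by_frequency;
```

If a count looks wrong, @words and %count_of are both still available to examine, which is the debugging advantage described above.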

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

      I was about to post exactly the opposite advice, but your post changed my mind, at least conditionally. The key concept is "appropriate". Extra variable names only reduce the "cognitive load" if both variables have appropriate names and scopes. I once had a customer who insisted that we always use the extra names. Any advantage was lost because he also insisted that all variables be named 'parmxxx' where xxx was a serial number he assigned.

      I often use your technique of breaking up complex statements on "Schwartzian Transforms" (Refer to How do I sort an array by (anything)?). I almost never code them right the first time. I usually rewrite them in the idiomatic form after I am satisfied that they are correct.
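
As an illustration of that workflow, here is a sketch of a Schwartzian Transform written first as separate named steps and then in the idiomatic chained form. The file names and the sort key (the number embedded in each name) are hypothetical choices for the example.

```perl
use strict;
use warnings;

# Hypothetical data: sort file names by the number embedded in them.
my @names = ("img10.png", "img2.png", "img1.png");

# Step-by-step form, easy to inspect while getting the logic right.
my @decorated = map  { [ $_, /(\d+)/ ? $1 : 0 ] } @names;   # [name, key]
my @sorted    = sort { $a->[1] <=> $b->[1] } @decorated;    # order by key
my @stepwise  = map  { $_->[0] } @sorted;                   # drop the key

# Idiomatic form: the same three stages, read from bottom to top.
my @idiomatic =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map  { [ $_, /(\d+)/ ? $1 : 0 ] }
    @names;

print "@idiomatic\n";
```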


      Thanks for the debugging perspective, I didn't consider it.

Re: Coding style: truth of variable name
by choroba (Cardinal) on Apr 19, 2020 at 01:28 UTC
    Sometimes, you can solve the problem by avoiding the situation completely:
    for my $dir (glob '*/') {
    I'm not sure how it works on MSWin, so maybe more portably
    for my $dir (grep -d, glob '*') {
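
A small sketch comparing the two forms in a scratch directory. The directory and file names are hypothetical. On Unix-like systems the '*/' pattern typically returns names with a trailing slash, while the grep form returns bare names; behavior on MSWin is not verified here.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with two subdirectories and one plain file
# (hypothetical names, for illustration only).
my $start = getcwd();
my $root  = tempdir(CLEANUP => 1);
chdir $root or die "chdir: $!";
mkdir $_ or die "mkdir $_: $!" for qw(photos thumbs);
open my $fh, '>', 'notes.txt' or die "open: $!";
close $fh;

# Form 1: let the pattern itself select directories.
my @from_pattern = glob '*/';

# Form 2: glob everything, then keep only directories.
my @from_grep = grep -d, glob '*';

print "pattern: @from_pattern\n";
print "grep:    @from_grep\n";

# Leave the scratch directory so it can be removed at exit.
chdir $start or die "chdir: $!";
```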

    The second case is different. I'd probably declare two subroutines:

    sub load_from_path {
        my ($module_path) = @_;
        # etc.
    }

    sub load_from_name {
        my ($module_name) = @_;
        my $module_path = ...;
        load_from_path($module_path);
    }

    I mean, it's not the responsibility of the "load" subroutine to convert the module name to its path. It's the caller's responsibility to know what they have and what they want to do with it.
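
One way the two-subroutine split might be filled in, as a sketch only: the bodies below are assumptions (the post elides them), using require for the actual load and a plain substitution for the name-to-path conversion.

```perl
use strict;
use warnings;

# Loads a module given its path form, e.g. "File/Spec.pm".
# Using require here is an assumption; the post leaves the body open.
sub load_from_path {
    my ($module_path) = @_;
    require $module_path;
    return $module_path;
}

# Converts a name such as "File::Spec" to its path form, then loads it.
sub load_from_name {
    my ($module_name) = @_;
    (my $module_path = $module_name) =~ s{::}{/}g;
    $module_path .= '.pm';
    return load_from_path($module_path);
}

my $loaded = load_from_name('File::Spec');
print "loaded $loaded\n";
```

Each variable holds exactly one kind of value for its whole lifetime, so neither name is ever briefly untrue.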

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      True in the above cases. Bad examples then :)

      How about the common cases where a function needs to validate its input. Do you assign the pre-validated content to the variable first, or do you assign it to something else first and then to the final variable after it's validated?


        In the case of input validation, I usually let the variable name express the intent then validate the value into submission before doing the work:

        sub frob_file {
            my $frobbable_filename = shift;
            die "Error" if !-e $frobbable_filename or -d $frobbable_filename;
            die "Nope!" if not_frobbable($frobbable_filename);
            ... frob file ...
        }

        In other cases when the variable isn't so clear but it will be clear shortly, I'll often use $t or $tmp for the placeholder. Then I'll give it a name or pass the data off to a better-named thing:

        sub zap_the_thing {
            my $t = shift;
            my @files_to_zap;
            if (-d $t) {
                zap_dir($t);
            }
            else {
                push @files_to_zap, $t;
            }
            ... yadda ...
            zap_files(@files_to_zap);
        }

        I don't think $t or $tmp is a great name, but finding good names is hard. I use it so that I can look at it and dispose of it ASAP.

        Frequently I find I can't name something well the first time I encounter or use it. So I come up with my best guess of the name and use it. Then, when it feels like the name is wrong, and I find it doesn't fit, I do one of two things: If I have a better name in mind, I'll rename it. Sometimes, though, I can't think of a better name, so I instead give it a prefix of 'z' to "call it out". That way, when I revisit the code, I know I need a better name. Not perfect, not even good, but it usually gets me by. Yet I still wind up with stuff like:

        # ?NEED GOOD NAME?
        # If a group (Row, Col, Blk) has only one slot for a particular value,
        # solve that cell.
        sub solve_v_in_only_one_cell_in_R_C_B {
            my ($self, $GEN) = @_;

        an atrocity which came directly off my screen from last night's session.


        When your only tool is a hammer, all problems look like your thumb.

        I read unvalidated into one variable and then put it into a validated variable when I call the validation routine. The variables differ only in their prefix. I learned this from, which is still worth reading. Here's a pseudocode example:

        my $inv_data = get_input();
        my $val_data = val_from_inv($inv_data);
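
A runnable sketch of this prefix convention. The bodies of get_input and val_from_inv are hypothetical stand-ins (the post gives only their names); here the validator is assumed to accept a non-negative integer and to die on anything else.

```perl
use strict;
use warnings;

# Hypothetical stand-in for reading user input.
sub get_input { return " 42 " }

# Hypothetical validator: trims whitespace and accepts only digits.
# Returns the cleaned value, or dies if the input is unacceptable.
sub val_from_inv {
    my ($inv_data) = @_;
    (my $trimmed = $inv_data) =~ s/^\s+|\s+$//g;
    die "Invalid input: '$inv_data'\n" unless $trimmed =~ /\A\d+\z/;
    return $trimmed;
}

my $inv_data = get_input();              # not yet validated
my $val_data = val_from_inv($inv_data);  # safe to use from here on

print "validated: $val_data\n";
```

Because the prefix records the validation state, any later use of an inv_ variable in a place that expects validated data stands out on reading.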


        John Davies

Re: Coding style: truth of variable name
by dsheroh (Monsignor) on Apr 20, 2020 at 08:01 UTC
    1A/2A for me, pretty much every time. Name the variable what it's supposed to be, then immediately check and skip/abort/throw an exception/halt and catch fire as appropriate if it's actually something else (1A), or reformat it if it's the right thing but not expressed in quite the way you want (2A).

    1B and 1C feel like meaningless expansion of code to me. I prefer that subs are short enough to look at the whole thing at once, and "meaningless-but-not-blank" lines of code that don't do anything more than "copy data from an unvalidated-data-name variable into a validated-data-name variable" reduce the number of meaningful lines of code that can be in view, and they don't even give you the visual structure that blank whitespace lines provide.

    2B just gives me the heebie-jeebies. Variable names should be meaningful and $arg is the opposite of meaningful. Yes, yes, it is an argument to the sub, but that's the only information the name $arg tells you. I want a name that tells me what the arg is (or at least what it should be). If the only information you want to convey about the value is that it's an argument, you may as well just skip the shift and refer to it as $_[0], or use a bare shift and access it as $_. (Yes, IMO $arg really is that utterly meaningless as a name.)

Re: Coding style: truth of variable name
by ikegami (Patriarch) on Apr 19, 2020 at 22:28 UTC

    If you have to simply copy the value from one variable to another just because the value has changed, you're probably going too far. (Your first example.)

    If, however, some transformation was applied, might as well use an accurate variable name for the transformed value. (Your second example.)

    One often has to deal with file names, file paths and absolute paths in the same piece of code. I use the following convention to distinguish them:

    • $fn, $dir_fn or $foo_fn: A file name (no path)
    • $qfn, $dir_qfn or $foo_qfn: A qualified file name (a relative or absolute path)
    • $fqfn, $dir_fqfn or $foo_fqfn: A fully-qualified file name (an absolute path)

    As such, you'll find me doing

    while (defined( my $fn = readdir($dh) )) {
        my $qfn = "$dir_qfn/$fn";
        ...
    }

    No point in using the same variable. The same goes for your second example.

    sub load {
        my ($pkg) = @_;
        my $qfn = $pkg =~ s{::}{/}gr . '.pm';
        ...
    }

    Why would you use the same var? To save memory? Perl might not be the best choice of language if you think that's important.

    In your first example, you call the variable "dir", but you would already have a variable by that name in practice.

    for my $qfn (glob("\Q$dir_qfn\E/*")) {
        stat($qfn)
            or do {
                warn("Skipping \"$qfn\": Can't stat: $!\n");
                next;
            };

        next if !-d _;
        ...
    }

    You could also use the following:

    for my $subdir_qfn (glob("\Q$dir_qfn\E/*")) {
        stat($subdir_qfn)
            or do {
                warn("Skipping \"$subdir_qfn\": Can't stat: $!\n");
                next;
            };

        next if !-d _;
        ...
    }

    Sure, it might not be a subdir, but you could think of it as a subdir candidate. Adding another variable to the mix wouldn't help.

    If someone wanted to insist on not putting the value in $subdir_qfn unless the name matches perfectly, one could use the following:

    for my $subdir_qfn (
        grep {
            if (stat($_)) {
                -d _
            } else {
                warn("Skipping \"$_\": Can't stat: $!\n");
                0
            }
        }
        glob("\Q$dir_qfn\E/*")
    ) {
        ...
    }

Re: Coding style: truth of variable name
by jcb (Parson) on Apr 19, 2020 at 04:01 UTC

    My general rule is to reuse a variable when the old value is no longer needed and the variable name also describes the new value, so I would prefer 2A but the comment explaining that $module is to be a canonicalized module name is very important. Creating additional lexicals is cheap, but not free in Perl. (Additional locals are essentially free in most cases in C since modern compilers allocate the entire stack frame at once.)

    My fellow monk choroba made a good point about filtering input when you can, but I would also prefer 1A because that type of filtering at the beginning of a loop's block is idiomatic in Perl. Concision in this case is also useful in that the more concise code requires fewer VM steps because it avoids an extra lexical. Filtering the input is the best option, since grep iterates in C and reduces the number of iterations perl's VM must execute. This is a trivial concern in most cases, but can be serious in an inner loop.

    Lastly, I think you meant "next unless -d $dir" in 1A, 1B, and 1C — "next if -d $dir" skips the iteration if $dir does name a directory and would be very confusing in all three cases.

    Edited by jcb: Add missing caveat; thanks to GrandFather for pointing out my mistake.

      My general rule is to reuse a variable when the old value is no longer needed

      Taken at face value that is terrible advice. A large part of understanding code is understanding the role of variables at any particular point. That is why choosing good variable names is important.

      If the role of the variable changes through the code then understanding the code becomes much harder. So maybe that wasn't what you meant by that statement?

      Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

        I had forgotten an important detail. That should be "when the old value is no longer needed and the variable name also describes the new value". This balancing act is to avoid proliferating variable names like $file, $file2, $realfile, and similar problems that I have seen in existing code, including the questioner's example 1B.

        If the role of a variable can change, then (in my view, in Perl) the variable is defined in a scope that is too wide for the code as written. I often reuse the same name for another (similar) purpose later in a sub or script, for example, if iterating over two different sets of files, both foreach loops are likely to use foreach my $filename ..., but the variables are separate lexicals and $filename does not exist outside of those loops.
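
A short sketch of that scoping point, with hypothetical file lists. Each loop declares its own lexical $filename, so the second loop reuses the name without reusing the variable.

```perl
use strict;
use warnings;

# Hypothetical file lists, for illustration only.
my @sources = ("a.c", "b.c");
my @headers = ("a.h");

my @report;

# This loop declares its own lexical $filename ...
for my $filename (@sources) {
    push @report, "compile $filename";
}

# ... so reusing the name here creates a new, unrelated variable.
for my $filename (@headers) {
    push @report, "install $filename";
}

# $filename is not visible here; under "use strict", referring to it
# at this point would be a compile-time error.
print "$_\n" for @report;
```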

        Thanks for catching that — the idea that a variable name must describe its contents is something that I tend to assume goes without saying and that the questioner here seems to also tacitly understand, but that is an important detail that a new programmer might not yet know.

      Ah yes, thanks for the correction about next unless.
Re: Coding style: truth of variable name
by leszekdubiel (Scribe) on Apr 19, 2020 at 22:31 UTC
    for my $dir (glob "*") {
        # for a brief moment, $dir might not hold a directory's name
        #     ^^^^ that's ok -- you just check if $dir is okay for you:
        next unless -d $dir;
        ...
    }

    for my $dir (glob "*") {
        -d $dir or next;    # put "-d" first, because it is more important than "next"
        $dir =~ /photos|thumbs/ or next;
        ... ... ... ...
        bla bla $dir ... ...
        bla bla
        for (...) {
            ... bla bla $_ and $_ ...
            ... and $dir ...
        }
        ... ... ... $dir... ... ...    # long loop body -- it is important to use "$dir" variable
        ... ...
    }

    Short processing:

    for (glob "*") {
        -d or next;    # "or next" -- fall back less important than "-d"
        /photos|thumbs/ or next;
        do_something
    }

    Better written like this, data flow from bottom to up:

    do_something with $_ for              # finally feed good dirs to "do something"
        sort                              # third step
        grep { -d && /photos|thumbs/ }    # second step
        glob "*";                         # first step

Re: Coding style: truth of variable name
by Anonymous Monk on Apr 19, 2020 at 13:46 UTC
    Try to write short routines with locally-scoped variables that are named to clearly illustrate their meaning, not their data type. The very worst thing that can ever happen is that I am trying to understand the meaning of your code ... you got mashed by a bread truck so I can't ask you ... and I get it wrong. I overlook something. I misinterpret it. Or even, your variable names suggest something that is not or is no longer true. "K. I. S. S." I can only read your code; I cannot read your mind. But you can clearly suggest to me what you were thinking at the time.

Re: Coding style: truth of variable name (subroutine length)
by Anonymous Monk on Apr 19, 2020 at 02:54 UTC

    How long are your subroutines?

    Variable names are much less important than subroutine names

    file folder

    for my $file ... grep glob whatever
    for my $path ... grep glob whatever
    for my $anal ... grep glob whatever

      My subroutines can range from just a few lines to several hundred lines long. Labelled blocks sometimes help in making a long subroutine clearer, as well as creating a lexical scope to isolate the effect of variables.

      sub do_some_task {
          my ($arg1, $arg2, $arg3) = @_;

          SUBTASK1: {
              my $some_var = ...
              ...
              ...
              ...
          }
          ...

          SUBTASK2: {
              ...
              ...
              ...
              ...
              ...
              ...
          }
          ...
          ...
      }

      I do have to question your claim. What makes variable names much less important than subroutine names? Variables are referred to much more often.

        My subroutines can range from just a few lines to over several hundred lines long. I do have to question your claim. What makes variable names much less important than subroutine names? Variables are referred to much more often.

        isn't it obvious? subroutine length, obviously :)

        these are all identical for a screen full (40 to 100 lines) dir dir0 entry arg module modulepm file path ana

        entry arg anal dir0 are the shit versions from least to most. But only cause I don't actually think that way.

        Even they don't increase cognitive load... Var names only need to be close enough.

        hundred lines long ... arg1 arg2 arg3 subtask1 subtask2


        chop it down to skimmable code

        @args Ebony() Ivory() Harmony()

        age of peter, sum of bob