
Two simple code style advice questions

by eyepopslikeamosquito (Chancellor)
on Jan 16, 2013 at 10:02 UTC ( #1013548=perlquestion )
eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

Came up during code review today.

1. Initializing a hash

Given something like:

my @tests = ( "tfred", "tjock", "tfortytwo" );

do you prefer:

a) my %ntests = map { $_ => 0 } @tests;
b) my %ntests; @ntests{@tests} = (0) x @tests;

2. Setting a string to a value or the empty string

Do you prefer:

a) my $mol = "forty two" x ($n == 42);
b) my $mol = ($n == 42) ? "forty two" : "";

Please feel free to suggest alternative solutions.
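
For reference, a minimal sketch (not part of the original post) that checks that the two options in each question give the same result for the values shown. The names `%by_map`, `%by_slice`, `mol_x`, and `mol_ternary` are made up for illustration.

```perl
use strict;
use warnings;

my @tests = ( "tfred", "tjock", "tfortytwo" );

# 1a: build the hash with map
my %by_map = map { $_ => 0 } @tests;

# 1b: declare the hash, then fill it through a hash slice
my %by_slice;
@by_slice{@tests} = (0) x @tests;

# Same number of keys, and every key of one exists in the other with an equal value
my $same = keys(%by_map) == keys(%by_slice)
    && !grep { !exists $by_slice{$_} || $by_slice{$_} != $by_map{$_} } keys %by_map;
print "hashes match: ", ( $same ? "yes" : "no" ), "\n";

# 2a: string repetition driven by a comparison (hypothetical helper name)
sub mol_x { my ($n) = @_; return "forty two" x ( $n == 42 ) }

# 2b: the ternary operator (hypothetical helper name)
sub mol_ternary { my ($n) = @_; return ( $n == 42 ) ? "forty two" : "" }

for my $n ( 41, 42 ) {
    printf "n=%d  x: '%s'  ternary: '%s'\n", $n, mol_x($n), mol_ternary($n);
}
```

Both pairs agree here because `$n == 42` only ever yields 1 or the empty string; the choice between them is one of readability.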

Replies are listed 'Best First'.
Re: Two simple code style advice questions
by vinoth.ree (Monsignor) on Jan 16, 2013 at 11:02 UTC

    I prefer only the following

    1.Initializing a hash

    my %ntests = map { $_ => 0 } @tests;

    2.Setting a string to a value or the empty string

    my $mol = ($n == 42) ? "forty two" : "";

Re: Two simple code style advice questions
by Anonymous Monk on Jan 16, 2013 at 10:45 UTC

    1) I have no preference (even if the slice might Benchmark faster); it's all the same, but I like to one-line the slice

    my %foo; @foo{@bar} = (0) x @bar;

    Just because it's not legal to write my( @foo{ @bar } ) = (0) x @bar; doesn't mean it can't be one-lined :)

    2) This one is trickier. Both of those irritate me slightly, though not enough to prefer either, or to avoid either :)

    Out of habit, unless my editor helps out with whitespace, I multi-line my ternary

    my $foo = 42 == $bar
        ? ".."
        : "";

    Although I actually prefer

    my $foo = ""; 42==$bar and $foo = "forty two";

    I'm fine with  my $foo = ""; $foo = "forty two" if 42==$bar; too

    All the choices are very much a non-issue for me, even if the slightly irritating versions break or disrupt flow/scannability/skimmability for some, like inline comments ( Documentation: POD vs Comments). It doesn't slow me down much; it's a mosquito or a hurdle, not a door or a wall.

Re: Two simple code style advice questions
by Anonymous Monk on Jan 16, 2013 at 10:44 UTC

    For 1st case, if I have the chance to define a spanking new hash, I would use the map version. Else, I slightly prefer the 2d version using the hash slice.

    For 2d case, I would prefer the ? : version unless the context is of mathematics. Then again I would rather use the module.

Re: Two simple code style advice questions
by BrowserUk (Pope) on Jan 16, 2013 at 13:25 UTC

    1. a marginally; though I'd accept b. (But I'd one-line it.)
    2. Definitely b. a) is very obscure.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Two simple code style advice questions
by blue_cowdawg (Monsignor) on Jan 16, 2013 at 14:33 UTC

    I like code that I can see exactly what is going on without going back and forth in the lines of code. Especially when a script becomes gargantuan. In your first case:

    my %tests = map { $_ => 0 } qw/ tfred tjock tfortytwo /;
    would be my preferred style. I could also go with:
    my $tests ={}; map { $tests -> { $_ } = 0 } qw/ tfred tjock tfortytwo /;
    as long as those two lines are next to each other in the code. Passing a reference to a hash to subs later on is more readable to my eyes than something like:
    but that is a personal preference.

    In your second case I am a big fan of

    my $errstr = ( $case != OK ? $msg[$case] : "" );
    kinds of things. (Line right out of one of my projects). The term OK is the result of a stack of use constant OK =>0; sorts of things where I use constants to aid in my code's readability.

    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg

      OK defined as 0? Yuck!

      That's as bad as defining TRUE as 0.

      It would be far clearer as:

      use constant NOERROR => 0; my $errstr = ( $case != NOERROR ? $msg[$case] : "" );

      Though I'd skip that conditional statement completely and embed the logic in the data:

      $msg[ 0 ] = ''; ... my $errstr = $msg[ $case ];
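
A minimal sketch of the two approaches side by side, assuming a small made-up message table; `@msg`'s contents and the helper names `errstr_ternary` and `errstr_lookup` are illustrative, not taken from the project being discussed.

```perl
use strict;
use warnings;

use constant NOERROR => 0;

# Hypothetical message table; index 0 is the "no error" slot.
my @msg = ( '', 'file not found', 'permission denied' );

# With the conditional in the code
sub errstr_ternary {
    my ($case) = @_;
    return ( $case != NOERROR ) ? $msg[$case] : '';
}

# With the logic embedded in the data: slot 0 already holds ''
sub errstr_lookup {
    my ($case) = @_;
    return $msg[$case];
}

print "case $_: '", errstr_lookup($_), "'\n" for 0 .. 2;
```

The lookup version depends on slot 0 of the table staying empty, so that convention would need to be stated wherever the table is defined.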

            OK defined as 0? Yuck!

        It would seem to me that most Unix commands return 0 when things are "OK." So, if you want to accuse me of showing my C programming roots, I plead guilty.

Re: Two simple code style advice questions
by davido (Archbishop) on Jan 16, 2013 at 19:05 UTC

    #1: Readability is better for the map version, in my opinion, though the 'x' operator version isn't bad, and is measurably faster in the uncommon case where such considerations actually matter. ;)

    use Benchmark qw( cmpthese );

    @array = ( 'aaa' .. 'zzz' );

    cmpthese ( -3, {
        mapped => q{my %hash = map{$_=>0} @array},
        opped  => q{my %hash; @hash{@array} = (0) x @array},
    } );

    __END__
             Rate mapped  opped
    mapped 65.9/s     --   -45%
    opped   120/s    82%     --

    But except for the odd case of doing such a transform inside of a tight loop, the benchmark is probably totally inconsequential when weighed against the legibility issues.

    #2: I prefer the ternary operator. Everyone knows what is happening here. The 'x' operator version is a nifty contortion. And it would be silly to bother thinking in terms of processing time in such a trivial snippet; I suspect the result would be within the margin of error.

    So for me, map, and ternary. I would love to favor the slice approach, but it takes a little more cognition to mentally evaluate.


Re: Two simple code style advice questions
by LanX (Bishop) on Jan 16, 2013 at 17:29 UTC
    After working for a while with Perl "programmers" which were mostly converted PHP-hackers and quickly trained sociologists (both with an experimental approach to Perl) I got very cautious about production code. (Especially because everybody was allowed to change and commit into any project... )

    So there is a personal preference and a $work preference...

    Part 1

    So clearly (a) is better as long as speed doesn't matter, even for me the $_ => 0 part evidently signals a hash assignment to my visual cortex. And this construct is easily changed to other meanings.

    The only exception to that rule may be the idiom @ntests{@tests} = () to assign undef.

    Anyway in aforementioned work context even map was risky.¹

    So maybe

    my %ntests; $ntests{$_}=0 for @tests;

    would cause the least problems.

    Part 2

    For me (a) is an obfuscation hack. It goes too deep into brain loops about numeric type casting.

    So this time (b) for me.

    At $work (where the word "ternary" provoked empty glares¹) maybe rather:

    my $mol = ""; $mol = "forty two" if $n==42;

    HTH! :)

    Cheers Rolf

    ¹) It hurts, I know!

      At $work (where the word "ternary" provoked empty glares)

      I cannot imagine any other profession where the experienced practitioners would 'dumb down' their output to accommodate the inexperienced.

      Can you imagine:

      1. A barrister using the phrase "a judicial mandate to a prison official ordering that an inmate be brought to the court so it can be determined whether or not that person is imprisoned lawfully and whether or not he should be released from custody" rather than "a writ of habeas corpus" because his apprentice or legal assistant might not understand the latter?
      2. A doctor using "heart attack" instead of "acute myocardial infarction" because his interns might not be familiar with the latter?
      3. An architect referring to "a sticky-outty, stubby bit of wall at 90° to the main wall" rather than "buttress" in order to placate his apprentice?
      4. A composer re-writing his music to avoid all sharps and flats because they make life difficult for air guitarists?

      The short answer (I sorely hope), is a profound NO!.

      So why do experienced programmers who do understand -- none of them ever admit to having problems understanding themselves -- these hardly difficult concepts and constructs, advocate 'dumbing them down' for the sake of those programmers whose education is formative?

      And whatever justification you might offer in reply; STOP. And think. Because there is no logical justification.

      If you dumb down, they will never learn, which is in nobody's interest.

      If the first time they encounter a construct they do not understand, they do not ask for (or look up) clarification, then they deserve to be admonished strongly. If they do it a second time; they should seriously consider a different career. If their mistakes made as a result of their lack of understanding make it into production, their mentor deserves admonishment. Or the system that allows un-mentored code to get into production, needs urgent review.

      Advocating the dumbing down of code, as a substitute for (requiring) proper education, is itself dumb.

        As a freelancer you don't insist on giving blow jobs, when the customer only wants to pay for spanking!

        Cheers Rolf

        PS: yes s/freelancer/prostitute/ ;-)

        You have a point, but I think you're taking it a bit too far (and yes, I did stop and think).

        First of all, "dumbing down" is an awfully harsh term here. The for loop and if statement solutions that LanX suggests are functionally equivalent to any other solution that has been proposed, and there is no meaningful performance penalty.

        To me "dumbing down" would be avoiding powerful language features (such as regexes) because they are confusing. That would be dumb. But this is nothing more than a benign style preference.

        Perl's allowable syntax is so vast that anyone who wants to write sane code will only use a subset of it. Maybe avoiding map and the ternary operator is going too far. But as much as I love map, I have to admit that I am usually using it "because I can" not because it is clearly advantageous in my code. And as much as I love the ternary operator, it has its pitfalls (such as the can-be-used-as-an-lvalue-making-assignments-have-unexpected-behavior pitfall).
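
The lvalue pitfall mentioned above can be sketched as follows. This follows the well-known precedence example from the perlop documentation; the sub names `surprising` and `intended` are made up for illustration.

```perl
use strict;
use warnings;

# Intended: add 10 to odd numbers, add 2 to even numbers.
# Because ?: binds tighter than +=, Perl parses the statement as
#   ( ($x % 2) ? ($x += 10) : $x ) += 2;
sub surprising {
    my ($x) = @_;
    $x % 2 ? $x += 10 : $x += 2;
    return $x;
}

# Parenthesizing each branch gives the intended behavior.
sub intended {
    my ($x) = @_;
    $x % 2 ? ( $x += 10 ) : ( $x += 2 );
    return $x;
}

printf "x=%d  surprising=%d  intended=%d\n", $_, surprising($_), intended($_)
    for 4, 5;
```

For an odd input such as 5, the first form adds 10 and then 2 as well, because the ternary's result is itself assigned to. Using the ternary only to select a value, as in the examples in this thread, avoids the problem.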

        Finally, all of your examples are professionals using their jargon with other professionals. Any good doctor will speak quite differently when talking to non-medical staff or a patient. Perl, because of the nature of the language, is often used by Perl "non-professionals"--sysadmins, web designers, and others who only spend a small part of their time writing Perl code. If this is your context, it makes sense to write the code with that in mind. It would be foolish not to.

Re: Two simple code style advice questions
by sundialsvc4 (Abbot) on Jan 16, 2013 at 16:24 UTC

    I would prefer the code which states, most simply and obviously, what the intent of the designer is ... and, also, which builds the maximum flexibility for the future.

    In case #1, if there was a legitimate other use for @tests as an array (versus using keys(%ntests)), and if the value-initialization of %ntests should always be zero (likely), then I would use a simple foreach my $key (@tests) { $ntests{$key} = 0; }, written on three source lines. Is it “shorter?” Clearly not. Is it “faster?” (Rhett Butler, Gone With The Wind) But it is clear to almost anyone who’s written a program in any language out there. And it’s easily maintainable going forward: you can put anything you need to inside that block, at any time in the future.

    In case #2, the first alternative is instantly “outed.” (Say what you mean.) And once again I would perhaps write something like: my $mol = ""; if ($n == 42) { $mol = "forty-two"; } Once again, I am looking towards the future, after two years of programmers have meddled with this same bit of code and it has grown over all those years into something totally different.

    Basically, when someone writes the first bit of code, whatever it is, they have a very clear notion in their heads at that time of what they’re trying to write at that time, and they get the idea stuck in their heads that there are brownie-points for being “clever.”   We’ve even got a “golf” section here, and it’s fun, but it’s just for fun and we all know it ... or, we should.   But, what happens over the next several years?   Well, one day, a change surely comes to pass which negates the assumptions that allowed the O.C. to have been “clever” in the first place, and that means that his entire block of code must be replaced.   First, it must be correctly understood.   Then, it must be correctly replaced with code that correctly does, not only what the new-change needs to do (which is why the current coder is dealing with this code now at all), but also everything that it did in the past.   Depending on the code, that entirely-unwanted voyage of discovery can become huge, and profoundly de-stabilizing.   A great big cost and delay, maybe, and all for nothing.   “Thanks for nothing, clever-one wherever you are now ...”

Re: Two simple code style advice questions
by Anonymous Monk on Jan 16, 2013 at 13:48 UTC
    It strikes me that, as a pattern, 2b works for all values of True, while 2a does not.


      It strikes me that, as a pattern, 2b works for all values of True, while 2a does not.

      What does that mean?

      I think they both work for all values of true, even if it isn't clear what you mean by that

        It probably means "Using '... x condition' is not always safe":
        say $_ ? "true" : "false", "a" x $_ for 0, 1, "0e0";
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        While choroba's example elegantly expresses what I was getting at, I'd also like to point out other True values, like 3, 5 or, more entertainingly, MAXINT. >;^)
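
A small sketch of the difference being described, using hypothetical helpers `via_x` and `via_ternary` that each take a text and an arbitrary flag. In the original question the flag is the result of `$n == 42`, which is only ever 1 or the empty string, so the two forms agree there; they diverge once the flag can be any true value.

```perl
use strict;
use warnings;

# Hypothetical helpers: each is meant to return $text when $flag is true, else "".
sub via_x       { my ( $text, $flag ) = @_; return $text x $flag }
sub via_ternary { my ( $text, $flag ) = @_; return $flag ? $text : "" }

# 1 and 0 behave the same in both; 3 and "0e0" are true but are not the number 1.
for my $flag ( 1, 0, 3, "0e0" ) {
    printf "%-6s x: '%s'  ternary: '%s'\n",
        "'$flag'", via_x( "a", $flag ), via_ternary( "a", $flag );
}
```

With a flag of 3 the repetition form returns the text three times, and with "0e0" (true as a string, zero as a number) it returns the empty string, while the ternary returns the text once in both cases.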
Re: Two simple code style advice questions
by jhourcle (Prior) on Jan 16, 2013 at 15:54 UTC

    This is probably an over optimization, but for really long lists, I use:

    my %ntests = map { $_ => undef } @tests;

    And then test using exists. It reduces the memory footprint slightly (or at least, it used to ... I admit I haven't verified that it's still true in more recent versions of perl.)

      Probably not true anymore (v5.10.1):
      $ perl -E 'use Devel::Size qw(total_size);
                 my (%h1, %h2);
                 my @ar = 0 .. 1e6;
                 undef @h1{@ar};
                 %h2 = map { $_ => undef } @ar;
                 say for map total_size($_), \%h1, \%h2;
                '
      60083287
      60083287

      In that case I am highly partial to...

      my %p; @p{ @q } = ();
        Or even
        my %p; undef @p{ @q } if @q;
      Not sure the point of this. If you are willing to test using exists, then why bother initializing at all? $hash{key}++ is a perfectly valid way to bring a key into existence.
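
A minimal sketch of the idioms discussed in this sub-thread: creating keys with undef values through a slice, testing with exists rather than defined, and letting ++ create a counter key on first use. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @tests = qw( tfred tjock tfortytwo );

# Slice assignment from an empty list: every key exists, every value is undef
my %seen;
@seen{@tests} = ();

my $has_key     = exists $seen{tfred}   ? 1 : 0;   # the key is present
my $has_value   = defined $seen{tfred}  ? 1 : 0;   # but its value is undef
my $has_missing = exists $seen{tnobody} ? 1 : 0;   # this key was never inserted

# Counting without initializing: ++ brings the key into existence
my %count;
$count{$_}++ for qw( tfred tjock tfred );

print "exists=$has_key defined=$has_value missing=$has_missing\n";
print "$_ => $count{$_}\n" for sort keys %count;
```

Explicit initialization still matters when the code later iterates over all expected keys, since a key that was never incremented would otherwise be absent from the hash.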
Re: Two simple code style advice questions
by Anonymous Monk on Jan 16, 2013 at 13:55 UTC

    1) When using a hash to store counts, I prefer just relying on the fact that a key springs into existence upon $hash{key}++. After all, I'm a lazy Perl programmer! Assuming your code needs them all initialized, I'd slightly prefer map I suppose, because it is one line (I don't like putting two statements on a single line, as some are suggesting for the second method).

    2) The first version is kind of wacky. I would use the second.

Re: Two simple code style advice questions (tye)
by tye (Sage) on Jan 18, 2013 at 04:56 UTC

    I'd use 1b, in part because it is an important idiom. On a single line is even better. 1a is okay (the one advantage I'll grant it is that it avoids repeating either variable name -- I'm not sure why I've long used this only grudgingly). But if I wanted to go for maximum clarity, IMHO, I'd instead do:

    my %ntests; $ntests{$_} = 0 for @tests;

    Then there is:

    @{ \my %ntests }{ @tests } = (0)x@tests; # :)

    2a is pretty darn hackish and not something I'd expect to see in professional code. But I've come to find ternaries to often be less quick/easy to read than something like:

    my $mol = '';
    $mol = 'forty two'
        if 42 == $n;

    Despite being 3 lines instead of 1, I find it significantly easier and faster to parse the intent. I especially don't like how the default value is almost lost in the single-line ternary. But I'd probably feel I was being a bit extravagant and then just use the 1-line ternary if the names and values are actually that short. For more and/or longer expressions, I'd format it more like:

    my $mol
        = ! defined $n ? 'n/a'
        : 42 == $n     ? 'forty two'
        :                '';

    (If I didn't use multiple "assignment \n\t if ...;" statements.)

    - tye        

      I find it significantly easier and faster to parse the intent.

      Could you quantify (in some fashion) what you mean by "significantly easier and faster"?

      my $mol = ( $n == 42 ) ? 'forty two' : '';
      my $mol = '';
      $mol = 'forty two'
          if 42 == $n;

      I find your version quite horrible to parse.

      • Is that one statement or three?

        Oh! It's two!

      • And why is it (are they) all squished up like that?

        It looks like the code-wrap routine has been given some ridiculously narrow width limit.

      • Why is he comparing a literal against a variable?

        Is the literal's value likely to suddenly change?

        (Yes. I am aware of the justification for the backward logic. :)

      As for your last example, I find it almost incredible that you would code that; and almost impossible to parse without reformatting it.

      Why not just:

      my $mol = defined $n ? ( $n == 42 ? 'forty two' : '' ) : 'n/a';

      I also find the concentration on the minutiae of single statements far less important than the overall flow of the code.

      That is, when scanning the code, I only need to recognise that $mol has been initialised, and then the next step and the next. I'll only be concerned with what it was initialised to once I understand the overall flow; and if I suspect that might be the source of the problem I'm looking for, or otherwise needs closer inspection.

      I don't need to know all the details of each line (or 3 lines!) of code from an instantaneous glance. If I have to read the line twice to understand what it does -- maybe take 2 seconds instead of 1/2 a second -- it is no biggy in the scheme of things. But understanding the overall flow of the subroutine or block is far more important, and that -- for me at least -- means being able to see as much of that subroutine or block as -- clearly defined steps -- as possible. Which is why I infinitely prefer the one line versions to your 3 or 5 line examples.


        Of course, I didn't say it would be easier for you to parse.

        I can parse the three lines completely in a single glance, not having to pause to even wonder if the details of some complex expression are worth worrying about yet or not. The whole point of the code is already tucked away in my head after that one glance.

        But, no, I don't usually parse the flow while ignoring nearby details, because I find it rather time-consuming to construct the vague but complex (for me) "initializes $mol to something, but I haven't bothered to figure out what" concept. And such vague bits would just leave a mental snag to trip over repeatedly as I discern the purpose while mentally going over the flow. I find the factored-out bits need at least a nice name for me to be able to smoothly gloss over them when considering the flow. More often, I will understand some details of the factored-out bit before dismissing those details from consideration of the current flow (usually by assigning a decent name to that bit -- which is why subroutines are such a good way to factor out bits of complexity; they each get assigned a name).

        The expression is complex enough that I can't tell that it is simply initializing the variable to one of a few different values just by glancing at it. Once I start to parse the expression it takes more effort than a glance. My eyes have to focus on several parts and move back and forth matching up the bits to construct the meaning. By the time I've parsed it enough to tell that it isn't calling some subroutine (that might do complex work), I've already spent more time on the one line than I would have spent on the several lines and yet I still don't understand what the code is doing.

        Both of my multi-line versions parse effortlessly for me. I am not aware of even moving my eyes during the parsing. My mental focus smoothly shifts to each line, in order, instantly understanding the full code of the line and then the next line neatly snaps another detail onto the mental model without the need for any backtracking or restructuring.

        The worst problem of the relative bloat in number of lines is if the logical block of code (almost always a subroutine) gets pushed beyond "one screen full" and thus can't be parsed with just simple and fast eye movements. And that would usually lead to me being less "extravagant" or to factoring out some logical sub-section.

        my $mol = defined $n ? ( $n == 42 ? 'forty two' : '' ) : 'n/a';

        That is a relative strain to parse in comparison. My experience is that matching up parens is one of the most time-consuming parsing tasks for humans (I've tested it using slide shows and even people who claim to match up parens easily can't do it quickly, IME). The constructs involved are not visually apparent. I'm forced to individually recognize single characters and then mentally re-assemble the logical structure from too many tiny pieces.

        This all reminds me of why newspapers print in rather narrow columns and how many people can read such very quickly without their eyes zigging back and forth (I don't think I'm one of those people, though).

        There are two ways I might try to parse that one complex line. The way that always works is to parse the components of it in order. That is slow because the number of parts to parse is much larger (for me) in the one-line case than in the multi-line cases.

        Using newlines to show each mental pause point, the code ends up in my head like:

        my $mol = defined $n ? ( $n == 42 ? 'forty two' : '' ) : 'n/a' ;

        And then I have to backtrack several times to line up the '(' and the ')' and to line up the 2nd ':' with the first '?' and then it isn't obvious to me when which of the three values get chosen until I mentally simulate running the code, putting the conditions and values into proper association as I go.

        The second method is to glance over the code, recognizing the "obvious" bits and then filling in the gaps between them. That, for me, starts out as:

        my $mol = <noise> 'forty two' <noise> 'n/a';

        Then I have to visually and mentally jump between the two "noise" piles and (frustratingly) spend mental effort dealing with single characters. Worse, I then have to tie single characters that aren't next to each other together in a complex structure that isn't represented visually.

        My last example gets parsed so smoothly for me. It is just 4 simple visual pieces that are also 4 simple mental pieces:

        my $mol
            = ! defined $n ? 'n/a'
            : 42 == $n     ? 'forty two'
            :                '';

        The second piece is the best example of why I prefer my version to yours:

        my $mol = defined $n ? ( $n == 42 ? 'forty two' : '' ) : 'n/a';
        #         ^^^^^^^^^^                                      ^^^^^

        'n/a' is the value that represents "not defined". This is so very much more obscured in your version of the code. It is a single visual and mental chunk in my versions. No re-assembly required.

        Your mileage may vary, obviously.

        Note that I'm talking tiny optimizations here. It might take me 1 or even 2 seconds to parse your complex line. But when reading code, spending that much time on a line is a very long time. When the lines aren't complex, I can scan a whole screen of code in 1 second and understand it.

        And it isn't that I find your most complex line of code unacceptable or even fundamentally difficult to parse. I'd even write the following:

        my $mol = ! defined $n ? 'n/a' : $n == 42 ? 'forty two' : '';

        if I had a well-factored subroutine that barely fit on a screenful of lines with that as one of them.

        But if I got the subroutine factored to be smaller, I'd get extravagant with those short expressions and be so happy to re-parse and understand the whole subroutine without even having to degrade out of just needing quick glances.

        - tye        

      Your "funny" example does not work:
      # perl -MData::Dumper -e 'my @tests = 1 .. 10; @{ \my %ntests }{ @tests } = (0) x @tests; print Dumper \%ntests;'
      $VAR1 = {};
      The problem is the my.

        Well, I'd say the problem is that @{ } is defined syntactically as containing a full-fledged block (and thus it imposes its own lexical scope).

        @$_{@tests} = (0)x@tests for \my %ntests;

        - tye        

      But I've come to find ternaries to often be less quick/easy to read than something like:

      I am with BrowserUk here, and I am happy to see you are using "I find" instead of the far too often used "it is better to" (here in the monastery), as many idioms that are easy for person A to parse cause headaches for person B.

      Personally I would try to avoid statement modifiers at any cost. I hate them. They make me read code exactly the opposite of what the author meant.

      I have no trouble reading (nested) ternary operations. Maybe too used to those from doing C.

      You also doing java? Where 42 == $n is quite often preferred over $n == 42 because of "string".equals ($n) implies NULL checks.

      Enjoy, Have FUN! H.Merijn
        You also doing java?

        Nope. I started trying out "42 == n" in C a long time ago to avoid accidentally writing "if( n = 42 )".

        In this case, I wrote "42 == $n" as I find "42" to be the much more interesting part of the expression. I prefer to put shorter things and more interesting things first to speed parsing.

        - tye        
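
The habit described above carries over to Perl. A minimal sketch, using string eval so that the compile-time failure can be caught and shown; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Typo: "=" where "==" was meant. This compiles (Perl only warns), and the
# condition is always true because the assigned value 42 is true.
my $typo_result = eval q{
    no warnings;
    my $n   = 7;
    my $hit = 0;
    if ( $n = 42 ) { $hit = 1 }
    $hit;
};

# The same typo with the literal on the left is rejected at compile time,
# so the eval returns undef and the error text lands in $@.
my $reversed_ok = eval q{
    my $n = 7;
    if ( 42 = $n ) { }
    1;
};
my $compile_error = $@;

print "variable on the left: ran, hit=$typo_result\n";
print "literal on the left: ",
    ( $reversed_ok ? "compiled" : "failed to compile" ), "\n";
```

With warnings enabled Perl already flags an assignment of a constant inside a condition, so in Perl the literal-first order is mostly a matter of habit and reading preference, as the reply says.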

Node Type: perlquestion [id://1013548]
Approved by Ratazong
Front-paged by vinoth.ree