http://www.perlmonks.org?node_id=559013

Some time ago I wrote a subroutine that takes a number of arrays named @out_a, @out_b, @out_c, etc. and reads one entry at a time, in parallel, from a specified subset of these arrays, writing each set of entries as a tab-delimited line that is added to a single output array. This output array was then used to load an Excel spreadsheet using the "Import Data" function. The parameter passed to the subroutine is an array of letters that designate which arrays to use as input. The input arrays are all of different lengths. (I have since changed the code to write the data directly into the spreadsheet using OLE, but that is not the point here.) In the process of writing this code I discovered a very subtle distinction between "my" and "local" variables which was not initially obvious to me. I don't know if this is the correct place to post this, but it is not a question, so I am posting it here. There are obviously a number of ways to improve the code, but again, the point is the subtle difference between "my" and "local".
sub meld_columns {
    my (@cols_to_meld) = @_;
    my ($ctm, $arrnm, $aline);
    my @outarray;
    # "local" needed for the following vice "my" because
    # the indirect references [$$ctm] require access to
    # the global symbol table and "my" variables are not
    # in the global symbol table.
    local ($a, $b, $c, $d, $e, $f, $g, $h);
    # Set column indicators all to -1 (eol)
    $a = $b = $c = $d = $e = $f = $g = $h = "-1";
    # then set the ones we are processing to 0
    foreach $ctm (@cols_to_meld) {
        $$ctm = 0;
    }
    while (($a >= 0)||($b >= 0)||($c >= 0)||($d >= 0)||($e >= 0)||($f >= 0)||($g >= 0)||($h >= 0)) {
        $aline = '';
        foreach $ctm (@cols_to_meld) {
            $arrnm = 'out_'."$ctm";
            if (($$ctm <= $#$arrnm) && ($$ctm ne "-1")) {
                $aline .= "$$arrnm[$$ctm]\t";
                $$ctm++;
                $$ctm = "-1" if ($$ctm > $#$arrnm);
            } else {
                $aline .= "\t\t";
                $$ctm = "-1";
            }
        }
        $aline =~ s/(.*)\t$/$1/; # get rid of trailing tab
        push @outarray, ("$aline");
    }
    return @outarray;
}

Re: my vs. local subtlety
by davidrw (Prior) on Jul 03, 2006 at 16:57 UTC
    There are obviously a number of ways to improve the code

    You probably don't want to be using variable variable names (avoiding them makes your root problem go away). A bunch of nodes, including the following two, refer to the reasons:
    I also have to make the obligatory comment about avoiding the use of $a and $b, because of their special global-ness with sort.
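
As a minimal sketch of what that special global-ness means: sort sets the package variables $a and $b for each comparison, which is why they need no declaration even under strict. The array @lengths and the helper by_number are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

my @lengths = (3, 10, 7);

# sort aliases the package variables $a and $b to the two elements
# being compared; no "my" or "our" declaration is needed for them.
my @descending = sort { $b <=> $a } @lengths;

# A named comparison sub (hypothetical helper) sees the same package
# variables, because they are globals rather than lexicals.
sub by_number { $a <=> $b }
my @ascending = sort by_number @lengths;

print "descending: @descending\n";
print "ascending:  @ascending\n";

# Declaring "my $a" or "my $b" in an enclosing scope would hide the
# package variables from the sort block, which is the reason for the
# usual advice to keep those two names for sort alone.
```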

    Could you also provide sample input and output for this sub? I strongly suspect it could be written with map and join, something like this (note the hash slice usage):
    sub meld_columns {
        my ($arrays, @arrNames) = @_;
        my ($maxIdx) = sort { $b <=> $a } map { $#$_ } @{$arrays}{@arrNames};
        my @outarray = map {
            my $i = $_;   # keep track of the current "row" to use in inner map block
            # get the respective element from each array.
            # If we're beyond the individual array length, just use "\t".
            # Also glue all cols together w/a "\t"
            join "\t", map { $i <= $#$_ ? $_->[$i] : "\t" } @{$arrays}{@arrNames}
        } 0..$maxIdx;
        return @outarray;
    }

    my %arrayHolder = map { $_ => [] } 'a' .. 'z';   # build the initial data structure (instead of named arrays)
    my @out = meld_columns( \%arrayHolder, 'a' .. 'h' );    # meld just A through H
    @out = meld_columns( \%arrayHolder, 'f', 'a', 'c' );    # meld just F A C, in that order.
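
The hash slice mentioned above can be looked at in isolation. This is a small sketch with made-up data; %arrays, @names and the other variable names are assumptions for illustration, not part of the code in the reply.

```perl
use strict;
use warnings;

# Hypothetical sample data: one array ref per column letter.
my %arrays = (a => [1, 2, 3], b => ['x'], c => [10, 20]);
my @names  = ('c', 'a');

# Hash slice: the values for the listed keys, in the order of the keys.
my @selected = @arrays{@names};

# The same slice taken through a hash reference, which is the form
# needed when the hash is passed to a sub by reference.
my $arrays_ref = \%arrays;
my @via_ref    = @{$arrays_ref}{@names};

# Highest last-index among the selected arrays: sort descending and
# keep only the first element of the result.
my ($max_idx) = sort { $b <=> $a } map { $#{$_} } @selected;

print "max index: $max_idx\n";
```

Because the slice returns values in key order, the caller controls the column order simply by the order of the names it passes.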
Re: my vs. local subtlety
by esr (Scribe) on Jul 03, 2006 at 23:35 UTC
    All of the above comments are valid. But if one: a) has used symbolic refs successfully in other situations; b) does not recognize that symbolic refs find only package variables; c) does not recognize that "my" variables are lexical; d) uses "my" variables frequently in subroutines to avoid the possibility of duplicate variable names; and e) does not recognize that the use of such constructs in a subroutine is "unsafe", "non-robust", "non-versatile", and difficult to maintain; then one can get very frustrated trying to figure out why something isn't working as expected. Some or all of those things applied to me when I wrote that code initially 2-3 years ago. To someone who may be new to Perl, or is not experienced in some of these things, the distinction may boil down to a difference between using "my" and "local" without understanding what is under the covers.

    Thanks for your comments. I was not familiar with the "map" function mentioned above, so I have something new to learn about.

    I rewrote the subroutine in the original snippet long ago by passing a list of array refs rather than the letters so the original snippet is only of value to others if this discussion helps them to avoid the pitfalls mentioned.
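
The rewritten subroutine is not shown in the thread. As a rough sketch only, a version that takes array references might look like the following. The padding here (an empty cell for a column that has run out) is an assumption and does not reproduce the exact tab padding of the original snippet.

```perl
use strict;
use warnings;

# Hypothetical rewrite: the caller passes array references, so no
# symbolic references are needed and the arrays may be lexicals.
sub meld_columns {
    my (@columns) = @_;    # each element is an array ref

    # Index of the last row of the longest column (-1 if none).
    my $last_row = -1;
    for my $col (@columns) {
        $last_row = $#{$col} if $#{$col} > $last_row;
    }

    my @outarray;
    for my $row (0 .. $last_row) {
        # A column that has run out contributes an empty cell.
        push @outarray,
            join "\t", map { ($row <= $#{$_}) ? $_->[$row] : '' } @columns;
    }
    return @outarray;
}

my @out_a = (1, 2, 3);
my @out_b = ('x');
my @lines = meld_columns(\@out_a, \@out_b);
print "$_\n" for @lines;
```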

Re: my vs. local subtlety
by esr (Scribe) on Jul 03, 2006 at 19:21 UTC
    As stated in the original post, it was not my intention to suggest that this code was the only or best way to accomplish what I was trying to do. I could obviously have passed an array of array references or any number of other things rather than an array of letters. I posted it only because I thought it might be helpful to others to note the subtle "gotcha" between defining variables with "my" and with "local" when using a construct such as $$ctm.

    As you noted, using $a and $b was an unfortunate choice in the snippet example but those names were chosen for brevity in the example, not as a model of good code.

      I posted it only because I thought it might be helpful to others to note the subtle "gotcha" between defining variables with "my" and with "local" when using a construct such as $$ctm.
      Right, but my point was that if they don't use that construct, and instead use a safer, more robust, versatile, and maintainable construct, then the "gotcha" doesn't even exist ... IMHO I just saw it as recommending that one wear gloves when punching a wasp nest out of a tree. The first thing anyone is gonna ask is "why aren't you using a ten-foot pole?".
      As quoted from the article I mentioned above (it's a good read if you haven't yet), Summary #2: Being careful does not mean trying to understand the possible consequences while behaving dangerously; it means avoiding dangerous behavior in the first place. Don't say "I know it's dangerous, so I'll be really careful." Say "I know it's dangerous, so I won't do it."
      I thought it might be helpful to others to note the subtle "gotcha" between defining variables with "my" and with "local" when using a construct such as $$ctm.

      Or a construct such as $#$arrnm or $$arrnm[$$ctm]. That is to say, your @out_a, etc. arrays also cannot be lexical (my) variables.

      The distinction isn't between my and local, it's between lexical and package variables. Symbolic refs (which look up a variable by an expression giving its name) always find package variables, not lexical ones.
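
A minimal sketch of that distinction, using made-up names ($pkg_var, $lex_var, lookup_by_name and with_local are all hypothetical). The lookup goes through the symbol table, so it can see a package variable, including one whose value has been temporarily replaced with local, but never a lexical.

```perl
use strict;
use warnings;

our $pkg_var = 'package value';    # package variable: has a symbol-table entry
my  $lex_var = 'lexical value';    # lexical variable: not in the symbol table

# Look a scalar up by name with a symbolic reference.
sub lookup_by_name {
    my ($name) = @_;
    no strict 'refs';              # symbolic refs are disallowed under strict
    return ${"main::$name"};       # resolves to $main::<name>
}

# local gives the *package* variable a temporary value, so the
# symbolic lookup sees the new value for the duration of the call.
sub with_local {
    local $pkg_var = 'localized value';
    return lookup_by_name('pkg_var');
}

my $found_pkg   = lookup_by_name('pkg_var');
my $found_lex   = lookup_by_name('lex_var');   # undef: $main::lex_var was never set
my $found_local = with_local();

print "pkg_var:       ", (defined $found_pkg ? $found_pkg : 'not found'), "\n";
print "lex_var:       ", (defined $found_lex ? $found_lex : 'not found'), "\n";
print "inside local:  $found_local\n";
print "after local:   $pkg_var\n";
```

This is why the original snippet worked with local and failed with my: both the counters and the @out_* arrays had to be package variables for the by-name lookups to find them.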