This node in no way means that I claim to be an expert on Perl. I hardly consider myself at an intermediate level (I'm still making my way through the Alpaca!). These are just some of the most common ways I've managed to shoot myself in the foot. I thought I would share them here in the hope that they would benefit others and in the hope that I may receive enlightenment from other, more experienced monks on how to better handle these issues. Most of them have to do with regex (go figure!).
Here they are in no particular order:
Regex in a loop
This one is probably due to how much Perl sounds like regular, spoken English. I'm always tempted to go:
while($some_data=~ m/some_regex/)
{
# Do some stuff
}
This just sounds natural: as long as the data you're reading matches your regex, keep doing some stuff. Only, the above code (should the regex match) would yield an infinite loop. The correct semantics for that (as I too often forget) are:
while($some_data=~ m/some_regex/g)
{
# Do some stuff
}
The reason is that the first snippet would always match the same portion of $some_data; there would be nothing to advance the regex engine to other parts of the string. The 'g' modifier is exactly what's needed in this case. In scalar context, each match attempt starts at the position where the last successful match ended, so the loop walks through the string and stops once no further match is found.
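To watch the engine advance, here is a small sketch that records pos() after each match. The sample string and the @found array are made up for the illustration.

```perl
use strict;
use warnings;

my $some_data = 'cat bat rat';
my @found;

# In scalar context, m//g resumes from pos($some_data) on each iteration.
while ($some_data =~ m/(\w)at/g) {
    push @found, $1 . '@' . pos($some_data);
}

print "@found\n";    # c@3 b@7 r@11
```

Without the 'g', pos() is never updated and the first match (the "cat") would be found over and over.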
A variable as part of a regex
I can't count how many times I wrote the following:
$to_replace='some_string';
$my_string=~ s/$to_replace/$better_data/;
While seemingly innocuous, the above code is actually unsafe: it silently misbehaves as soon as $to_replace contains a regex metacharacter. To see why, imagine if I had written:
$to_replace='a caret looks like ^';
$my_string=~ s/$to_replace/$better_data/;
See the problem here? The semantics I was going for, here, were that I had this string whose occurrence I wanted to replace in another string. What Perl sees is that the variable $to_replace contains a regex metacharacter (^) and would interpret it as an anchor for a regex match. To get the correct semantics, you should make it a habit to quote your variables in regex substitution:
$to_replace='a caret looks like ^';
$my_string=~ s/\Q$to_replace\E/$better_data/;
Of course, if you meant $to_replace to be an actual regex to match against, you're better off using the qr operator:
$to_replace=qr/^your_regex_here$/;
$my_string=~ s/$to_replace/$better_data/;
The \Q and \E would have been wrong in this case, of course.
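A side-by-side sketch of the three cases may help. The target string and the replacement 'X' are invented for the demo; the variable names follow the snippets above.

```perl
use strict;
use warnings;

my $better_data = 'X';
my $to_replace  = 'a caret looks like ^';

# Unquoted: the ^ is treated as an anchor, so the pattern cannot match here.
(my $unquoted = 'a caret looks like ^ indeed') =~ s/$to_replace/$better_data/;

# Quoted: \Q...\E escapes the metacharacter, so the literal text is replaced.
(my $quoted = 'a caret looks like ^ indeed') =~ s/\Q$to_replace\E/$better_data/;

# Precompiled with qr: interpolated as a real regex, anchor and all.
my $pattern = qr/^a caret/;
(my $compiled = 'a caret looks like ^ indeed') =~ s/$pattern/$better_data/;

print "$unquoted\n$quoted\n$compiled\n";
```

The first string should come back untouched, the second as "X indeed", and the third as "X looks like ^ indeed".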
Regex in a loop 2
OK. So I learned about the 'g' modifier's use in a loop and I used it correctly as follows:
while($string=~ m/(reg)(ex)/g)
{
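# Save the captures: the s/// below is a match of its own and resets $1 and $2
my ($head,$tail)=($1,$2);
$string=~ s/\Q$head$tail\E/${head}ister/;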
}
or so I thought!
The problem with the above is that you're modifying the $string variable WHILE you're matching it against your regex. The 'g' modifier keeps track of the position where the last successful match left off, but modifying the target string resets that position, so the next match starts over from the beginning of the string instead of continuing where the last one ended. The above example is, of course, a bit contrived because I could just say:
$string=~ s/(reg)ex/$1ister/g;
If I do need to substitute in a loop, I usually match my regex against a dummy copy of the data:
$dummy=$string;
while($dummy=~ m/(reg)(ex)/g)
{
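# Save the captures: the s/// below is a match of its own and resets $1 and $2
my ($head,$tail)=($1,$2);
$string=~ s/\Q$head$tail\E/${head}ister/;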
}
As pointed out by Jenda, substitution in a loop may be done by omitting the 'g' modifier:
while($string=~ m/(reg)(ex)/)
{
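# Save the captures: the s/// below is a match of its own and resets $1 and $2
my ($head,$tail)=($1,$2);
$string=~ s/\Q$head$tail\E/${head}ister/;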
}
This is no longer an infinite loop because you're modifying the string inside the loop before retrying the match (provided, of course, that the replacement text doesn't itself match the regex).
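One detail worth checking in these loops: a successful s/// is itself a match, so inside its replacement $1 refers to the substitution's own groups, not to those of the enclosing while condition. Here is a minimal sketch of the no-'g' loop that copies the captures first. The names $head and $tail and the sample string are my own.

```perl
use strict;
use warnings;

my $string = 'regex and regex';

while ($string =~ m/(reg)(ex)/) {
    # Copy the captures before running another regex operation.
    my ($head, $tail) = ($1, $2);

    # \Q...\E keeps the captured text literal; ${head} avoids any
    # ambiguity about where the variable name ends.
    $string =~ s/\Q$head$tail\E/${head}ister/;
}

print "$string\n";    # register and register
```

The loop ends because "register" no longer contains "regex"; a replacement that still matched the pattern would spin forever.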
Deleting some array elements
I've struggled a lot with the concept of deleting array elements at given indices. The two most common approaches I used were:
for($i=0;$i<@array;$i++)
{
if(&should_delete($i))
{
delete $array[$i];
}
}
and
for($i=0;$i<@array;$i++)
{
if(&should_delete($i))
{
splice @array,$i,1;
}
}
The problem with the first approach is that none of the remaining elements have their index changed: 'delete' simply replaces the element(s) at the given position(s) with 'undef', leaving holes in the array. The second approach is a total disaster! When you splice an element from the array, the array shrinks, and the element that followed the deleted one slides into index $i. The loop then increments $i and skips right over it, so while the first splice may work correctly, every subsequent test looks at the wrong element. I found a very simple solution for this:
for($i=0;$i<@array;$i++)
{
if(&should_delete($i))
{
splice @array,$i,1;
$i--;
}
}
This works because whenever an element is spliced, the indices of all the elements that come after it will be decremented by one.
Thanks to an amazing one-liner courtesy of JavaFan, I now know better:
@array = @array[grep {!should_delete($_)} 0..$#array];
An amazing use of grep and (see his analysis in the replies) a more efficient approach than mine.
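Here is a runnable sketch of the grep slice, next to one more option that I believe also works: splicing from the end, so that the indices still to be visited never shift. The should_delete below is a stand-in predicate (it deletes the odd indices) and the data is invented.

```perl
use strict;
use warnings;

# Hypothetical predicate for the demo: delete every odd index.
sub should_delete { my ($index) = @_; return $index % 2 }

my @array = qw(a b c d e);

# Index slice, as in the one-liner above: keep the indices that survive.
my @kept = @array[ grep { !should_delete($_) } 0 .. $#array ];

# Alternative: walk the indices backwards so earlier ones stay valid.
my @spliced = @array;
for my $i ( reverse 0 .. $#spliced ) {
    splice @spliced, $i, 1 if should_delete($i);
}

print "@kept | @spliced\n";    # a c e | a c e
```

Both versions test each original index exactly once, which is what a predicate that takes an index needs.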
Slurping gone wrong
I'm a big fan of slurping input files and matching them against a regex in multiline mode. I believe the following is a common idiom:
$rec_sep=$/;
undef $/;
$slurp=<INPUTFILE>;
# Regex matching here ...
The problem with this is that it's all too common (for me at least) to forget to return the record separator to its old state (this is usually a newline unless you've changed it for your purposes). Why is this a problem?
Imagine the following scenario (that has occurred to me before):
$rec_sep=$/;
undef $/;
$slurp=<INPUTFILE>;
print "Enter something: ";
$something=<STDIN>;
So what happens? The user, used to ending his/her input by pressing "Return", will be unpleasantly surprised to find that it won't work! Perl no longer treats that newline as the end of a record, so the read keeps going until it hits end-of-file. The solution: redefine $/ right after your slurp:
$rec_sep=$/;
undef $/;
$slurp=<INPUTFILE>;
$/=$rec_sep;
print "Enter something: ";
$something=<STDIN>;
An even better solution (thanks again to Jenda) would be:
my $data = do {local $/; <INPUTFILE>};
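To convince myself that the local version really puts $/ back, a small self-contained check can be used. The in-memory filehandle stands in for INPUTFILE, and the sample text is invented.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";

# In-memory filehandle standing in for INPUTFILE (an assumption for the demo).
open my $in, '<', \$text or die "Cannot open in-memory file: $!";

# $/ is undef only inside the do block; local restores it on exit.
my $slurp = do { local $/; <$in> };
close $in;

# Outside the block the separator is back to its previous value.
my $separator_restored = ( defined $/ && $/ eq "\n" ) ? 1 : 0;

print length($slurp), " characters read, separator restored: $separator_restored\n";
```

Because the restore happens automatically when the block exits, there is nothing to forget.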
Waiting on a pipe
This is a less obvious one. I once did something close to the following:
open COMMAND,'-|','some_command';
$input=<COMMAND>;
#... bla bla
$pid=fork;
#....bla bla
wait if $pid;
#... bla bla
All the bla bla's represent sections of code I don't actually remember. The point is, the wait returned way before the forked process terminated. You know why? Because wait returns as soon as any child process terminates, and the pipe I had just opened to some command was itself another child process. Two better approaches are as follows:
open COMMAND,'-|','some_command';
$input=<COMMAND>;
close COMMAND;
#... bla bla
$pid=fork;
#....bla bla
wait if $pid;
#... bla bla
or
open COMMAND,'-|','some_command';
$input=<COMMAND>;
#... bla bla
$pid=fork;
#....bla bla
waitpid $pid,0 if $pid;
#... bla bla
Of course, it's always good practice to close your pipes as soon as you don't need them.
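Putting both pieces of advice together, here is a sketch that should run as-is on a Unix-like system. It uses the running perl binary ($^X) with a one-liner as a stand-in for some_command, and the child's exit status of 7 is arbitrary; both are assumptions made for the demo.

```perl
use strict;
use warnings;

# $^X is the running perl binary; the one-liner stands in for some_command.
open my $command, '-|', $^X, '-e', 'print "from pipe\n"'
    or die "Cannot start command: $!";
my $input = <$command>;
close $command;    # closing a pipe also waits for and reaps its process

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    exit 7;        # child: exit with a recognisable status
}

# Parent: wait for this specific child rather than for any child.
my $reaped = waitpid $pid, 0;
my $status = $? >> 8;

print "input=$input";
print "reaped own child: ", ( $reaped == $pid ? "yes" : "no" ), ", status=$status\n";
```

Checking the return value of waitpid against $pid is a cheap way to confirm that the process you waited for is the one you forked.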
True is 1, false is...?
Many, many, many times I've fallen for this one. While debugging, I would print out the value of some logical test to see whether it tested true. When it tests true, it prints out "1" as expected. When it tests false, however, Perl prints nothing. I used to go around in circles thinking there's something wrong with my debugging procedure, that my code never actually reached the print statement for some reason. The thing is, the result of a false logical test is a special value that reads as the empty string when used as a string and as 0 when used as a number, and print uses the string form; this is why nothing was being printed. (A comparison returns that same single value in list context too; it's a failed regex match that returns the empty list there.)
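A quick way to see what a false result really holds, and to make it visible while debugging. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $false = (1 == 2);

my $is_defined = defined $false ? 1 : 0;    # false is not undef
my $as_string  = '[' . $false . ']';        # empty string between the brackets
my $as_number  = $false + 0;                # numifies to 0 without a warning

# For debugging, spell the result out instead of printing it raw.
my $shown = $false ? 'true' : 'false';

print "$is_defined $as_string $as_number $shown\n";    # 1 [] 0 false
```

Wrapping the value in brackets or using a ternary in the debug print makes the false case impossible to miss.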
Shooting yourself in the foot on purpose
Finally, I leave you with a very short code snippet that a fellow Perl enthusiast and I once came up with in order to test the limits of our system:
undef $_ for keys %SIG;
fork while 1;
A little piece of advice: close anything important you're working on before you run this one :D
Conclusion
I know that most of these will probably look too obvious to the veterans. Please feel free to suggest better solutions than the ones I provide (or give me another approach altogether). Also, if you can share some stories about the ways you've managed to shoot yourself in the foot with Perl, that would be awesome, too!
EDIT: Made some changes to the proposed solutions above according to some keen insights from JavaFan and Jenda. Thank you for your constructive criticism.