Re: A refactoring trap
by jhourcle (Prior) on Aug 16, 2005 at 21:57 UTC
|
Think before refactoring!
Immediately after reading Perl Best Practices, and before, after reading Perl Medic, and countless other other books, I've wanted to go back and rewrite my code.
Luckily, I have version control, and was able to roll back everything that I broke. Don't change things just for change's sake. I know, I'm a bad one to preach on this one, as I've done 'optimizations' and 'clarifications' and such to the projects that I've worked on way too many times. Yes, it might be a one line fix, but it might introduce other subtle errors, as you found, and it may be something that you might not catch for days, weeks, or even months down the road. At that point, you can end up losing significant time from trying to pretty something up.
So, I'd say that the two rules that you should learn from Perl Best Practices apply to more than just Perl -- version control, and a good library of tests.
Now, there might be times when it's worth fixing up the documentation (I'm going to be handing off this project to someone else, and I want to make sure everything's good), or optimization (the processes is taking more than 2 seconds, and that's unacceptable).
I still cringe every time I look at some of my old code. Instead of risking breaking it, by trying to clean it up, I prefer to just go in, and make sure there's a comment at the top reflecting how long ago it was written, so if anyone starts giving me crap for it, I can point to that.
I've started working in a few of the tips from the book into my work, but if I went back and tried fixing every bad script I've written, it might take me a year, as I deal with cleaning up new messes that I'd make.
Update: Ovid's right -- I've been changing more than one thing at a time, and I haven't been running tests after every change. I guess this is one of the times when having too many tests can be a problem ... I don't mind running 5 min of tests for an install, but it's damned annoying for every 10 sec modification.
| [reply] |
|
I've also been reading and enjoying the book. However, I have been going through and making sweeping changes in the codebase I am currently working on. Why isn't my stuff breaking? Because I am only changing one thing at a time and I'm running the tests after every change.
I can feel confident that I am not hurting anything and I'm very confident that some of the tricky bits in my code are much easier to understand. The poor programmers who will have to go in and maintain my code may not notice it, but they would certainly have noticed some of the things that were in there before.
In fact, I've committed my .perltidy.rc file to the code base to ensure that programmers who follow me will find it easy to at least maintain consistent formatting.
| [reply] |
|
In fact, I've committed my .perltidy.rc file to the code base to ensure that programmers who follow me will find it easy to at least maintain consistent formatting.
Thats a very interesting idea. Id almost like to see your perltidy.rc posted here, along with any others that folks might like to contribute. I think it would be an interesting addition to the site.
---
$world=~s/war/peace/g
| [reply] |
|
Re: A refactoring trap
by Albannach (Monsignor) on Aug 16, 2005 at 21:22 UTC
|
One of my personal rules: Always question a rule that starts with always.
While I agree that /x can be mighty handy, I think that your case is so simple as to be made less clear by the change. Perhaps this is because I am used to certain regex idioms, or I just don't like your choice of braces, but I would have thought your first version was better, apart from the explict use of $line which is an improvement.
Well the best thing about opinions is that you can always get another.
--
I'd like to be able to assign to an luser
| [reply] [d/l] |
Re: A refactoring trap
by Nkuvu (Priest) on Aug 16, 2005 at 21:32 UTC
|
Actually? In this case I prefer the regexen without the x modifier. The regex /^\s*$/ is not long enough to require internal comments, the comment after the regex explains it fully. The addition of whitespace just spreads it out.
As opposed, of course, to potentially confusing regexen such as /\[\s*([^\]]+)/ which, I think, gained a lot of readability with the x:
/\[\s* # literal [ followed by optional whitespace
( # begin capture
[^\]]+ # get one or more chars that are not ]
) # end capture
/x # end regex
(note that this regex is part of a script at work where the future maintainers will probably not know regexen well, which explains the over commenting) | [reply] [d/l] [select] |
|
Does it really gain that much? I think commenting out a regex like that is just as silly and pointless as writing:
my $tmp = 0; # Make $tmp 0.
$i ++; # Increment $i.
print "hello"; # Print 'hello' to the screen.
If I want to comment the regex, I'd write:
# Capture what's between brackets,
# omitting leading white space.
/\[\s*([^\]]+)/;
Because that documents the intent of the regex, not the mechanics. Given the context, I'd probably even document what's between the brackets, and not just the fact I'm capturing between brackets, for instance:
# Capture the wiki-link.
/\[\s*([^\]]+)/;
Now, that's useful, and you immediately know what to change if the syntax of the wiki-links changes. Where as your example carefully documents the tiny steps it takes, and not the overall picture.
You might want to read Brian W. Kernighan, Rob Pike: The Practice of Programming, which discusses correct commenting style as well. And there's no reason to assume you should use a commenting style for regexes that's considered bad style for other code. | [reply] [d/l] [select] |
|
Does it gain that much in this instance? Probably not. Commenting on intent is actually a nice way to put it, and one habit that I'm trying to break is commenting on mechanics. I've written a number of scripts for people who are not Perl-savvy, and they come back with questions on how this or that works. So for a long time I started commenting on every single regex on how it works -- because that's what people were asking about. These people are general C/C++ programmers, so I didn't get into the habit of commenting code like your first example, thank dog.
So you have a very good point, and I'll add your reading suggestion to my list of books. I have some good habits, some not so good. I knew when I posted this that I might have comments about the style -- which is really what I want. Nothing like critiques of bad practices to break the habit.
| [reply] |
|
On the off chance that you really use your "potentialy confusing" regex, I will comment that it seems a little inconsistent to me. Can you tell me what it does (including what becomes $1) for each of:
"[ x]", "[ ]", and "[]"?
Is it what you wanted it to do?
| [reply] [d/l] |
|
I should also have mentioned that this was a known-fragile regex to parse out some very specific files. The files in question are Matlab files that always have something in the brackets, but not always a leading space. I do have this commented in the actual script, I just didn't add the comment here.
| [reply] |
Re: A refactoring trap
by bart (Canon) on Aug 17, 2005 at 11:27 UTC
|
You don't have to use /[#]/x, you can use /\#/x just as well; just like you more or less have to use /\ /x in order to be able to match a space. | [reply] [d/l] [select] |
|
/[ ]/x matches a space as well.
| [reply] [d/l] |
|
But the regular expression engine is more likely to be able to optimize a literal character as opposed to a character class.
| [reply] |
|
Re: A refactoring trap
by redlemon (Hermit) on Aug 17, 2005 at 07:34 UTC
|
while (<DATA>) { ... }
is not the same as
while (my $line = <DATA>) { ... }
The first while will stop as soon as <DATA> returns undef, so at the end of the data. The second while however, will stop as soon as it finds an empty line.
The correct refactoring should have been:
while (defined (my $line = <DATA>)) { ... }
-- Lyon | [reply] [d/l] [select] |
|
No. This is magic in Perl5:
wolverian@noesis:~$ perl -MO=Deparse -e'while (my $foo = <DATA>) { }'
while (defined(my $foo = <DATA>)) {
();
}
__DATA__
-e syntax OK
| [reply] [d/l] |
|
while (my $line = <DATA>) {print $line};
__DATA__
one
two
three
four
results in
one
two
three
four
It seems that empty lines don't break the loop at all. | [reply] [d/l] |
|
Hmmm, you're right. It doesn't break.
The reason is that blank lines are not truely blank, of course. You're saved by the fact that a blank line still contains the "\n".
-- Lyon
| [reply] |
|
|
|
|
|
while (my $line = <HANDLE>) {...}
is that it stops the loop prematurely if $line is false, but defined. There are only two string values that are defined, but false: the empty string, and the string containing the character 0, and nothing else. The first cannot happen - even an "empty line" ends with a newline. The latter can happen in two situations: for a file ending in "$/0", that is, the last line contains 0, and does not end with the line terminator. But regular text files always do (otherwise, they aren't proper text files anyway). The second case is if you read in the file character by character, by setting $/ = \1;, and the file contains a 0.
If you have opened the file using a PerlIO filter, anything can happen (or if you have tied the file handle). But then, an undefined value may be returned as well, even if the file hasn't completely read. | [reply] [d/l] [select] |
|
while (my $line = <DATA>) { ... }
while (defined (my $line = <DATA>)) { ... }
are identical, your first remark is true. The problem lies elsewhere:
while (<DATA>) { ... }
doesn't localize $_ and thus leaves $_ undefined after the loop.
while (my $line = <DATA>) { ... }
doesn't suffer from this as it doesn't touch $_.
If put in a subroutine this innocent little bug cause a serious headache while trying to find out why $_ all of a sudden is undefined.
ihb
See perltoc if you don't know which perldoc to read!
| [reply] [d/l] [select] |
Re: A refactoring trap
by Roy Johnson (Monsignor) on Aug 17, 2005 at 19:00 UTC
|
You could also join the two expressions into one: next PARSE if /^\s*(?:#|$)/;
That might be complex enough to refactor using /x.
Caution: Contents may have been coded under pressure.
| [reply] [d/l] |
Re: A refactoring trap ( regex /x modifier activates new metacharacters)
by LanX (Saint) on Feb 18, 2015 at 01:39 UTC
|
I just ran into the same problem and it took me a while to debug it.
Wanted write a warning thread ... but of course PM had it already, so I just need pushing it up again. =)
Shortly stated: When composing regexes from smaller parts be aware that " " and "#" are real new metacharacters under /x and not only syntactic sugar on the top level.
DB<135> $a='#x'
=> "#x"
DB<136> "x#x#x" =~ s/$a$a/-/r
=> "x-"
DB<137> "x#x#x" =~ s/$a$a/-/xr #oops
=> "-x#x#x"
"$a$a" becomes "#x#x" which is an empty regex under /x since it starts with a comment.
One solution¹ is to pre-compile the smaller parts w/o x-flag
DB<138> $a=qr/#x/
=> qr/#x/
DB<139> "x#x#x" =~ s/$a$a/-/r
=> "x-"
DB<140> "x#x#x" =~ s/$a$a/-/xr
=> "x-"
another one escaping or using a character class
DB<144> $a='\#x' # or ='[#]x'
=> "\\#x"
DB<145> "x#x#x" =~ s/$a$a/-/r
=> "x-"
DB<146> "x#x#x" =~ s/$a$a/-/xr
=> "x-"
simply using quotemeta might bite you again when you wanted to use other metacharacters.
NB: same problem with whitespace.
¹) IMHO the cleanest and still unmentioned in this thread :) | [reply] [d/l] [select] |