|Perl: the Markov chain saw|
Optimization for readability and speed (code)by deprecated (Priest)
|on Apr 30, 2001 at 22:15 UTC||Need Help??|
deprecated has asked for the
wisdom of the Perl Monks concerning the following question:
I've been plunked down into a new development environment and I am writing code that others after I am gone (I am a contractor) need to be able to read and maintain. For purposes of maintainability and readability, I wrote the following sub instead of using a regular expression:
which then uses the following subs:
in a loop that looks like this:
ugh, though, it is pitifully slow. I mean reeeeeally slow. I resisted using a regular expression not because I couldnt craft one, but because they tend to be totally unmaintainable (in is_ms_num()). (yes I know about the /x modifier, but I dont think even that would clarify the regex necessary for the people who will be maintaining this code)
Dominus makes a point on his website for his upcoming book that usually recursion isnt necessary, that a single iteration over an array will produce the desired results and be faster as well as clearer. Well, I am certainly using recursion here. The way that this code is being used is to examine subject lines from an inbox. It is possible that the mailserver is pitifully slow -- I cannot actually inspect the perl process itself to see if it is eating up the whole CPU, but from uptime(1) I can see that something is sending the load to 3.5.
The time taken to parse these messages is on the order of 20-60 seconds per message. With an inbox of 180 messages, this could take 2-3 hours. Thats really just unacceptable. You could just say "well, move it to a faster server." The code will actually not be running on the Ultra 2 I am writing it on, but chances are, it wont be faster than an Ultra 10.
I'd appreciate any insights into making the code a little more efficient without sacrificing the (perhaps overfriendly) readability of it.
update: Okay, I spoke to the lead developer and he said that while he isnt particularly fond of regular expressions being used because of code maintainability I explained that it was saving quite a bit of code and would definitely improve the speed of the sub. So here is the RE version:
which, annoyingly, isnt much faster at all. We need a new development server. *sigh*.