As we speak I have a script running that is consuming *two* expensive resources: CPU and disk I/O. It started running on April 21st. It is now May 14th, and I expect it to finish in the wee small hours of the 15th. It even spends a lot of that time in some tight loops, as it is producing summaries of a very large dataset.

However, "optimizations" like worrying about whether I initialize my variables are foolish. I have plenty of far better optimizations available. For example, my script always runs with exactly as much parallelism as is most efficient. This means it uses all available CPU cores while minimizing contention for resources.
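A minimal sketch of that kind of coarse-grained parallelism, assuming the data can be split into independent chunks that are summarized separately and merged at the end. The names `summarize_chunk`, `merge_summaries` and `summarize`, the chunk size and the count/sum/min/max summary are all hypothetical illustrations, not the actual script. Using one worker per core is also an assumption; with heavy disk I/O, the most efficient worker count may be different and is best found by measurement.

```python
import os
from multiprocessing import Pool


def summarize_chunk(chunk):
    """Hypothetical per-chunk summary: count, sum, min and max of the values."""
    return (len(chunk), sum(chunk), min(chunk), max(chunk))


def merge_summaries(parts):
    """Combine the per-chunk summaries into one overall summary."""
    count = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    lo = min(p[2] for p in parts)
    hi = max(p[3] for p in parts)
    return count, total, lo, hi


def summarize(data, chunk_size=10_000):
    # Independent chunks mean workers never contend for shared state.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # One worker per available core (assumption); os.cpu_count() may return None.
    workers = os.cpu_count() or 1
    with Pool(processes=workers) as pool:
        parts = pool.map(summarize_chunk, chunks)
    return merge_summaries(parts)


if __name__ == "__main__":
    print(summarize(list(range(100_000))))
```

The expensive work happens inside the workers and only the small per-chunk summaries travel back to the parent, which keeps inter-process communication cheap.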

If I cared to optimize it further, I would minimize disk I/O. But disk I/O accounts for only about 20% of the process's running time, so we are already at the stage where further optimization isn't really worth it: neither the time it would take to optimize the code, nor the money it would take to buy more memory (so less use of disk for intermediate results) or an SSD (just as much disk I/O, but contention matters less).
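The reasoning behind that 20% figure is essentially Amdahl's law: speeding up one part of a job can only shrink that part's share of the total time. The sketch below, with a hypothetical `max_speedup` helper, assumes the I/O time and CPU time simply add up (no overlap between them), which is only an approximation for a parallel job.

```python
def max_speedup(fraction_improved, factor):
    """Amdahl's law: overall speedup when `fraction_improved` of the
    runtime is made `factor` times faster (factor may be float('inf'))."""
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)


# Disk I/O is ~20% of the runtime.
print(max_speedup(0.2, float("inf")))  # I/O eliminated entirely: 1.25x at best
print(max_speedup(0.2, 2))             # I/O twice as fast: roughly 1.11x
```

Even making disk I/O free would cut the total runtime by at most a fifth, which bounds what extra memory or an SSD could ever buy.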