|Don't ask to ask, just ask|
Some quick thoughts:
Firstly, it's always good to check whether your results stay the same if you permute the test conditions a bit. Make sure you try different numbers for a and b - perhaps graph how the timings of these methods change over different ranges.
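Here is a rough sketch of what I mean, using the core Benchmark module. Note that mul_direct and mul_cached are hypothetical stand-ins I made up for "multiply directly" versus "look it up in a cache", and the ranges and iteration count are arbitrary assumptions; swap in your own code and the values you actually expect.

```perl
use strict;
use warnings;
use Benchmark qw(timethese timestr);

# Hypothetical stand-ins for the approaches being compared.
my %cache;
sub mul_direct { my ($x, $y) = @_; return $x * $y }
sub mul_cached { my ($x, $y) = @_; return $cache{"$x,$y"} //= $x * $y }

# Assumed ranges and iteration count; adjust to suit your data.
my @ranges     = (10, 100, 1000);
my $iterations = 20_000;

for my $max (@ranges) {
    %cache = ();    # start each range with a cold cache
    my $results = timethese($iterations, {
        direct => sub { mul_direct(1 + int(rand($max)), 1 + int(rand($max))) },
        cached => sub { mul_cached(1 + int(rand($max)), 1 + int(rand($max))) },
    }, 'none');     # 'none' suppresses Benchmark's own report

    printf "range 1..%-5d %-7s %s\n", $max, $_, timestr($results->{$_})
        for sort keys %$results;
}
```

With small ranges the cache fills quickly and nearly every call is a hit; with large ranges most calls miss, so the relative timings may well change. Feed the numbers into a plot if you want to see where the crossover is.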
More interestingly, does changing the order in which you run your benchmarks change anything? (E.g. does the first run of a memory-hungry implementation cause the process to grab lots of virtual memory from the OS, which malloc can then re-use more quickly in later runs? Conversely, does heap fragmentation slow down memory management for later runs?)
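One cheap way to check for order effects is to run the candidates in a shuffled order over several rounds and compare the totals. This is only a sketch: the two subs in %candidates are invented placeholders, and the round and repetition counts are arbitrary.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);
use Time::HiRes qw(time);

# Hypothetical candidates; replace with the implementations under test.
my %candidates = (
    direct => sub {
        my $sum = 0;
        $sum += $_ * 3 for 1 .. 1000;
        return $sum;
    },
    lookup => sub {
        my @table = map { $_ * 3 } 0 .. 1000;    # memory-for-speed variant
        my $sum = 0;
        $sum += $table[$_] for 1 .. 1000;
        return $sum;
    },
);

my %total;
my $rounds = 5;
for my $round (1 .. $rounds) {
    # A different run order each round, so no candidate always goes first.
    for my $name (shuffle keys %candidates) {
        my $start = time();
        $candidates{$name}->() for 1 .. 200;
        $total{$name} += time() - $start;
    }
}

printf "%-7s %.4fs total over %d rounds\n", $_, $total{$_}, $rounds
    for sort keys %total;
```

If the per-round timings of one candidate depend heavily on whether it ran first or last, that is a hint that memory behaviour rather than the algorithm is what you are measuring.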
Another thing to keep in mind is that the perl 'interpreter' actually runs an optimisation pass at compile time. When you put together simple sample code, some operations might be trivially optimised away (constant expressions, for instance) that would not be optimised away in your production code.
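You can see this for yourself with the core B::Deparse module, which prints the code as perl sees it after compilation. A minimal sketch (the two anonymous subs are just made-up examples):

```perl
use strict;
use warnings;
use B::Deparse;

my $deparse = B::Deparse->new;

# Constant operands: the multiplication is folded at compile time.
my $folded = $deparse->coderef2text(sub { return 3 * 4 });

# Runtime operands: the multiplication has to stay in the compiled code.
my $runtime = $deparse->coderef2text(sub { my ($x, $y) = @_; return $x * $y });

print "constant operands:\n$folded\n\n";
print "runtime operands:\n$runtime\n";
```

The first sub deparses with the product already computed, so a benchmark built on literal numbers may not be timing a multiplication at all. The same check from the command line is perl -MO=Deparse yourscript.pl.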
Lastly, I'm not sure all your examples are comparing like with like. I'd be very wary (vary wery?) about using microbenchmarks to suggest which approach is best for a production app. (For example, what you are attempting is a classic memory/speed tradeoff. Your test and production environments may vary widely in available memory under load.)
Sorry if this is teaching grandma how to suck eggs, but you are better off timing/benching a real app rather than sample code.
PS. Array lookup is a cheap operation in most languages. It is a straight memory reference. The cost is in consuming memory and how that affects end system performance.
One more thing: if you really are caching multiplications, have you considered bit twiddling? :-)
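By bit twiddling I mean things like replacing a multiplication by a power of two with a left shift. A tiny sketch, where times_pow2 is a name I invented for illustration; it assumes non-negative integers that fit in a native word, since perl's shift operators treat their operands as unsigned unless "use integer" is in effect:

```perl
use strict;
use warnings;

# Multiply a non-negative integer $n by 2**$k using a left shift.
# Assumes the result fits in perl's native integer size.
sub times_pow2 {
    my ($n, $k) = @_;
    return $n << $k;
}

print times_pow2(5, 3), "\n";    # same value as 5 * 8
```

Whether this is actually any faster than a plain multiply in perl is something you would have to measure; I would not assume it.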