The trouble with benchmarking on a single number is that it may favor one method over the other.
Good catch. Another problem I ran into was that the load on my test system kept varying (lab machine, someone was logged in remotely doing a bunch of MATLAB foolery), so I ended up getting quite different "real" time results for the same input... and "real" is probably the figure a casual reader would check.
It would be interesting to know what kinds of inputs favour whose method, and (ideally) why.
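For what it's worth, here is a rough sketch of how one might check that: run both methods over a range of inputs rather than one number, and measure CPU time instead of wall-clock "real" time so that other users' load on the machine matters less. Everything named here is an assumption for illustration: `method_a` and `method_b` are hypothetical stand-ins (not the methods being discussed), and the input sizes are arbitrary.

```python
import time


def method_a(n):
    # Hypothetical stand-in: sum of squares below n, computed with a loop.
    return sum(i * i for i in range(n))


def method_b(n):
    # Hypothetical stand-in: the same sum via the closed-form formula.
    return (n - 1) * n * (2 * n - 1) // 6


def cpu_time(func, arg, repeats=5):
    """Return the smallest CPU time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        # process_time() counts CPU time used by this process only,
        # so time spent waiting on a loaded machine is not included.
        start = time.process_time()
        func(arg)
        best = min(best, time.process_time() - start)
    return best


def compare(inputs, repeats=5):
    """Time both methods on every input; return (input, time_a, time_b) rows."""
    return [
        (n, cpu_time(method_a, n, repeats), cpu_time(method_b, n, repeats))
        for n in inputs
    ]


if __name__ == "__main__":
    for n, a, b in compare([10, 1_000, 100_000]):
        print(f"n={n:>7}  a={a:.6f}s  b={b:.6f}s")
```

Taking the minimum over a few repeats is one common way to damp noise. It will not remove every effect of a busy machine (cache contention, for instance), so the numbers are still best compared against each other from the same session.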
Yours in pedantry,