One thing you may have taken into account (though I probably missed it) is that most filesystems, even network ones, tend to cache file attributes. The very first test that runs tends to get poor results, since it is the one most likely to hit a cold cache. Even once the cache is warmed up, the caching behavior changes as soon as you test on a large enough population of files. The cache in this case can turn into a kind of FIFO buffer, though you will still see caching effects (especially with sorts).
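A rough way to check whether this matters on a given setup is to time several back-to-back passes over the same files and compare them. The sketch below is only an illustration in Python, not taken from the benchmark under discussion; the helper names `time_stat_pass` and `compare_passes` are made up for this example.

```python
import os
import time

def time_stat_pass(paths):
    """Return the elapsed seconds for one os.stat() pass over paths."""
    start = time.perf_counter()
    for p in paths:
        os.stat(p)
    return time.perf_counter() - start

def compare_passes(paths, passes=3):
    """Time several consecutive passes over the same paths.

    The first pass is the one most likely to run against a cold
    attribute cache; later passes show the warmed-up behavior.
    """
    return [time_stat_pass(paths) for _ in range(passes)]
```

If the first number is much larger than the rest, the cache is probably skewing whichever test runs first. Repeating this with a much larger file list should show whether the later passes stop benefiting as well.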
One way to help mitigate this is to test on a network filesystem with attribute caching disabled (e.g. many NFS clients accept the noac mount option). You will probably still see server-side caching effects, though I imagine the network delays will tend to dominate.
Another way to handle this might be to put the results for file attributes into a hash before the actual benchmark timing, and then introduce a constant delay during each lookup. (You could time how long it took to fill the initial hash and then use that to compute the delay, but then again this too can be affected by a warm cache...)
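To make the idea concrete, here is a minimal sketch in Python (a dict standing in for the hash). It assumes the attributes of interest come from `os.stat()`, and the names `prefill_attributes` and `make_delayed_lookup` are hypothetical, invented for this example. The delay is a busy-wait rather than a sleep, on the assumption that the per-file cost is too small for a sleep call to reproduce accurately.

```python
import os
import time

def prefill_attributes(paths):
    """Stat every path once, outside the timed region.

    Returns (cache, seconds_per_file). The per-file figure is only a
    rough estimate and is itself subject to warm-cache effects.
    """
    start = time.perf_counter()
    cache = {p: os.stat(p) for p in paths}
    elapsed = time.perf_counter() - start
    per_file = elapsed / len(paths) if paths else 0.0
    return cache, per_file

def make_delayed_lookup(cache, delay):
    """Return a lookup function that spins for `delay` seconds,
    then answers from the pre-filled cache instead of the filesystem."""
    def lookup(path):
        deadline = time.perf_counter() + delay
        while time.perf_counter() < deadline:
            pass  # busy-wait so every lookup costs the same fixed amount
        return cache[path]
    return lookup
```

The benchmark would then call something like `sorted(paths, key=lambda p: lookup(p).st_mtime)` in place of the real stat calls, so every lookup costs the same regardless of what the filesystem cache happens to be doing.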