At this point I think you're mainly measuring noise. You've also still got the bug whereby you populate the lexical %hash, but the benchmarks get run against the *empty* global %hash.
By "noise", I mean a combination of timing noise, and (for lack of a better term) "compiler noise". How C code gets compiled can effect the alignment of machine code bytes across cache line boundaries, which means that different compilers can compile the same source code of the perl interpreter into different executables which have different instruction cache and branch prediction miss patterns. I have personally seen adding a line of code into a part of the perl interpreter which wasn't being executed (e.g. in dump.c) cause a 10% change in benchmark speed for a simple benchmark.
These days I mostly benchmark the perl interpreter using a tool of mine (Porting/bench.pl) based on top of cachegrind, which profiles the execution run in terms of how many individual machine code instructions, branches etc it does. Under that, 'exists' takes slightly fewer instruction and data reads and writes and branches than a hash lookup.
Dave.