I forgot to reply to this specific point:
Me: Note also that LLVM is very unlikely to be able to optimise away any of the get_context() calls in the XS code.
You: Why? If they can be #defined away, why can they not be optimised away?
On threaded builds, get_context() has to retrieve a value from thread-local storage.
At least the last time I looked at the Linux pthreads library, that meant comparing the current stack pointer against the stack range allocated to each live thread, to determine which thread we're in. I'd be amazed if LLVM could determine that the value won't change between calls!
PERL_NO_GET_CONTEXT causes the value (the address of the current perl interpreter) to be passed as an extra argument to each XS function, rather than calling get_context() every time a function in the perl library needs to be called.
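
To make the difference concrete, here's a rough, self-contained sketch in plain C (not real XS). Every demo_* name is made up for illustration, and a C11 _Thread_local variable stands in for whatever the threads library really does to find per-thread data. The counter just shows how many times the "where is my interpreter?" lookup happens in each style.

```c
#include <stddef.h>

/* Hypothetical stand-in for the perl interpreter structure. */
typedef struct { int ops; } demo_interp;

/* Per-thread "current interpreter" pointer. C11 _Thread_local is an
   assumption standing in for the real thread-local storage mechanism. */
static _Thread_local demo_interp *demo_current = NULL;

/* Counts context lookups (plain global: this demo is single-threaded). */
int demo_lookups = 0;

void demo_set_context(demo_interp *interp) { demo_current = interp; }

/* Analogue of get_context(): every call goes back to thread-local storage. */
demo_interp *demo_get_context(void) {
    demo_lookups++;
    return demo_current;
}

/* Library function, implicit-context style: it has to look the interpreter up. */
void demo_lib_op_implicit(void) { demo_get_context()->ops++; }

/* Library function, explicit-context style: the interpreter is an argument. */
void demo_lib_op_explicit(demo_interp *interp) { interp->ops++; }

/* "XS function" without the define: one lookup per library call. */
void demo_xs_implicit(void) {
    demo_lib_op_implicit();
    demo_lib_op_implicit();
    demo_lib_op_implicit();
}

/* "XS function" with the define: the pointer arrives as an extra argument
   and is simply handed on, so no lookups happen here at all. */
void demo_xs_explicit(demo_interp *interp) {
    demo_lib_op_explicit(interp);
    demo_lib_op_explicit(interp);
    demo_lib_op_explicit(interp);
}
```

After demo_set_context(), one call to demo_xs_implicit() bumps demo_lookups three times, while demo_xs_explicit(interp) leaves it untouched. In real XS code you get the second style by putting #define PERL_NO_GET_CONTEXT before the perl headers are included, and the extra argument is spelled with the pTHX_ / aTHX_ macros.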