It's generally regarded as being a bit slower, on the order of 10-20%. There are three main reasons: each function takes an extra argument (the interpreter); data attached to ops has been moved into the pad, so it is slightly slower to retrieve; and, until recently, the malloc wrappers did a slow lookup of the interpreter address on each call.
Of course, the actual slowdown depends on the particular code and modules. For example, individual XS modules may be considerably slower if they haven't been written to use the workarounds for certain compatibility slowdowns (e.g. defining PERL_NO_GET_CONTEXT). Until recently, DBI with DBD::mysql was 3 times slower under a threaded perl for a simple fetch() loop. Having been reworked a bit, it's now only 20% slower.
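To make the "fetch the interpreter on every call" cost concrete, here is a rough C sketch of the two calling styles. It is not perl's actual source: the `interp` struct, `get_context()`, `set_context()` and the `bump_*`/`run_*` helpers are all hypothetical names invented for illustration, and a C11 `_Thread_local` variable stands in for whatever per-thread lookup a real threaded perl performs. The first style looks the context up inside every call, roughly the situation for XS code that doesn't define PERL_NO_GET_CONTEXT; the second looks it up once and passes it along as an extra argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread interpreter structure. */
typedef struct interp {
    long counter;
} interp;

/* One slot per thread (C11 thread-local storage). This is an assumption
 * made for the sketch, not how any particular perl build stores it. */
static _Thread_local interp *current_interp = NULL;

/* Register the interpreter that belongs to the calling thread. */
void set_context(interp *my) { current_interp = my; }

/* The per-thread lookup whose cost is being discussed. */
interp *get_context(void) { return current_interp; }

/* Style 1: every call fetches the context again. */
void bump_implicit(void) {
    interp *my = get_context();
    assert(my != NULL);   /* a context must have been registered first */
    my->counter++;
}

/* Style 2: the caller passes the context as an extra argument. */
void bump_explicit(interp *my) { my->counter++; }

/* n calls, n lookups inside bump_implicit (plus one for the return). */
long run_implicit(long n) {
    for (long i = 0; i < n; i++)
        bump_implicit();
    return get_context()->counter;
}

/* n calls, a single lookup up front. */
long run_explicit(long n) {
    interp *my = get_context();
    for (long i = 0; i < n; i++)
        bump_explicit(my);
    return my->counter;
}
```

Both loops do the same work, so after `set_context(&a)` on a zeroed `a`, calling `run_implicit(1000)` and then `run_explicit(500)` leaves the counter at 1500. The only difference is how often the lookup happens. This is a sketch of the idea rather than a benchmark: an optimizing compiler may well inline the lookup away in code this small, which a real cross-module call into the interpreter would not allow.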