I agree. Many optimizations that can be performed in C or other languages without a runtime eval can't be done in Perl, because so much context and metadata has to be kept around in case a rare string eval or magic happens. Dereferencing has to be done every time -> or {} appears in code, without exception, in case it's a magic variable. $root{l1}{l2}{l3} really takes 4 dereference ops every time it is written, and it can be written 7 times in a sub. Very few people (I am one of them) will take a lexical reference to the hash element, "\$root{l1}{l2}{l3}", to avoid all the reference ops. I still have a dereference op every time I write "${}", but it's better than "load constant on stack, do deref" times 3 opcodes. Sometimes flexibility isn't so flexible.

hv_common_key_len could use some refactoring to split out all the magic support into separate function calls, to keep it out of the CPU cache.

Here are the top 18 fattest functions (actually all functions over 4 KB long) in ActivePerl 5.12 (Visual C -O1), by machine code size in hex bytes:
_Perl_re_compile 00001021
_Perl_gv_fetchpvn_flags 0000105F
S_scan_const 000010EB
S_regatom 000012FB
S_make_trie 0000148C
_Perl_sv_vcatpvfn 00001620
_Perl_yyparse 000016C2
S_regclass 0000177A
_Perl_do_sv_dump 00001A4D
S_reg 00001B5F
_perl_clone_using 00001D3C
S_unpack_rec 0000231F
S_study_chunk 00002668
S_pack_rec 00002729
S_find_byclass 00002928
_Perl_keyword 0000359D
S_regmatch 0000438A
_Perl_yylex 00008095
I'm surprised un/pack is so fat. clone_using could use a couple of strategically placed memcpy calls rather than hundreds of double/quadword copies.
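To make the lexical-reference trick from the first paragraph concrete, here is a minimal sketch. The sub names bump_slow and bump_fast and the contents of %root are made up for illustration. It only shows the shape of the idiom; how much it saves depends on your perl build and how hot the code is.

```perl
use strict;
use warnings;

# Hypothetical nested structure standing in for the $root{l1}{l2}{l3} case.
our %root = ( l1 => { l2 => { l3 => 0 } } );

# Every mention of $root{l1}{l2}{l3} walks the whole chain again.
sub bump_slow {
    $root{l1}{l2}{l3}++ for 1 .. 3;
    return $root{l1}{l2}{l3};
}

# Walk the chain once, keep a reference to the leaf scalar in a lexical,
# then pay only one scalar dereference per use.
sub bump_fast {
    my $leaf = \$root{l1}{l2}{l3};
    $$leaf++ for 1 .. 3;
    return $$leaf;
}
```

Both subs touch the same scalar, so the cached reference stays valid only as long as nobody replaces $root{l1}{l2} or deletes the l3 key in between.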
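Use B::Deparse (or B::Concise, which prints the op tree itself) and look at your opcode trees. Reduce the number of opcodes and your code gets faster. Each ; has overhead: every statement gets its own nextstate op, which switches the current line number for a debugger that isn't even present. Using the comma operator instead of ; in some places reduces the number of nextstates.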
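A rough way to check the nextstate claim yourself is to dump a sub with B::Concise and count the nextstate ops. This is only a sketch: the subs with_semicolons and with_commas and the helper count_nextstates are hypothetical names, and the exact counts can differ between perl versions, so compare the two numbers rather than trusting absolute values.

```perl
use strict;
use warnings;
use B::Concise qw(walk_output);

# Three assignments as three separate statements.
sub with_semicolons {
    my ($x, $y, $z);
    $x = 1;
    $y = 2;
    $z = 3;
    return $x + $y + $z;
}

# The same assignments joined by the comma operator into one statement.
sub with_commas {
    my ($x, $y, $z);
    $x = 1, $y = 2, $z = 3;
    return $x + $y + $z;
}

# Render the op tree of a code ref into a string and count live
# nextstate ops (ops nulled by the optimizer show up as "ex-nextstate").
sub count_nextstates {
    my ($code) = @_;
    my $walker = B::Concise::compile('-basic', $code);
    my $buf = '';
    walk_output(\$buf);
    $walker->();
    my $count = () = $buf =~ /(?<!ex-)nextstate\(/g;
    return $count;
}

printf "semicolons: %d nextstates\n", count_nextstates(\&with_semicolons);
printf "commas:     %d nextstates\n", count_nextstates(\&with_commas);
```

Don't overdo it: long comma chains hurt readability and make line numbers in warnings less precise, so it is only worth it in hot code.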