Re^6: Perl 5 Optimizing Compiler, Part 4: LLVM Backend? by dave_the_m (Parson)
on Aug 29, 2012 at 09:22 UTC
Perl internally parses the code into a tree of OP structures (the "optree"). This is sort of like an AST without the "A" part. But it was never designed to be easily manipulable in the way an AST is supposed to be. Also, the structure and specification of the parsed program isn't contained solely within the optree; it's also spread out among stashes, globs, pads etc.
When runtime is reached, each OP contains a pointer to the next op in execution sequence (op_next), plus a pointer to a C function that can carry out the action of that op (op_ppaddr). The main execution loop of perl (the "runloop") consists of calling the op_ppaddr function of the current op, then setting the current op to be whatever that function returns; repeat until a NULL is returned.
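As a rough illustration of the loop just described, here is a minimal C sketch. The field names op_next and op_ppaddr come from the description above; everything prefixed with toy_ is hypothetical. Passing the op as an explicit argument is also a simplification made for this sketch and should not be read as perl's actual calling convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for perl's OP structure. */
typedef struct op OP;
struct op {
    OP *op_next;               /* next op in execution sequence */
    OP *(*op_ppaddr)(OP *);    /* C function that carries out this op */
    int toy_value;             /* toy payload, an assumption of this sketch */
};

/* Toy global state standing in for perl's real runtime state. */
static int toy_accumulator = 0;

/* A toy pp-style function: do the work, then return the next op to run. */
static OP *toy_pp_add(OP *op)
{
    toy_accumulator += op->toy_value;
    return op->op_next;
}

/* The runloop: call the current op's function, make whatever it returns
 * the current op, and repeat until NULL is returned. */
static void toy_runloop(OP *start)
{
    OP *op = start;
    assert(op != NULL);
    while ((op = op->op_ppaddr(op)) != NULL)
        ;  /* all the work happens inside the pp function */
}
```

Chaining three such ops through op_next and handing the first to toy_runloop() runs them in sequence. Note that the loop has no idea what any op does: control flow is entirely a matter of which op each function chooses to return.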
OPs (and the pp* functions which implement them) are unlike bytecode: bytecode consists of small, lightweight ops, with the expectation that they can be easily converted into equivalent native machine code via JIT etc. Or to put it another way, bytecode ops don't contain much switching logic themselves: switching is done by adding in switching bytecodes. A tracing JIT can then follow what paths are taken through the bytecode, pick a certain path (no switching), and convert that sequence of bytecode into a sequence of machine instructions.
Perl ops are heavyweight: within each one there may be a lot of switching action. For example, the perl "add" op examines its two args: it checks if they're overloaded, or tied, or have other magic associated with them, and if so handles that. Then it checks whether the args are strings or other non-num things, and if so numifies them. Then it adds them. Then it checks for overflow: if overflow has occurred, it sees whether the overflow could be avoided by upgrading from integer to float, and if so, does so.
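To make the amount of branching inside a single op concrete, here is a toy C sketch of the steps listed above. It is not perl's real pp_add: the toy_sv type, its fields, and the magic_add hook are all assumptions invented for illustration, and real scalars and magic are considerably more involved.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical toy scalar; not perl's SV. */
typedef enum { TOY_INT, TOY_NUM, TOY_STR } toy_kind;

typedef struct toy_sv toy_sv;
struct toy_sv {
    toy_kind kind;
    long iv;            /* integer value, valid when kind == TOY_INT */
    double nv;          /* float value, valid when kind == TOY_NUM */
    const char *pv;     /* string value, valid when kind == TOY_STR */
    /* Non-NULL stands in for overloading, ties, or other magic. */
    toy_sv (*magic_add)(const toy_sv *, const toy_sv *);
};

/* Convert a string to a number; leave numeric values alone. */
static toy_sv toy_numify(const toy_sv *sv)
{
    toy_sv out = *sv;
    if (sv->kind == TOY_STR) {
        out.kind = TOY_NUM;
        out.nv = strtod(sv->pv, NULL);
        out.pv = NULL;
    }
    return out;
}

static double toy_as_double(const toy_sv *sv)
{
    return sv->kind == TOY_INT ? (double)sv->iv : sv->nv;
}

/* One "add" op, with all of its switching logic inside it. */
static toy_sv toy_add(const toy_sv *left, const toy_sv *right)
{
    assert(left != NULL && right != NULL);

    /* 1. Overloaded, tied, or otherwise magical? Let the hook handle it. */
    if (left->magic_add)  return left->magic_add(left, right);
    if (right->magic_add) return right->magic_add(left, right);

    /* 2. Strings or other non-numeric things get numified. */
    toy_sv a = toy_numify(left);
    toy_sv b = toy_numify(right);
    toy_sv result = { TOY_NUM, 0, 0.0, NULL, NULL };

    /* 3. Integer add, but only if it cannot overflow. */
    if (a.kind == TOY_INT && b.kind == TOY_INT) {
        int overflow = (b.iv > 0 && a.iv > LONG_MAX - b.iv)
                    || (b.iv < 0 && a.iv < LONG_MIN - b.iv);
        if (!overflow) {
            result.kind = TOY_INT;
            result.iv = a.iv + b.iv;
            return result;
        }
    }

    /* 4. Otherwise upgrade to float and add there. */
    result.nv = toy_as_double(&a) + toy_as_double(&b);
    return result;
}
```

Every one of those branches lives inside the op rather than in separate branch instructions, which is why a tracing JIT has no lightweight straight-line path to extract from it.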
(update: so what I meant to say at this point is that the perl optree is a bastard hybrid of an AST and bytecode, and has to serve both purposes, not always comfortably)
B::Bytecode is just a way to serialise the optree (and associated state) into a platform-neutral file. To execute it, the file is read in and used to reconstruct the optree and state, then execution continues as before. It provides no runtime speedup; it was intended to speed up startup by skipping the compilation phase, but in practice few gains were seen.