Re^2: Perl 5 Optimizing Compiler, Part 2

Even “traditional” C++ compilers use that technique...

What?

JIT compiling is also an affordable one-time overhead expense.

Not the tracing technique that's currently fashionable! Writing XS to do the same thing without the overhead of op dispatch won't optimize something that's slow because it uses more memory than necessary to provide more flexibility than necessary, especially if running that XS code adds a language barrier that you can't optimize across and which requires serialization and deserialization (or at least prevents you from using non-SVs).

In Perl et al, the hot-spots once identified can be spun off into XS subroutines.

In many cases that won't help.

If this were not so, then Perl, Python, Java, dot-Net, PHP and so-on would never have been done this way....

I'm sorry, but that's a non sequitur.

Re^3: Perl 5 Optimizing Compiler, Part 2 by bulk88 (Priest) on Aug 21, 2012 at 01:44 UTC
Even “traditional” C++ compilers use that technique...

What?

Sundial is being vague again. I think he means that every C compiler released in the last 20 years has a "front end/back end" design with a non-machine-code bytecode in the middle. Or he means C++ string classes designed to sell new PCs, since they perform a full heap walk/validation on every string concatenation to stop evil hackers and phishers.

Not the tracing technique that's currently fashionable! Writing XS to do the same thing without the overhead of op dispatch won't optimize something that's slow because it uses more memory than necessary to provide more flexibility than necessary, especially if running that XS code adds a language barrier that you can't optimize across and which requires serialization and deserialization (or at least prevents you from using non-SVs).

In Perl et al, the hot-spots once identified can be spun off into XS subroutines.

In many cases that won't help.

Turning Perl opcodes into a single XS function always results in a faster runtime, even if the data still moves as SVs between the C functions under that one XS function. Mark, context and tmps stack swaps, pushmarks, PAD accesses, GV dereferences and wantarray context checks are eliminated. Machine code sits in read-only memory, whereas Perl opcodes sit in read-write memory, so the CPU has more opportunity to optimize. In runops, the next Perl opcode (and the next machine code function) can't be predicted, since it sits in RW memory or in the callee's return register. For heavy CPU operations such as a regexp, or for IO, there is of course no difference between XS and Perl bytecode. Very poorly written XS/C code (macro and inline abuse, plus dumb compilers that don't merge character-identical branch blocks together with jmps (Visual C, cough cough)) can take more memory than the equivalent Perl bytecode.

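
A minimal sketch of the dispatch-overhead argument, assuming a toy interpreter. This is not Perl's actual runops loop or OP structure; `toy_op`, `run_ops`, `fused` and `interpreted` are hypothetical names chosen for illustration. The interpreted path makes one indirect call per op through a pointer stored in writable memory, while the fused path is the same arithmetic as straight-line C.

```c
#include <stddef.h>

/* Toy op node: a function pointer plus a link to the next op, both held in
   writable memory. Hypothetical layout, not perl's struct op. */
typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*run)(toy_op *self, long *acc); /* executes, returns next op */
    toy_op *next;
    long arg;
};

static toy_op *op_add(toy_op *self, long *acc) { *acc += self->arg; return self->next; }
static toy_op *op_mul(toy_op *self, long *acc) { *acc *= self->arg; return self->next; }

/* Interpreter loop: the call target is only known once op->run is loaded. */
long run_ops(toy_op *op, long acc) {
    while (op != NULL)
        op = op->run(op, &acc);
    return acc;
}

/* The same three steps written directly in C: no per-op dispatch. */
long fused(long acc) {
    acc += 3;
    acc *= 4;
    acc += 2;
    return acc;
}

/* Builds the chain "add 3, mul 4, add 2" and runs it through the toy loop. */
long interpreted(long acc) {
    toy_op c = { op_add, NULL, 2 };
    toy_op b = { op_mul, &c, 4 };
    toy_op a = { op_add, &b, 3 };
    return run_ops(&a, acc);
}
```

Both paths compute the same value. The sketch only shows where the indirect calls go away; it says nothing about the memory-layout objection quoted above.
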
A C compiler can produce jump tables; the Perl compiler doesn't. I have seen some actual jump table implementations on PerlMonks that didn't result in a tree of conditional opcodes; IIRC they used goto. Perl has no C preprocessor, although its constant folding with branch elimination is sort of the same thing (I think I'm the only one in the world who uses that intentionally). Perl encourages strings in general, rather than numeric constants, for settings and hash key names. I didn't research this, but I don't think Perl has any array/AV-optimized implementation of restricted hashes (cough cough, structs). As others have said, Perl's flexibility is its performance problem.

If there is one candidate in Perl's standard library I would rewrite in XS, it is Exporter. 99.999% of Perl modules use it. A distant second is the pure-Perl portions of DynaLoader. I don't think there is anything else that deserves a rewrite in XS that would benefit the whole Perl community.

Other options are introducing new opcodes and reducing the opcode count by stuffing more metadata into them: up to a couple of bits of a pad offset (if the offset is 1111 out of 4 bits, look for it on the "legacy" Perl stack), or stuffing a GV's const char name into the de-gv opcode rather than putting a mark and a const string SV on the Perl stack.

Desktop OS kernels don't allow time slices and inter-thread signaling fine enough for automatic parallelism (list assignment to a list, for example), and Perl's magic and tied monkey wrenches would cause races. Writing a user-mode inter-thread synchronization and parallelism system with busy waiting on secondary CPUs is beyond the scope of the Perl project.
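
The jump table and strings-versus-numeric-constants points in this post can be sketched in C as follows. The setting names and return values are hypothetical, and whether a given compiler actually emits a jump table for the `switch` depends on the compiler and optimization level; the sketch only shows the shape of code that makes it possible.

```c
#include <string.h>

/* Hypothetical settings, named only for illustration. */
enum mode { MODE_READ, MODE_WRITE, MODE_APPEND, MODE_SYNC, MODE_COUNT };

/* Dense integer cases: a C compiler is free to lower this switch to a
   jump table indexed by m instead of a chain of comparisons. */
int cost_by_enum(enum mode m) {
    switch (m) {
    case MODE_READ:   return 1;
    case MODE_WRITE:  return 2;
    case MODE_APPEND: return 3;
    case MODE_SYNC:   return 5;
    default:          return -1;
    }
}

/* String-keyed settings: each candidate is compared in turn, so the work
   grows with the number of names checked before a match. */
int cost_by_name(const char *name) {
    if (strcmp(name, "read") == 0)   return 1;
    if (strcmp(name, "write") == 0)  return 2;
    if (strcmp(name, "append") == 0) return 3;
    if (strcmp(name, "sync") == 0)   return 5;
    return -1;
}
```

The two functions return the same values for matching keys; the difference is only in how the branch is selected.
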
Re^4: Perl 5 Optimizing Compiler, Part 2 by chromatic (Archbishop) on Aug 21, 2012 at 05:39 UTC
If there is one candidate in Perl's standard library I would rewrite in XS, it is Exporter.

Common subexpression elimination would help my code more; I'm not that concerned about startup time anywhere but tests.


	PerlMonks