in reply to Strawberry: Both IO and global match are VERY SLOW. Unless pre-heated (but why?)

A large part of the growth in time cost is the way the tmps stack was being grown.

On Linux, and at least some BSDs, realloc() for large allocations is handled through the mremap() system call, which remaps pages rather than copying them, making such reallocations close to constant time. On Windows, realloc() is O(size of original allocation) when the allocation can't be grown in place, since the data needs to be copied from the original allocation to the new allocation.

Before fea90cfbe1f221d50be90ca5ceb0c6c7f121e442 the tmps (or mortal) stack was being grown linearly, adding 512 entries each time the stack was grown. Where realloc() has to copy, that makes the total cost of growing the stack quadratic in its final size. This dominated the execution cost for the match and io(list) no-heat cases.

With fea90cfbe1f221d50be90ca5ceb0c6c7f121e442 the tmps stack is now grown exponentially (1.5**n), like other similar allocations such as the value stack and arrays. This changes the cost of each push to amortized constant time.

The pre-heat in each case grew the tmps stack in one big allocation, preventing the incremental growth that caused the problem.
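
To make the difference concrete, here is a rough simulation (in Python, not the actual perl source) that counts how many entries get copied while a stack grows to a million entries. It assumes the worst case described above, where every regrowth copies the whole existing buffer. The function names and the starting size of 512 are only illustrative assumptions, not taken from the commit.

```python
def copies_to_reach(target, grow, start=512):
    """Count entries copied while growing a buffer to at least `target`
    entries, assuming every regrowth copies the whole old buffer
    (the worst case for a realloc() that cannot grow in place)."""
    capacity = start
    copied = 0
    while capacity < target:
        copied += capacity          # old contents are copied to the new block
        capacity = grow(capacity)   # apply the growth policy
    return copied


def linear(capacity):
    # Old policy as described above: add a fixed 512 entries each time.
    return capacity + 512


def geometric(capacity):
    # New policy as described above: grow by roughly a factor of 1.5.
    return capacity + capacity // 2


target = 1_000_000
linear_copies = copies_to_reach(target, linear)
geometric_copies = copies_to_reach(target, geometric)
# "Pre-heating": start with one allocation that is already big enough.
preheated_copies = copies_to_reach(target, linear, start=target)

print("linear:   ", linear_copies)
print("geometric:", geometric_copies)
print("preheated:", preheated_copies)
```

Under linear growth the copy count scales with the square of the final size, under geometric growth it stays within a small constant multiple of the final size, and the pre-sized case never copies at all, which is why the pre-heated runs didn't show the slowdown.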