In response to your two points:
- malloc() / realloc() already follows a pattern. The only question is whether the pattern that it follows is effective enough for appending a single byte at a time. The Perl source code shows that if the malloc() that comes with Perl is used, sv_grow() (the internal function ultimately used to the memory area used to hold the string) determines whether a realloc() is necessary by calling malloced_size(), meaning that realloc() is partially inlined into sv_grow().
- The critical section of the assembly code generated for Linux is:
movl %eax, (%esp)
movl %ebx, 4(%esp)
testl %eax, %eax
cmpl $16777215, %ebx
It looks as if the chance that the loop is optimized away is actually not so great. This should have been obvious by the fact that it took a whole 10 seconds to complete. Counting to 16Mbytes is not hard for a P3 800 Mhz box and I would expect this to have taken less than 1/10th of a second. Just for fun, I replaced the "assert(realloc(...))" with ";" and the time to complete was 0.03s. Yes, I checked the assembly code to ensure that the loop was not optimized away. This shows that the cost of a realloc() is approximately 300 times that of an increment/compare/branch.
In terms of overhead, it is not quite this bad, as increment/compare/branch does not achieve the operation we are trying to perform. Since even an inlined strategy to second guess malloc() and pre-alloc by larger increments would require more code than a increment/compare/branch, the 'overhead' of realloc() may not be that significant at all for a decent implementation of realloc().
Also, as I need to mention again, implementations of realloc() that use mremap() can reallocate large (4096+ bytes) memory areas without any need to copy the data itself. Pages are re-addressed.