The critical section of the assembly code generated for Linux is:
.L10:
movl %eax, (%esp)
movl %ebx, 4(%esp)
call realloc
testl %eax, %eax
je .L15
incl %ebx
cmpl $16777215, %ebx
It looks as if the chance that the loop is optimized away is actually not so great. This should have been obvious by the fact that it took a whole 10 seconds to complete. Counting to 16Mbytes is not hard for a P3 800 Mhz box and I would expect this to have taken less than 1/10th of a second. Just for fun, I replaced the "assert(realloc(...))" with ";" and the time to complete was 0.03s. Yes, I checked the assembly code to ensure that the loop was not optimized away. This shows that the cost of a realloc() is approximately 300 times that of an increment/compare/branch.
In terms of overhead, it is not quite this bad, as increment/compare/branch does not achieve the operation we are trying to perform. Since even an inlined strategy to second guess malloc() and pre-alloc by larger increments would require more code than a increment/compare/branch, the 'overhead' of realloc() may not be that significant at all for a decent implementation of realloc().
Also, as I need to mention again, implementations of realloc() that use mremap() can reallocate large (4096+ bytes) memory areas without any need to copy the data itself. Pages are re-addressed.