Nice chat, this is getting more interesting ...;-)
What pattern to follow when you realloc memory? There are different approaches, and the choice should be made according to the nature of your application, different application would show different expected pattern of memory usage, and thus you should have different solutions. There is no single solution/pattern that fits all situations.
In my original post, I never said it is the only pattern, that you should realloc by adding one fixed-size block each time, that is just one possible pattern, and it is just one example.
Your choice also largely depends on your strategy to trade off between speed and memory usage. If one cares speed so much, and does not care memory that much, he can just double memory size each time, as what Perl did for its hash. Again, this is just another example, not the only approach.
I looked at your c/c++ example, and believe there is a big chance that your for loop was optimized by the compiler. In that case, it does not demo the real performance of realloc.
malloc() / realloc() already follows a pattern. The only question is whether the pattern that it follows is effective enough for appending a single byte at a time. The Perl source code shows that if the malloc() that comes with Perl is used, sv_grow() (the internal function ultimately used to the memory area used to hold the string) determines whether a realloc() is necessary by calling malloced_size(), meaning that realloc() is partially inlined into sv_grow().
The critical section of the assembly code generated for Linux is:
It looks as if the chance that the loop is optimized away is actually not so great. This should have been obvious by the fact that it took a whole 10 seconds to complete. Counting to 16Mbytes is not hard for a P3 800 Mhz box and I would expect this to have taken less than 1/10th of a second. Just for fun, I replaced the "assert(realloc(...))" with ";" and the time to complete was 0.03s. Yes, I checked the assembly code to ensure that the loop was not optimized away. This shows that the cost of a realloc() is approximately 300 times that of an increment/compare/branch.
In terms of overhead, it is not quite this bad, as increment/compare/branch does not achieve the operation we are trying to perform. Since even an inlined strategy to second guess malloc() and pre-alloc by larger increments would require more code than a increment/compare/branch, the 'overhead' of realloc() may not be that significant at all for a decent implementation of realloc().
Also, as I need to mention again, implementations of realloc() that use mremap() can reallocate large (4096+ bytes) memory areas without any need to copy the data itself. Pages are re-addressed.
This is definitely the kind of discussion I would like to be part of it.
People are talking with solid facts, insightful thoughts, strong supporting data ... And also talk with respect to each other, at the same time, with respect to facts found.
The other thing I would be interested in, is the performance difference between using linked list and (dynamically growing) array.
By using linked list, you would call malloc once for each element, but no realloc is called; by using array, you would call malloc once at the beginning, and then call realloc each time when it grows. I used the two approaches from time to time, but never seriously measured them.
Also you might mix the two approaches, by using a linked list of sub-arrays. the size of each array is fixed, and we grow the data structure by growing the linked list, attaching more sub-arrays to the linked list.
With a dynamically growing array, growing it by a fixed amount has a O(1) amortized cost, with unpredictable performance on any append (you may allocate). Accessing any element is O(1). Finding the size is O(1). Splicing a new element into the interior is O(n).
A linked list has a guaranteed O(1) cost to grow. Splicing is also O(1). Accessing an arbitrary element is O(n). Finding the size is O(n). Splicing a new element into the interior is O(1).
Whether the linked list or array has more overhead is implementation and dataset dependent.
As for your mixed approach, you can do that and some have. Check out http://www.and.org/vstr/overview.html for an example of a string library that does. (I have not looked closely at it.)