I do not agree with your statement that "generally speaking, realloc() costs a lot, and is slow."
I executed the following C program to verify your claim:
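The listing itself is not reproduced here. As a rough sketch only (not the original program), a test matching the description could look something like the following. The function name `realloc_sweep` is made up for illustration, and `clock()` reports CPU time rather than the elapsed time quoted below.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not from the original post): start with malloc(1),
 * then realloc() the same block once for every size from 1 to max_size.
 * Returns the CPU time used in seconds, or -1.0 if an allocation fails. */
double realloc_sweep(size_t max_size)
{
    clock_t start = clock();
    size_t i;
    char *p = malloc(1);

    if (p == NULL)
        return -1.0;

    for (i = 1; i <= max_size; i++) {
        char *q = realloc(p, i);
        if (q == NULL) {
            free(p);        /* the old block is still valid on failure */
            return -1.0;
        }
        p = q;
    }

    free(p);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `realloc_sweep(16u * 1024 * 1024)` from `main()` and printing the result would cover the 1 byte to 16 MB range.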
The above code calls malloc(1) and then calls realloc() once for each size i from 1 byte to 16 MB.
Compiled with GCC 3.2.1 on a Linux 2.4.20 box with an 800 MHz P3 CPU and 128 MB of SDRAM, the program takes 10.8 seconds of elapsed time. (uses GLIBC)
Compiled with GCC 3.2.1 on Cygwin running on a WinXP box with a 1.2 GHz AMD Athlon CPU and 256 MB of SDRAM, the program takes 2.3 seconds of elapsed time. (uses GLIBC)
Now, some implementations of realloc() are slow. GLIBC happens not to be one of them. Any implementation of malloc()/realloc() that allocates in increments of 4 bytes is deficient from my perspective. Some sort of sophistication is necessary to reduce how often data is copied as the cost of each copy increases. As I mentioned before, one of the more straightforward approaches is to allocate blocks in powers of 2. This way, for a steadily growing memory block, each copy moves twice as much data as the previous one but happens only half as often, so the total number of bytes copied stays proportional to the final size of the block. That is a net gain, as the copy itself is usually less expensive than the operation generating the new data to populate the string.
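To make the power-of-2 idea concrete, here is a minimal sketch of a growable buffer layered on top of realloc(). The names `struct gbuf`, `next_pow2` and `gbuf_append` are invented for this example and are not part of any library; a real allocator would apply the same rounding internally.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; all names are illustrative only. */
struct gbuf {
    char  *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated: 0 or a power of two */
};

/* Smallest power of two >= n (for n >= 1); returns 0 on overflow. */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n) {
        if (p > (size_t)-1 / 2)
            return 0;
        p *= 2;
    }
    return p;
}

/* Append n bytes from src. Returns 0 on success, -1 on failure
 * (in which case the buffer is left unchanged). */
int gbuf_append(struct gbuf *b, const void *src, size_t n)
{
    size_t need;

    if (n == 0)
        return 0;
    if (n > (size_t)-1 - b->len)
        return -1;                      /* len + n would overflow */

    need = b->len + n;
    if (need > b->cap) {
        /* Only call realloc() when a power-of-two boundary is crossed. */
        size_t cap = next_pow2(need);
        char *p;

        if (cap == 0)
            return -1;
        p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }

    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

Starting from a zeroed `struct gbuf`, appending one byte at a time up to 1000 bytes reaches realloc() only 11 times (capacities 1, 2, 4, ..., 1024) instead of 1000 times. The caller releases the memory with `free(b.data)`.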
Also, under Linux (at least), the mremap() call allows pages to be re-addressed, which makes a zero-copy realloc() possible for memory areas that already have their own pages or are the only memory area in use on their page.
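As a Linux-only illustration of that call, the sketch below maps an anonymous region with mmap() and resizes it with mremap(). The helper names `region_alloc` and `region_resize` are hypothetical, and this is not a claim about how any particular realloc() is written internally; it only shows the system call that makes page-level resizing possible.

```c
#define _GNU_SOURCE             /* needed for mremap() and MREMAP_MAYMOVE */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map a private anonymous region of `size` bytes.
 * Returns NULL on failure. */
void *region_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: resize a region obtained from region_alloc().
 * With MREMAP_MAYMOVE the kernel may relocate the mapping by updating
 * page tables instead of copying the contents byte by byte.
 * Returns NULL on failure, in which case the old mapping is untouched. */
void *region_resize(void *p, size_t old_size, size_t new_size)
{
    void *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
    return q == MAP_FAILED ? NULL : q;
}
```

The existing contents are preserved across the resize, and the region is eventually released with `munmap()` using its current size.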