
Re: Re: Re: what's faster than .=

by MarkM (Curate)
on Mar 08, 2003 at 08:37 UTC (#241356)

in reply to Re: Re: what's faster than .=
in thread what's faster than .=

I do not agree with your statement that "generally speaking, realloc() costs a lot, and is slow."

I executed the following C program to verify your claim:

    #include <malloc.h>
    #include <assert.h>

    int main ()
    {
        void *p;
        int i;

        assert((p = malloc(1)) != 0);
        for (i = 1; i < 16 * 1024 * 1024; i++)
            assert((p = realloc(p, i)) != 0);
        return 0;
    }

The above code does malloc(1) and then calls realloc(p, i) once for each size i from 1 byte up to just under 16 Mbytes.

Compiling the above program using GCC 3.2.1 on a Linux 2.4.20 box with an 800 MHz P3 CPU and 128 Mbytes of SDRAM, the elapsed time is 10.8 seconds. (uses GLIBC)

Compiling the above program using GCC 3.2.1 on Cygwin running on a WinXP box with a 1.2 GHz AMD Athlon CPU and 256 Mbytes of SDRAM, the elapsed time is 2.3 seconds. (uses GLIBC)

Now, some implementations of realloc() are slow. GLIBC happens not to be one of them. Any implementation of malloc()/realloc() that allocates in increments of 4 bytes is deficient from my perspective. Some sophistication is necessary to reduce the need for copying as the cost of copying increases. As I mentioned before, one of the more straightforward approaches is to allocate blocks in powers of 2. For a steadily growing memory block, each time the amount of data to copy doubles, copies are performed only half as often. The result is a net gain, as the copy itself is usually less expensive than the operation generating the new data to populate the string.
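As an illustration of the powers-of-2 approach described above, here is a minimal sketch of a byte buffer that doubles its capacity whenever it fills up. The names growbuf, growbuf_push and growbuf_free are hypothetical and are not taken from Perl or GLIBC. The sketch ignores overflow of the capacity for very large buffers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; all names here are illustrative. */
typedef struct {
    char  *data;
    size_t len;       /* bytes in use */
    size_t cap;       /* bytes allocated, always 0 or a power of 2 */
    size_t reallocs;  /* number of realloc() calls made so far */
} growbuf;

/* Append one byte, doubling the capacity when the buffer is full.
   Returns 0 on success, -1 if realloc() fails (buffer stays valid). */
static int growbuf_push(growbuf *b, char c)
{
    assert(b != NULL);
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 1;
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
        b->reallocs++;
    }
    b->data[b->len++] = c;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void growbuf_free(growbuf *b)
{
    free(b->data);
    memset(b, 0, sizeof *b);
}
```

Starting from a zero-initialized growbuf, appending 1000 bytes one at a time calls realloc() 11 times (capacities 1, 2, 4, ... 1024) rather than 1000 times.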

Also, under Linux (at least), the mremap() call allows pages to be re-addressed providing the ability to support zero-copy realloc() for memory areas that already have their own pages, or are the only memory area in use on the page.
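The following is a rough, Linux-only sketch of resizing a page-backed region with mremap(). It shows the system call itself, not how GLIBC uses it internally. The helper names region_alloc, region_resize and region_free are hypothetical, and sizes are assumed to be multiples of the page size.

```c
#define _GNU_SOURCE   /* mremap() is a Linux extension; must precede includes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map an anonymous, private region of `size` bytes. Returns NULL on failure. */
static void *region_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Resize the region. With MREMAP_MAYMOVE the kernel may relocate the
   mapping by re-addressing its pages instead of copying their contents.
   Returns NULL on failure, in which case the old mapping is untouched. */
static void *region_resize(void *p, size_t old_size, size_t new_size)
{
    assert(p != NULL);
    void *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
    return q == MAP_FAILED ? NULL : q;
}

/* Unmap the region. */
static void region_free(void *p, size_t size)
{
    munmap(p, size);
}
```

The contents of the old region remain readable through the returned pointer, whether or not the address changed.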

Re: Re: Re: Re: what's faster than .=
by pg (Canon) on Mar 08, 2003 at 16:31 UTC
    Nice chat, this is getting more interesting ...;-)

    Two points:

    1. What pattern should you follow when you realloc memory? There are different approaches, and the choice should depend on the nature of your application. Different applications show different patterns of memory usage and so call for different solutions. No single pattern fits all situations.

      In my original post, I never said that reallocating by one fixed-size block each time is the only pattern. It is just one possible pattern, given as an example.

      Your choice also depends largely on how you trade off speed against memory usage. If you care a great deal about speed and not much about memory, you can simply double the memory size each time, as Perl does for its hashes. Again, this is just another example, not the only approach.

    2. I looked at your C/C++ example, and believe there is a good chance that your for loop was optimized by the compiler. In that case, it does not demonstrate the real performance of realloc.

      In response to your two points:

      1. malloc() / realloc() already follows a pattern. The only question is whether the pattern that it follows is effective enough for appending a single byte at a time. The Perl source code shows that if the malloc() that comes with Perl is used, sv_grow() (the internal function ultimately used to grow the memory area that holds the string) determines whether a realloc() is necessary by calling malloced_size(), meaning that realloc() is partially inlined into sv_grow().
      2. The critical section of the assembly code generated for Linux is:
        .L10:
            movl    %eax, (%esp)
            movl    %ebx, 4(%esp)
            call    realloc
            testl   %eax, %eax
            je      .L15
            incl    %ebx
            cmpl    $16777215, %ebx

        It looks as if the chance that the loop was optimized away is actually not so great. This should have been obvious from the fact that it took a whole 10 seconds to complete. Counting to 16 million is not hard for an 800 MHz P3 box, and I would expect it to take less than 1/10th of a second. Just for fun, I replaced the "assert(realloc(...))" with ";" and the time to complete was 0.03s. Yes, I checked the assembly code to ensure that the loop was not optimized away. This shows that the cost of a realloc() is roughly 360 times that of an increment/compare/branch (10.8s versus 0.03s).

        In terms of overhead, it is not quite this bad, as an increment/compare/branch does not achieve the operation we are trying to perform. Since even an inlined strategy to second-guess malloc() and pre-allocate in larger increments would require more code than an increment/compare/branch, the 'overhead' of realloc() may not be that significant at all for a decent implementation of realloc().
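A complementary check, sketched here under the assumption that the allocator's behaviour is what matters, is to count how many of the realloc() calls actually hand back a different address. This does not measure time and the result depends entirely on the C library in use. The function name count_realloc_moves is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Grow a block one byte at a time from 1 up to `limit` bytes and count
   how many realloc() calls returned a different address than before.
   Returns that count, or -1 if an allocation fails. */
static long count_realloc_moves(size_t limit)
{
    assert(limit >= 1);
    void *p = malloc(1);
    if (p == NULL)
        return -1;

    long moves = 0;
    for (size_t i = 2; i <= limit; i++) {
        /* Save the address as an integer before the call, since the old
           pointer value must not be used once realloc() has succeeded. */
        uintptr_t before = (uintptr_t)p;
        void *q = realloc(p, i);
        if (q == NULL) {
            free(p);
            return -1;
        }
        if ((uintptr_t)q != before)
            moves++;
        p = q;
    }
    free(p);
    return moves;
}
```

A count far below the number of calls would suggest that most calls are satisfied in place, without copying any data.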

        Also, as I need to mention again, implementations of realloc() that use mremap() can reallocate large (4096+ bytes) memory areas without any need to copy the data itself. Pages are re-addressed.

        This is definitely the kind of discussion I would like to be part of.

        People are talking with solid facts, insightful thoughts, and strong supporting data, and with respect both for each other and for the facts found.

        The other thing I would be interested in is the performance difference between using a linked list and a (dynamically growing) array.

        With a linked list, you call malloc once for each element, but never call realloc; with an array, you call malloc once at the beginning, and then call realloc each time it grows. I have used both approaches from time to time, but never seriously measured them.

        You might also mix the two approaches by using a linked list of sub-arrays. The size of each sub-array is fixed, and the data structure grows by attaching more sub-arrays to the linked list.
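The mixed approach can be sketched roughly as follows. This is only one possible layout, storing int values; the names chunk, chunklist, chunklist_push, chunklist_get and chunklist_free are hypothetical, and the sub-array size of 256 is an arbitrary assumption.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_ITEMS 256   /* assumed sub-array size; tune per workload */

/* One fixed-size sub-array, linked to the next one. */
typedef struct chunk {
    struct chunk *next;
    size_t        used;              /* items stored in this chunk */
    int           items[CHUNK_ITEMS];
} chunk;

/* The list itself; zero-initialize before first use. */
typedef struct {
    chunk *head, *tail;
    size_t count;                    /* total items stored */
} chunklist;

/* Append a value. malloc() is called once per CHUNK_ITEMS appends and
   realloc() is never called, so existing items are never copied.
   Returns 0 on success, -1 if malloc() fails. */
static int chunklist_push(chunklist *l, int value)
{
    assert(l != NULL);
    if (l->tail == NULL || l->tail->used == CHUNK_ITEMS) {
        chunk *c = malloc(sizeof *c);
        if (c == NULL)
            return -1;
        c->next = NULL;
        c->used = 0;
        if (l->tail != NULL)
            l->tail->next = c;
        else
            l->head = c;
        l->tail = c;
    }
    l->tail->items[l->tail->used++] = value;
    l->count++;
    return 0;
}

/* Fetch the item at `index` into *out. Every chunk except the last is
   full, so the chunk number is simply index / CHUNK_ITEMS.
   Returns 0 on success, -1 if the index is out of range. */
static int chunklist_get(const chunklist *l, size_t index, int *out)
{
    assert(l != NULL && out != NULL);
    if (index >= l->count)
        return -1;
    const chunk *c = l->head;
    for (size_t skip = index / CHUNK_ITEMS; skip > 0; skip--)
        c = c->next;
    *out = c->items[index % CHUNK_ITEMS];
    return 0;
}

/* Free every chunk and reset the list to empty. */
static void chunklist_free(chunklist *l)
{
    chunk *c = l->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    l->head = l->tail = NULL;
    l->count = 0;
}
```

The trade-off in this sketch is that appends never copy existing data, while random access has to walk the list of chunks instead of indexing directly as an array would.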
