
Re: Re: Re: Re: what's faster than .=

by pg (Canon)
on Mar 08, 2003 at 16:31 UTC ( #241399 )

in reply to Re: Re: Re: what's faster than .=
in thread what's faster than .=

Nice chat, this is getting more interesting ...;-)

Two points:

  1. What pattern should you follow when you realloc memory? There are different approaches, and the choice should be made according to the nature of your application: different applications show different expected patterns of memory usage, and thus call for different solutions. No single solution/pattern fits all situations.

    In my original post, I never said that reallocating by adding one fixed-size block each time is the only pattern; it is just one possible pattern, given as an example.

    Your choice also depends largely on how you trade off speed against memory usage. If you care a lot about speed and not so much about memory, you can simply double the memory size each time, as Perl does for its hashes. Again, this is just another example, not the only approach.
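The doubling strategy mentioned above can be sketched in C. This is a minimal illustrative buffer, not code from Perl or any particular library; all names (buf_t, buf_append) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that doubles its capacity when full,
 * trading memory for fewer realloc() calls. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} buf_t;

static void buf_append(buf_t *b, const char *s, size_t n)
{
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 16;
        while (newcap < b->len + n)
            newcap *= 2;                  /* double until it fits */
        char *p = realloc(b->data, newcap);
        assert(p != NULL);
        b->data = p;
        b->cap  = newcap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
}
```

With this policy, appending one byte at a time triggers only O(log n) calls to realloc() instead of one per append.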

  2. I looked at your C/C++ example, and believe there is a good chance that your for loop was optimized away by the compiler. In that case, it does not demonstrate the real performance of realloc().

Replies are listed 'Best First'.
Re: Re: Re: Re: Re: what's faster than .=
by MarkM (Curate) on Mar 08, 2003 at 18:42 UTC

    In response to your two points:

    1. malloc() / realloc() already follows a pattern. The only question is whether the pattern it follows is effective enough for appending a single byte at a time. The Perl source code shows that if the malloc() that comes with Perl is used, sv_grow() (the internal function ultimately used to grow the memory area that holds the string) determines whether a realloc() is necessary by calling malloced_size(), meaning that realloc() is partially inlined into sv_grow().
    2. The critical section of the assembly code generated for Linux is:
      .L10:
          movl    %eax, (%esp)
          movl    %ebx, 4(%esp)
          call    realloc
          testl   %eax, %eax
          je      .L15
          incl    %ebx
          cmpl    $16777215, %ebx

      It looks as if the chance that the loop was optimized away is actually quite small. This should have been obvious from the fact that it took a whole 10 seconds to complete. Counting to 16 Mbytes is not hard for a P3 800 MHz box, and I would expect it to take less than 1/10th of a second. Just for fun, I replaced the "assert(realloc(...))" with ";" and the time to complete was 0.03s. Yes, I checked the assembly code to ensure that the loop was not optimized away. This shows that the cost of a realloc() is approximately 300 times that of an increment/compare/branch.

      In terms of overhead, it is not quite this bad, as an increment/compare/branch does not achieve the operation we are trying to perform. Since even an inlined strategy to second-guess malloc() and pre-allocate in larger increments would require more code than an increment/compare/branch, the 'overhead' of realloc() may not be significant at all for a decent implementation of realloc().

      Also, as I mentioned before, implementations of realloc() that use mremap() can reallocate large (4096+ byte) memory areas without any need to copy the data itself; pages are simply re-addressed.
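The shortcut described in point 1 above can be sketched as follows. This is a rough illustration of the idea, not Perl's actual code: my_grow() and the `allocated` field stand in for sv_grow() and malloced_size(), and are purely hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the "partially inlined realloc" idea: remember how much
 * the allocator actually gave us, and skip realloc() entirely when
 * the existing block is already big enough. */
typedef struct {
    char  *pv;
    size_t allocated;   /* what malloc/realloc really handed back */
} str_t;

static char *my_grow(str_t *s, size_t need)
{
    if (need <= s->allocated)
        return s->pv;               /* fast path: no realloc() call */
    char *p = realloc(s->pv, need); /* slow path */
    assert(p != NULL);
    s->pv = p;
    s->allocated = need;
    return p;
}
```

The fast path is just a compare and a return, which is why inlining it can hide most of the per-append cost when the underlying allocation is already large enough.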

      This is definitely the kind of discussion I would like to be part of.

      People are arguing with solid facts, insightful thoughts, and strong supporting data, and at the same time treating each other, and the facts found, with respect.

      The other thing I would be interested in is the performance difference between using a linked list and a (dynamically growing) array.

      With a linked list, you call malloc once for each element, but realloc is never called; with an array, you call malloc once at the beginning, and then call realloc each time it grows. I have used both approaches from time to time, but never seriously measured them.

      You might also mix the two approaches by using a linked list of sub-arrays: the size of each sub-array is fixed, and you grow the data structure by growing the linked list, attaching more sub-arrays to it.
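The mixed approach described above (sometimes called an unrolled linked list) might look like this in C. A minimal sketch with hypothetical names (chunklist_t, cl_push) and an arbitrary chunk size; growth only ever allocates a new chunk, so existing data is never copied:

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK 256   /* fixed sub-array size, chosen arbitrarily */

typedef struct chunk {
    int           items[CHUNK];
    size_t        used;
    struct chunk *next;
} chunk_t;

typedef struct {
    chunk_t *head, *tail;
    size_t   len;
} chunklist_t;

static void cl_push(chunklist_t *cl, int v)
{
    if (!cl->tail || cl->tail->used == CHUNK) {
        /* grow by attaching one more sub-array to the list */
        chunk_t *c = calloc(1, sizeof *c);
        assert(c != NULL);
        if (cl->tail) cl->tail->next = c; else cl->head = c;
        cl->tail = c;
    }
    cl->tail->items[cl->tail->used++] = v;
    cl->len++;
}
```

Compared with a plain linked list, this makes one malloc per CHUNK elements instead of one per element; compared with a growing array, it never calls realloc or moves old data.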
        With a dynamically growing array, growing it by a fixed proportion (e.g. doubling) has an O(1) amortized cost per append, with unpredictable performance on any single append (you may have to reallocate). Accessing any element is O(1). Finding the size is O(1). Splicing a new element into the interior is O(n).

        A linked list has a guaranteed O(1) cost to grow. Accessing an arbitrary element is O(n). Finding the size is O(n). Splicing a new element into the interior is O(1).

        Whether the linked list or array has more overhead is implementation and dataset dependent.

        As for your mixed approach, you can do that, and some have; there are string libraries built exactly this way. (I have not looked closely at any of them.)
