PerlMonks  

Re: Re: what's faster than .=

by pg (Canon)
on Mar 08, 2003 at 06:04 UTC (#241346)


in reply to Re: what's faster than .=
in thread what's faster than .=

Wow, just to add a little bit more.

Generally speaking, realloc is expensive and slow. That is why many C programmers, when they call realloc, allocate more than they need at the moment, possibly a couple of hundred times more, so that they can greatly reduce how often they call realloc.

For example, if you need 4 more bytes each time, and you know you are likely to come back and ask again and again, why not simply allocate 400 more bytes each time? That reduces the number of realloc calls by 99%.

However, even with this optimization, realloc can still cost you a lot. For example, if you need to append 4 bytes 1,000,000 times and you grow the buffer by 400 bytes at a time, you still call realloc 10,000 times.
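The arithmetic above can be checked with a small counting sketch in C. The helper name count_reallocs_fixed_chunk and its parameters are made up for this illustration; it only counts how often realloc is called under a fixed-chunk growth policy and says nothing about how long each call takes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): append `appends` records of
 * `rec` bytes each to a buffer that grows by `chunk` bytes whenever it runs
 * out of room. Returns the number of realloc() calls that were needed. */
size_t count_reallocs_fixed_chunk(size_t appends, size_t rec, size_t chunk)
{
    char *buf = NULL;
    size_t used = 0, cap = 0, calls = 0;
    size_t i;

    assert(rec > 0 && chunk >= rec);
    for (i = 0; i < appends; i++) {
        if (used + rec > cap) {
            /* Out of room: grow by one fixed-size chunk. */
            char *tmp = realloc(buf, cap + chunk);
            assert(tmp != NULL);
            buf = tmp;
            cap += chunk;
            calls++;
        }
        memset(buf + used, 'x', rec);   /* stand-in for the real data */
        used += rec;
    }
    free(buf);
    return calls;
}
```

With 1,000,000 appends of 4 bytes, a 400-byte chunk gives 10,000 calls, while a 4-byte chunk gives one call per append.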

HOWEVER, a Perl string does not work this way: Perl only allocates what you ask for at the moment. FORTUNATELY, you can still pre-allocate the space yourself. Just do something like:

$str = " " x 100000;


Re: Re: Re: what's faster than .=
by MarkM (Curate) on Mar 08, 2003 at 08:37 UTC

    I do not agree with your statement that "generally speaking, realloc() costs a lot, and is slow."

    I executed the following C program to verify your claim:

    #include <malloc.h>
    #include <assert.h>

    int main ()
    {
        void *p;
        int i;

        assert((p = malloc(1)) != 0);
        for (i = 1; i < 16 * 1024 * 1024; i++)
            assert((p = realloc(p, i)) != 0);
        return 0;
    }

    The above code does malloc(1) and then executes realloc(p, i) once for each i from 1 byte up to (but not including) 16 Mbytes.

    Compiling the above program using GCC 3.2.1 on a Linux 2.4.20 box with an 800 MHz P3 CPU and 128 Mbytes of SDRAM, the elapsed time is 10.8 seconds. (uses GLIBC)

    Compiling the above program using GCC 3.2.1 on Cygwin running on a WinXP box with a 1.2 GHz AMD Athlon CPU and 256 Mbytes of SDRAM, the elapsed time is 2.3 seconds. (uses GLIBC)
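For anyone who wants to repeat the measurement in-process, here is a minimal sketch. The helper time_realloc_loop is hypothetical and not part of the program above. It uses clock(), so it reports CPU time rather than the elapsed wall-clock time quoted in the post, and it keeps the realloc() call outside assert() so that building with NDEBUG cannot remove the work being timed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: grow one block a byte at a time, from 1 byte up to
 * (but not including) `limit` bytes, and return the CPU time spent in the
 * loop, in seconds. */
double time_realloc_loop(size_t limit)
{
    clock_t start, end;
    size_t i;
    void *p = malloc(1);

    assert(p != NULL);
    start = clock();
    for (i = 1; i < limit; i++) {
        void *q = realloc(p, i);   /* the call itself is not inside assert() */
        assert(q != NULL);
        p = q;
    }
    end = clock();
    free(p);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_realloc_loop(16 * 1024 * 1024) uses the same loop bound as the program in the post. The result depends entirely on the C library, compiler, and machine.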

    Now, some implementations of realloc() are slow. GLIBC happens not to be one of them. Any implementation of malloc()/realloc() that allocates in increments of 4 bytes is deficient from my perspective. Some sort of sophistication is necessary to reduce the amount of copying as the cost of each copy increases. As I mentioned before, one of the more straightforward approaches is to allocate blocks in powers of 2. This way, for a steadily growing memory block, each time the amount of data to copy doubles, copies are performed only half as often. The result is a net gain, as the copy itself is usually less expensive than the operation generating the new data to populate the string.
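The powers-of-2 idea can be illustrated with a caller-side sketch. This is an assumption-laden illustration, not how GLIBC or Perl implements it: the helper count_reallocs_doubling is hypothetical, and it assumes the worst case in which every realloc() has to copy the existing contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: append `nbytes` bytes one at a time to a buffer whose
 * capacity doubles (1, 2, 4, ...) whenever it is full. Returns the number of
 * realloc() calls. If `worst_case_copied` is not NULL, it receives the total
 * number of bytes that would be copied if every realloc() moved the block. */
size_t count_reallocs_doubling(size_t nbytes, size_t *worst_case_copied)
{
    char *buf = NULL;
    size_t used = 0, cap = 0, calls = 0, copied = 0;

    while (used < nbytes) {
        if (used == cap) {
            size_t new_cap = cap ? cap * 2 : 1;
            char *tmp = realloc(buf, new_cap);
            assert(tmp != NULL);
            buf = tmp;
            cap = new_cap;
            calls++;
            copied += used;        /* at most `used` bytes had to move */
        }
        buf[used++] = 'x';         /* stand-in for the real data */
    }
    free(buf);
    if (worst_case_copied != NULL)
        *worst_case_copied = copied;
    return calls;
}
```

Growing to 16 Mbytes (2^24 bytes) this way takes 25 realloc() calls, and even in the worst case the total number of bytes copied stays below the final size of the buffer.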

    Also, under Linux (at least), the mremap() call allows pages to be re-addressed, which makes it possible to support zero-copy realloc() for memory areas that already have their own pages, or are the only memory area in use on the page.

      Nice chat, this is getting more interesting ...;-)

      Two points:

      1. What pattern should you follow when you realloc memory? There are different approaches, and the choice should be made according to the nature of your application. Different applications show different expected patterns of memory usage, and so call for different solutions. There is no single solution/pattern that fits all situations.

        In my original post, I never said that reallocating by one fixed-size block each time is the only pattern. It is just one possible pattern, and just one example.

        Your choice also largely depends on how you trade off speed against memory usage. If you care a great deal about speed and not so much about memory, you can just double the memory size each time, as Perl does for its hashes. Again, this is just another example, not the only approach.

      2. I looked at your C/C++ example, and I believe there is a good chance that your for loop was optimized away by the compiler. In that case, it does not demonstrate the real performance of realloc.

        In response to your two points:

        1. malloc() / realloc() already follows a pattern. The only question is whether the pattern that it follows is effective enough for appending a single byte at a time. The Perl source code shows that if the malloc() that comes with Perl is used, sv_grow() (the internal function ultimately used to grow the memory area that holds the string) determines whether a realloc() is necessary by calling malloced_size(), meaning that realloc() is partially inlined into sv_grow().
        2. The critical section of the assembly code generated for Linux is:
          .L10:
                  movl    %eax, (%esp)
                  movl    %ebx, 4(%esp)
                  call    realloc
                  testl   %eax, %eax
                  je      .L15
                  incl    %ebx
                  cmpl    $16777215, %ebx

          It looks as if the chance that the loop is optimized away is actually not so great. This should have been obvious from the fact that it took a whole 10 seconds to complete. Counting to 16M is not hard for an 800 MHz P3 box, and I would expect it to take less than 1/10th of a second. Just for fun, I replaced the "assert(realloc(...))" with ";" and the time to complete was 0.03s. Yes, I checked the assembly code to ensure that the loop was not optimized away. This shows that the cost of a realloc() is approximately 300 times that of an increment/compare/branch.

          In terms of overhead, it is not quite this bad, as increment/compare/branch does not achieve the operation we are trying to perform. Since even an inlined strategy to second-guess malloc() and pre-allocate in larger increments would require more code than an increment/compare/branch, the 'overhead' of realloc() may not be that significant at all for a decent implementation of realloc().

          Also, as I need to mention again, implementations of realloc() that use mremap() can reallocate large (4096+ bytes) memory areas without any need to copy the data itself. Pages are re-addressed.
