http://www.perlmonks.org?node_id=241332


in reply to what's faster than .=

Perl strings are allocated using the malloc() C library routine. Perl strings are 'grown' using the realloc() C library routine. An append operation is a realloc() operation (to 'grow' the string) followed by a memcpy() to copy the additional data at the end (in your case, two bytes, the character, and a '\0' character that is kept at the end of every Perl string).

The malloc()/realloc() C library functions have platform specific restrictions that define minimal space allocation increments. For example, most platforms require data structures to be allocated on a 4 or 8 byte boundary. Since malloc() can be used to allocate any type including larger types such as 'double', malloc() will always round the size up to at *least* the next 4 or 8 byte boundary. This means that even if Perl allocated a string using malloc(1), malloc() would return a 4 or 8 byte memory area. Calls to realloc(2), or realloc(3) would end up being no-op operations as all of 1, 2, and 3 result in a minimal size of 4 or 8 bytes.

The malloc() and realloc() routines are usually also optimized to hold larger data structures. One of the more straight-forward implementations is for malloc() or realloc() to round allocation sizes up to the next power of 2. Therefore malloc(5) would return an 8 byte memory area, malloc(9) would return a 16 byte memory area, and malloc(17) would return a 32 byte memory area. realloc() will only need to copy bytes if the memory area cannot be enlarged to the next power of 2 without moving it.

For memory areas larger than one page of memory, it is not uncommon for malloc() and realloc() to be implemented using mmap() and mremap() (mremap() exists on at least modern versions of Linux). If this is the case, realloc() can increase the size of a 2 page (8192 bytes?) area to 4 pages, without copying the data, even if the page after the two pages is in use. The mremap() call can be implemented to re-address virtual pages meaning that although 128 Kbytes of data may appear to be 'moved' in virtual memory from one address to another, no memory copy ever needs to take place.

All of these details are highly implementation specific. The cost of a string append operation in Perl is entirely dependent on the efficiency of the realloc() C library routine that Perl was compiled to use. In general, the implementations are quite efficient. If you really care to understand the cost in terms of real numbers, play with the Benchmark module ('perldoc Benchmark') and do your own timings.

Replies are listed 'Best First'.
Re: Re: what's faster than .=
by pg (Canon) on Mar 08, 2003 at 06:04 UTC
    Wow, just to add a little bit more.

    Generally speaking, realloc costs a lot, and is slow. That's why lots of c programmers, when they call realloc, they always reallocate more than they need at the moment, could be couple humdreds times of what they need, so they can largely reduce the frequency they calling realloc.

    For example, if one needs to allocate 4 more bytes each time, and knows that there is a chance that he would come back and repeat this again and again, why not simply allocate 400 more bytes each time, so he can reduce the frequency of calling realloc by 99%.

    However even with this optimization, realloc might still cost you a lot. For example, you need to realloc 1,000,000 4 bytes, if you realloc 400 bytes a time, you still need to call realloc 10,000 times.

    HOWEVER, Perl string does not work in this way, Perl only allocates what you want at the moment, FORTUNATELY, you still can pre-allocate the space by yourself. Just do something like:
    $str = " " x 100000;

      I do not agree with your statement that "generally speaking, realloc() costs a lot, and is slow."

      I executed the following C program to verify your claim:

      #include <malloc.h> #include <assert.h> int main () { void *p; int i; assert((p = malloc(1)) != 0); for (i = 1; i < 16 * 1024 * 1024; i++) assert((p = realloc(p, i)) != 0); return 0; }

      The above code does malloc(1) and then executes realloc(i) once each for i = 1b .. 16Mb.

      Compiling the above program using GCC 3.2.1 on a Linux 2.4.20 box with an 800 Mhz P3 CPU and 128 Mbytes of SDRAM, the elapsed time is 10.8 seconds. (uses GLIBC)

      Compiling the above program using GCC 3.2.1 on Cygwin running on a WinXP box with a 1.2 Ghz AMD Athlon CPU and 256 Mbytes of SDRAM, the elapsed time is 2.3 seconds. (uses GLIBC)

      Now, some implementations of realloc() are slow. GLIBC happens not to be one of them. Any implementation of malloc()/realloc() that allocates in increments of 4 bytes is defficient from my perspective. Some sort of sophistication is necessary to decrease the need for copying as the cost of copying increases. As I mentioned before, one of the more straight forward approaches is to allocate blocks in powers of 2. This way, for a consistently growing memory block, copies are only performed half as often every time twice as much data must be copied, resulting in a net gain, as the copy itself is usually less expensive that the operation generating new data to populate the string.

      Also, under Linux (at least), the mremap() call allows pages to be re-addressed providing the ability to support zero-copy realloc() for memory areas that already have their own pages, or are the only memory area in use on the page.

        Nice chat, this is getting more interesting ...;-)

        Two points:

        1. What pattern to follow when you realloc memory? There are different approaches, and the choice should be made according to the nature of your application, different application would show different expected pattern of memory usage, and thus you should have different solutions. There is no single solution/pattern that fits all situations.

          In my original post, I never said it is the only pattern, that you should realloc by adding one fixed-size block each time, that is just one possible pattern, and it is just one example.

          Your choice also largely depends on your strategy to trade off between speed and memory usage. If one cares speed so much, and does not care memory that much, he can just double memory size each time, as what Perl did for its hash. Again, this is just another example, not the only approach.

        2. I looked at your c/c++ example, and believe there is a big chance that your for loop was optimized by the compiler. In that case, it does not demo the real performance of realloc.