http://www.perlmonks.org?node_id=241328

xafwodahs has asked for the wisdom of the Perl Monks concerning the following question:

I have code that 'collects' data 1 byte at a time. I append that byte to a scalar.

Pseudo-code:
while (something) {
    $byte = get_byte();
    $all .= $byte;
}
This can run for a couple hundred thousand bytes, and it occurred to me that ".=" might actually be very inefficient depending on how it works.

So,
1) Does .= result in lots of data copying as I suspect?

2) What is the most efficient way to append a byte to the end of a scalar?

Thanks for the help.

Replies are listed 'Best First'.
Re: what's faster than .=
by MarkM (Curate) on Mar 08, 2003 at 04:31 UTC

    Perl strings are allocated using the malloc() C library routine. Perl strings are 'grown' using the realloc() C library routine. An append operation is a realloc() operation (to 'grow' the string) followed by a memcpy() to copy the additional data to the end (in your case, two bytes: the character itself, plus the '\0' character that is kept at the end of every Perl string).

    The malloc()/realloc() C library functions have platform specific restrictions that define minimal space allocation increments. For example, most platforms require data structures to be allocated on a 4 or 8 byte boundary. Since malloc() can be used to allocate any type including larger types such as 'double', malloc() will always round the size up to at *least* the next 4 or 8 byte boundary. This means that even if Perl allocated a string using malloc(1), malloc() would return a 4 or 8 byte memory area. Calls to realloc(2), or realloc(3) would end up being no-op operations as all of 1, 2, and 3 result in a minimal size of 4 or 8 bytes.

    The malloc() and realloc() routines are usually also optimized to hold larger data structures. One of the more straight-forward implementations is for malloc() or realloc() to round allocation sizes up to the next power of 2. Therefore malloc(5) would return an 8 byte memory area, malloc(9) would return a 16 byte memory area, and malloc(17) would return a 32 byte memory area. realloc() will only need to copy bytes if the memory area cannot be enlarged to the next power of 2 without moving it.

    For memory areas larger than one page of memory, it is not uncommon for malloc() and realloc() to be implemented using mmap() and mremap() (mremap() exists on at least modern versions of Linux). If this is the case, realloc() can increase the size of a 2 page (8192 bytes?) area to 4 pages, without copying the data, even if the page after the two pages is in use. The mremap() call can be implemented to re-address virtual pages meaning that although 128 Kbytes of data may appear to be 'moved' in virtual memory from one address to another, no memory copy ever needs to take place.

    All of these details are highly implementation specific. The cost of a string append operation in Perl is entirely dependent on the efficiency of the realloc() C library routine that Perl was compiled to use. In general, the implementations are quite efficient. If you really care to understand the cost in terms of real numbers, play with the Benchmark module ('perldoc Benchmark') and do your own timings.
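
    For instance, a minimal sketch of such a timing (the 100_000 figure and the join alternative are just arbitrary choices for illustration):

    use Benchmark qw(cmpthese);

    # compare byte-at-a-time .= against collecting pieces and joining once
    cmpthese(-3, {
        append => sub {
            my $s = '';
            $s .= 'x' for 1 .. 100_000;
        },
        join => sub {
            my @parts;
            push @parts, 'x' for 1 .. 100_000;
            my $s = join '', @parts;
        },
    });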

      Wow, just to add a little bit more.

      Generally speaking, realloc() costs a lot and is slow. That's why many C programmers, when they call realloc(), always allocate more than they need at the moment, sometimes hundreds of times more, so that they can greatly reduce how often they call realloc().

      For example, if one needs to allocate 4 more bytes each time, and knows that there is a chance that he will come back and repeat this again and again, why not simply allocate 400 more bytes each time, so he can reduce the frequency of calling realloc() by 99%?

      However, even with this optimization, realloc() might still cost you a lot. For example, if you need to append 4 bytes 1,000,000 times, and you realloc() 400 bytes at a time, you still need to call realloc() 10,000 times.

      However, Perl strings do not work this way: Perl only allocates what you ask for at the moment. Fortunately, you can still pre-allocate the space yourself. Just do something like:
      $str = " " x 100000;

        I do not agree with your statement that "generally speaking, realloc() costs a lot, and is slow."

        I executed the following C program to verify your claim:

        #include <malloc.h>
        #include <assert.h>

        int main ()
        {
            void *p;
            int i;
            assert((p = malloc(1)) != 0);
            for (i = 1; i < 16 * 1024 * 1024; i++)
                assert((p = realloc(p, i)) != 0);
            return 0;
        }

        The above code does malloc(1) and then executes realloc(i) once for each i from 1 byte up to 16 Mbytes.

        Compiling the above program using GCC 3.2.1 on a Linux 2.4.20 box with an 800 MHz P3 CPU and 128 Mbytes of SDRAM, the elapsed time is 10.8 seconds. (uses GLIBC)

        Compiling the above program using GCC 3.2.1 on Cygwin running on a WinXP box with a 1.2 GHz AMD Athlon CPU and 256 Mbytes of SDRAM, the elapsed time is 2.3 seconds. (uses GLIBC)

        Now, some implementations of realloc() are slow. GLIBC happens not to be one of them. Any implementation of malloc()/realloc() that allocates in increments of 4 bytes is deficient from my perspective. Some sort of sophistication is necessary to decrease the need for copying as the cost of copying increases. As I mentioned before, one of the more straightforward approaches is to allocate blocks in powers of 2. This way, for a consistently growing memory block, copies are performed half as often each time the amount to copy doubles, resulting in a net gain, as the copy itself is usually less expensive than the operation generating the new data that populates the string.
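
        The arithmetic behind that claim can be checked with a short simulation (a sketch only; it counts bytes copied under an idealized doubling policy, not what any real allocator does):

        # grow a string one byte at a time under a power-of-2 policy
        my ($size, $copied) = (1, 0);
        for my $len (1 .. 1_000_000) {
            if ($len > $size) {
                $copied += $len - 1;   # a reallocation copies the old bytes
                $size   *= 2;          # double the reserved space
            }
        }
        # prints roughly 1.05 bytes copied per byte appended: amortized O(n)
        printf "%d bytes copied for 1_000_000 appends\n", $copied;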

        Also, under Linux (at least), the mremap() call allows pages to be re-addressed providing the ability to support zero-copy realloc() for memory areas that already have their own pages, or are the only memory area in use on the page.

Re: what's faster than .=
by hv (Prior) on Mar 08, 2003 at 04:40 UTC

    Devel::Peek is very useful for finding out this sort of low-level detail about perl's variables. When using it to dump a string, you'll see two values listed at the end - "CUR" is the current length of the actual string, and "LEN" is the size of the memory area currently reserved for it.
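
    For example (a minimal sketch; the exact figures in the Dump output vary by perl version and platform):

    use Devel::Peek qw(Dump);

    my $s = "abc";
    Dump($s);    # output (on STDERR) includes CUR = 3 and some LEN >= 4
    $s .= "d";
    Dump($s);    # CUR = 4; LEN shows how much was actually reserved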

    A simple test of $string .= $nextchar shows that (with perl-5.8.0 on this platform at least) the LEN is changing in lockstep with CUR under this approach - perl is resizing the string buffer to be just big enough for the new string each time, so there is a lot of reallocing and therefore slow string copying going on.

    One simple workaround is to presize the buffer, which of course will work best if you've got a good idea how big the string is likely to get. Here's a benchmark to demonstrate that:

    use Benchmark;
    Benchmark::cmpthese(shift, {
        a => q{
            $a[++$i] = "";
            $a[$i] .= "a" for 1 .. 10000;
        },
        b => q{
            $b[++$j] = "b" x 10000;
            $b[$j] = "";
            $b[$j] .= "b" for 1 .. 10000;
        },
    });
    and the results:
    Benchmark: running a, b for at least 1 CPU seconds...
             a:  1 wallclock secs ( 1.03 usr +  0.01 sys =  1.04 CPU) @ 124.04/s (n=129)
             b:  1 wallclock secs ( 1.08 usr +  0.01 sys =  1.09 CPU) @ 191.74/s (n=209)
        Rate    a    b
    a  124/s   -- -35%
    b  192/s  55%   --

    Update: added "on this platform"

    Hugo
      Your post and testing results made me think about why the approach Perl uses to allocate memory for a hash is totally different from what it does for a string.

      My answer is that the designers had different expectations for strings and hashes.

      For a string, most of the time you don't expect it to grow that much. More importantly, even if it grows, that does not indicate it will continue to grow.

      But a hash is different: as a collection of elements, you expect it to grow all the time. More importantly, if you see some growth, it is reasonable to expect more growth. To speed things up, Perl simply assumes that what happened will happen again, and doubles the allocated memory when what has been allocated is used up.

      It is all about expectation and analysis of behavior.
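
      Perl does let you state that expectation for a hash yourself: assigning to keys() pre-extends the bucket array (a sketch; the 1024 is an arbitrary figure for illustration):

      my %h;
      keys(%h) = 1024;             # pre-allocate buckets up front
      $h{$_} = 1 for 1 .. 1000;    # inserts avoid repeated bucket doubling
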
Re: what's faster than .=
by BrowserUk (Patriarch) on Mar 08, 2003 at 05:48 UTC

    Pre-allocating the string sounds like a good idea, but in order to make it work, you then need to ensure that you overwrite the pre-allocated contents, not add to or replace them.

    Two ways you might do this are substr and vec used as lvalues, but the reality (on my system:) is that using either is way slower than allowing perl to manage the process itself.

    C:\test>241328
    Name "main::substr" used only once: possible typo at C:\test\241328.pl line 6.
    Name "main::bytewise" used only once: possible typo at C:\test\241328.pl line 5.
    Name "main::vec" used only once: possible typo at C:\test\241328.pl line 7.
    Benchmark: running bytewise, substr, vec, each for at least 5 CPU seconds...
    bytewise:  5 wallclock secs ( 5.17 usr +  0.01 sys =  5.18 CPU) @  1.55/s (n=8)
    Use of uninitialized value in substr at (eval 6) line 1.
      substr:  6 wallclock secs ( 6.05 usr +  0.00 sys =  6.05 CPU) @  0.83/s (n=5)
         vec:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @  0.96/s (n=5)
              s/iter  substr     vec bytewise
    substr      1.21      --    -14%     -47%
    vec         1.04     16%      --     -38%
    bytewise   0.647     87%     61%       --

    C:\test>
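
    The script behind those numbers is not shown above; a minimal sketch of that kind of comparison (my reconstruction, not the original 241328.pl) would be:

    use Benchmark qw(cmpthese);

    my $n = 100_000;
    cmpthese(-5, {
        bytewise => sub {
            my $s = '';
            $s .= 'x' for 1 .. $n;                     # let perl grow it
        },
        substr => sub {
            my $s = ' ' x $n;                          # pre-allocate
            substr($s, $_ - 1, 1) = 'x' for 1 .. $n;   # overwrite in place
        },
        vec => sub {
            my $s = ' ' x $n;                          # pre-allocate
            vec($s, $_ - 1, 8) = ord 'x' for 1 .. $n;  # overwrite in place
        },
    });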

    I suspect that the best answer is to try to avoid producing your data 1 byte at a time. If you described that process, there might be alternative ways of achieving it more efficiently.


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
      Pre-allocating the string sounds like a good idea, but in order to make it work, you then need to ensure that you overwrite the pre-allocated contents, not add to or replace them.

      That does not imply that you have to use substr or vec. Perl generally manages things for you as long as you use the same variable name. Part of the design of Perl was to not throw away information on the previous high-water mark when using a particular variable--even lexicals remember how much space they needed from the last time through, and assume you'll use that much again the next time through the loop or sub. Benchmarks on .= will produce much faster results the second time through, provided you don't recreate the actual variable.
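
      A sketch one might use to check that claim (assuming Devel::Peek; whether LEN really survives between calls depends on your perl):

      use Devel::Peek qw(Dump);

      sub build {
          my $buf = '';               # same lexical pad slot on each call
          $buf .= 'x' for 1 .. 10_000;
          Dump($buf);                 # compare LEN across the two calls
      }

      build() for 1 .. 2;             # the second call may start with the
                                      # buffer left over from the first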

        That's both interesting and intriguing. Is the behavior documented other than in the source? Is it reliable? How does Perl decide when to GC a sub's lexicals and when not to? How does it know when a sub might be called again?


        Examine what is said, not who speaks.
        1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
        2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
        3) Any sufficiently advanced technology is indistinguishable from magic.
        Arthur C. Clarke.
Re: what's faster than .=
by Aristotle (Chancellor) on Mar 08, 2003 at 13:50 UTC

    Perl is a pretty high level language. It’s optimized to handle cases like the one you’re referring to with decent efficiency, but makes little provision for low level techniques. foreach(LIST){$_} is faster than for($i=0;$i<@arr;$i++){$arr[$i]}; writing your own code to sort data is pretty much doomed to be slower than sort; goto is very slow, while function calls aren’t nearly as slow; the list goes on.

    The fastest way to do something in Perl is frequently the one that implements the most costly step in the fewest ops. That’s why a Guttman-Rosler transform is faster than a straight Schwartzian transform for complex records, for example – because the costly operation is the sort, and the GRT does not burden it with a callback.
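
    As a concrete illustration of the GRT idea (a sketch; the "name,score" records are made up for the example):

    # sort lines of "name,score" by the numeric score field
    my @lines = ("carol,9", "alice,120", "bob,40");

    my @sorted =
        map  { substr $_, 4 }       # strip the 4-byte packed key
        sort                        # plain string sort, no callback
        map  { pack('N', (split /,/)[1]) . $_ } @lines;
    # @sorted is ("carol,9", "bob,40", "alice,120")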

    It is therefore no surprise that BrowserUk’s tests with substr failed to improve performance. Perl’s process of locating a specific byte in a scalar’s long string – where scalars are a multiply indirect structure to allow seamless growth –, creating an lvalue for it, and then assigning to that lvalue is far more involved than C’s simple process of doing some pointer math and dereferencing the result.

    Perl is not C. The most efficient way of doing things in Perl is almost always to let the perl interpreter do as much of the job as possible, rather than spelling it out.

    Makeshifts last the longest.

      creating an lvalue for it and assigning to that lvalue is far more involved than C's simple process of doing some pointer math

      Agreed. In part, that's why I hoped (forlornly it seems) that Perl 6 would allow the syntax $string[n] as an lvalue or rvalue to mean the same as substr($string, n, 1), once that syntax no longer conflicts with the @array[n] element syntax.

      My best guess, and I realise that it is only a guess, is that the compiler should have enough information at compile time to convert $string[n] = 'c', or even $string[n..m], into a direct read from or write to the scalar's storage, without the need to generate a perl lvalue structure unless a reference is being taken.

      Avoiding both the call overhead of substr and the need to generate the whole lvalue and associated paraphernalia, unless it is necessary, would allow bytewise manipulation of strings with significantly less overhead than is currently the case.

      It is possible that it is this lack of direct access to strings at the byte level that allows perl to accommodate changes like Unicode with almost no changes to the visible APIs, which is nice.

      Perl is not C. The most efficient way of doing things in Perl is almost always to let the perl interpreter do as much of the job as possible, rather than spelling it out.

      Perl's high-level constructs do obviate the need to do many things at the byte-wise level, but will never obviate them all.


      Examine what is said, not who speaks.
      1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
      2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
      3) Any sufficiently advanced technology is indistinguishable from magic.
      Arthur C. Clarke.
Re: what's faster than .=
by crenz (Priest) on Mar 08, 2003 at 13:32 UTC

    If you really need to shave off some speed, you might want to rethink your approach. I know neither your underlying data source nor your protocol, but you should check whether you really can only get one byte at a time, since you seem to be processing the data in blocks anyway.

    A common approach is to read a block of n bytes from the serial port/keyboard/.... If the routine you use to read is blocking (i.e., it won't return until it has collected n bytes), you can use a timeout to make sure you don't wait forever.
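
    For example, a sketch of a blocking block-read with a timeout, using the core IO::Select module (the 512-byte block, the 5-second timeout, and reading from STDIN are all placeholders):

    use IO::Select;

    my $blocksize = 512;
    my $fh        = \*STDIN;
    my $sel       = IO::Select->new($fh);

    my $buf = '';
    while (length($buf) < $blocksize) {
        last unless $sel->can_read(5);    # give up after 5 idle seconds
        my $got = sysread($fh, $buf, $blocksize - length($buf), length($buf));
        last unless $got;                 # EOF or read error
    }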

Re: what's faster than .=
by gmpassos (Priest) on Mar 08, 2003 at 22:59 UTC
    Well, between $var .= $data and $var = $var . $data, use ".=": it is much better, since the second always rewrites all the data while the first just appends.

    I don't know if you are getting this from a file or a socket, but if you are, use

    read(SOCKET, $var, 1, length($var))
    and read() will append the data directly, which is faster than reading into a separate buffer variable and then appending that to $var!
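
    Spelled out as a loop, that might look like this (a sketch; FILE stands for whatever handle you are reading from):

    my $var = '';
    while (read(FILE, $var, 1, length($var))) {
        # each read() appended one byte in place at the end of $var,
        # so no separate buffer variable or .= is needed
    }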

    But think: do you really need to read byte by byte?! If you are doing this because you are using a socket, take a look at IO::Select; it can help.

    Graciliano M. P.
    "The creativity is the expression of the liberty".

Re: what's faster than .=
by Weathros (Novice) on Mar 08, 2003 at 18:36 UTC
    Copying byte by byte is not a good idea. Imagine the PCI bus doing just as much work for one byte as for 65k bytes ;) (depends on OS, PCI, etc.)

    I'm not sure how relevant it is in this particular case, but you can preallocate hashes with "keys(%HASH) = 100;".

    If you really have to work with one byte at a time, you should at least consider reserving the memory you'll be using and writing directly to a memory address; maybe embed some C/C++ code to do that particular task.

    *address++ = byte;

    My 0.002 cents
Re: what's faster than .=
by Anonymous Monk on Mar 08, 2003 at 21:21 UTC
    In Perl this algorithm is fine. The amortized cost of building up a string of length n in Perl is O(n) - a better approach can give you a constant factor improvement but nothing major.

    Also .= should be the fastest way to append a byte to a scalar in native Perl. If it isn't, then it is not far from it. (I would regard anything else as a bug in Perl.)

    However you are right to be concerned about the potential performance. In any of Python, Ruby, JavaScript, or Java your approach would scale as O(n*n) because their equivalents to .= actually create new strings rather than modifying an existing one in place. If you look around in these languages there is often an alternate data structure which is similar to a String, but which is designed for repeated appends. For instance in Java use a StringBuffer and .append to it. As a worst case you can actually write such a data structure. For a simple approach, use an array and then join it together when you need a string. (Simple, but you can do better - however that is OT.)
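
    In Perl that fallback looks like this (a sketch; the random bytes just stand in for whatever byte source you are reading from):

    my @chunks;
    for (1 .. 100_000) {
        push @chunks, chr int rand 256;   # collect pieces instead of .=
    }
    my $all = join '', @chunks;           # one concatenation at the end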

Re: what's faster than .=
by runrig (Abbot) on Mar 08, 2003 at 16:31 UTC
    I don't have time to Benchmark it at the moment, but you might also try print'ing a character at a time to a temp file.

    Update: my benchmark code shows that printing is 20-25% faster than appending to a variable, unless the variable is preallocated to the desired length. Then appending is 4-5 times faster. YMMV.
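
    That benchmark code is not shown here; a sketch of that kind of comparison (my reconstruction, writing to a File::Temp handle) might be:

    use Benchmark qw(cmpthese);
    use File::Temp qw(tempfile);

    my $n = 100_000;
    cmpthese(-3, {
        append => sub {
            my $s = '';
            $s .= 'x' for 1 .. $n;
        },
        print_tmp => sub {
            my ($fh) = tempfile(UNLINK => 1);
            print $fh 'x' for 1 .. $n;
            close $fh;
        },
    });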