Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Putting Perl aside for a moment, there are approximately four ways to grow a string:

  • Allocate more space than anyone could ever want, and hope for the best. Null-terminate, or keep a pointer to the end. This is fragile at best, and is the source of security problems caused by buffer overruns. Perl doesn't do this.
  • Allocate just enough space. If the string grows, allocate a new buffer and move everything to the larger buffer. This becomes quite inefficient, and is probably the "schmliel the painter" algorithm to which you refer.
  • Allocate just enough space and store a pointer to it. Concatenation becomes the logical operation of allocating just enough space again, and adding another pointer to the list of discontiguous memory blocks. But discontiguous storage is complicated, and doesn't benefit from the efficiencies of accessing contiguous blocks of memory.
  • Allocate a little extra space so that the string can grow a little without the "allocate and copy" dance. Every time the string's growth runs out of buffer space, allocate a new buffer with room for the concatenated string, plus extra space.

The last one is what Perl does. By allocating extra space, the "allocate and copy" inefficiencies are mitigated. In brief searching I haven't found the definitive source on how Perl approaches this, but my recollection is this:

  1. For strings under 256 bytes, allocate a 256 byte buffer.
  2. When a string grows to exceed buffer space, jump to 1kb and copy into new buffer.
  3. When a string grows to exceed the 1kb, allocate 2kb, and copy everything over.
  4. Repeat, doubling each time.

Moving contiguous blocks of memory around is a pretty quick operation for most processors. The key is to avoid doing it too often. The other key is to not waste too much memory by allocating way too much extra. Just where Perl sets the transition points I don't recall exactly.

The process does grow in time complexity as the string's growth continues doubling. But it doesn't (ideally) happen on each iteration of growth. I believe that people suggest that concatenation occurs in amortized constant time. In other words, any given iteration may be expensive, but the bulk of the iterations will be cheap, to the point that it's as good as constant time in the aggregate. The same principle applies to hash insertions; constant amortized time, even though an individual insertion may be relatively expensive.

Ruby probably uses the same strategy as Perl. There really isn't any better strategy. The generalized form of the algorithm also applies to Perl's arrays, C++'s std::vector container, C++'s std::string data type, and so on. It just keeps popping up because it's so useful. Details of threshold points and other optimizations may change, but the underlying algorithm is mostly the same.


Dave


In reply to Re: non schlemiel painter string concat by davido
in thread non schlemiel painter string concat by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (8)
As of 2024-04-23 10:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found