
Pre-grow a string

by diotalevi (Canon)
on Aug 02, 2007 at 16:22 UTC ( #630323=perlquestion )
diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

I'm appending to a string four bytes at a time. Almost everything is 2**19 bytes long or shorter. I'd like to pre-grow my string so perl doesn't have to reallocate it each time the string grows past one of the sizes 2**10 .. 2**18. I can never seem to remember the trick for this, so I'm penning it as a SoPW. Help me Obi-wan, you're my only hope.

grow( my( $str ), 2**19 );
while ( ... ) {
    $str .= pack 'J', ...; # append four bytes at a time.
}

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Replies are listed 'Best First'.
Re: Pre-grow a string
by BrowserUk (Pope) on Aug 02, 2007 at 17:24 UTC

    The most efficient way I know of is to use a ramfile and seek (Note: the space allocated is uninitialised).

    The nice thing about this is that if you need to extend it, you can, and any existing contents will be preserved even if Perl has to re-allocate a larger chunk of ram.

    #! perl -slw
    use strict;

    my $bigstring;
    open my $temp, '>', \$bigstring or die $!;
    print $temp 'My oh so unique signature';
    seek $temp, 2**19, 0;
    print $temp chr(0);
    close $temp;
    print length $bigstring;

    open $temp, '+<', \$bigstring or die $!;
    seek $temp, 2**20, 0;
    print $temp chr(0);
    close $temp;
    print length $bigstring;
    print substr $bigstring, 0, 25;

    __END__
    c:\test>junk6
    524290
    1048578
    My oh so unique signature

    But then I've never found a way to truncate that (so that I can append to it), without risking having the space returned to the pool. So, I've used substr as an lvalue with a pointer to manage the space myself:

    my $p = 0;
    substr( $bigstring, $p, 4 ) = pack 'J', ...;
    $p += 4;

    Two caveats are: 1) truncate doesn't seem to work on ramfiles; 2) if you over-allocate, the unused space remains a part of the string until you do something about it. E.g. (with $p holding the offset of the next free byte):

    substr( $bigstring, $p ) = '';
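    Put together, the pointer-managed buffer described above might look like the sketch below. It is only an illustration under a few assumptions: the 2**19 size comes from the question, the loop data (the integers 1..1000) is made up, and 'N' is used instead of 'J' because 'J' packs a perl-native UV, which can be eight bytes on a 64-bit perl, while 'N' is always four.

```perl
use strict;
use warnings;

my $capacity = 2**19;              # assumed upper bound, taken from the question
my $buf      = "\0" x $capacity;   # pre-sized, zero-filled buffer
my $pos      = 0;                  # offset of the next free byte

for my $n ( 1 .. 1000 ) {          # hypothetical data: 1000 integers
    # 4-arg substr overwrites four bytes in place; the string never grows
    substr( $buf, $pos, 4, pack( 'N', $n ) );
    $pos += 4;
}

# Drop the unused tail so length($buf) matches what was written
substr( $buf, $pos ) = '';

printf "wrote %d bytes\n", length $buf;
```

    The price is that length() is wrong until the final trim, so anything that reads the buffer mid-loop has to use $pos instead.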

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Pre-grow a string (MTOW)
by tye (Sage) on Aug 02, 2007 at 16:59 UTC

    The somewhat sucky way is:

    my $string= ' ' x 2**19; $string= '';

    It'd be cool if the following worked but it doesn't:

    length( $string )= 2**19; # Doesn't work

    Or you can get creative:

    use File::Spec;
    open NUL, "<", File::Spec->devnull() or die $!;
    sysread( NUL, $string, 2**19-length($string), length($string) );
    close NUL;

    Testing in my environment shows that you can even get away with:

    sysread( STDOUT, $string, 2**19 );

    which leaves $string's current value unchanged while extending the storage allocated to it. Of course, there is a risk of running into a system where STDOUT is open for both read and write access. I wonder if there is another Perl built-in that reads into a buffer that can be used more conveniently.

    Update: Testing also shows the following works well:

    sub grow {
        sysread( DATA, $_[0], $_[1] );
    }
    __DATA__
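    For anyone wanting to try the null-device variant as a reusable sub, here is a minimal sketch. The name grow() and its two arguments follow the question's usage; everything else (the lexical filehandle, the early return, the 'N' payload) is my own assumption rather than anything tested above. It leans on the same side effect: sysread makes room for the requested length before it discovers there is nothing to read.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: ask perl to make room for $size bytes in $_[0]
# by requesting a read from the null device, which is always at EOF.
sub grow {
    my $size = $_[1];
    my $have = length( $_[0] ) // 0;   # an undef target counts as empty
    return if $size <= $have;          # already at least that long

    open my $nul, '<', File::Spec->devnull() or die "open: $!";
    # Reading at offset $have leaves the existing bytes where they are.
    sysread( $nul, $_[0], $size - $have, $have );
    close $nul;
    return;
}

my $str = 'abcd';
grow( $str, 2**19 );
$str .= pack 'N', 42;                  # append four more bytes

print length($str), "\n";
```

    Whether the extra room survives later operations on the string is up to perl's internals, so treat this as a hint to the allocator and not a guarantee.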

    - tye        

      my $string= ' ' x 2**19; $string= '';

      I dimly remember having seen this suggested somewhere else... However, it doesn't actually seem to speed things up (if that's the idea behind avoiding reallocations). On my machine and version of Perl it's in fact marginally slower.

      use Benchmark "cmpthese";

      sub pre {
          my $str = " " x 2**19;
          $str = "";
          $str .= "aaaa" for 1..2**17;
      }

      sub std {
          my $str = "";
          $str .= "aaaa" for 1..2**17;
      }

      cmpthese( -1, { 'pre' => 'pre()', 'std' => 'std()' });


            Rate  pre  std
      pre 43.6/s   --  -7%
      std 46.7/s   7%   --

        Yes, what is somewhat sucky about it is that it allocates the space twice, initializes one copy, then copies it (then frees one), though I haven't checked this assumption of mine. That means iterating over the requested size twice. Pre-sizing avoids reallocating. Reallocating is done exponentially (doubling the size each time) and so averages out to copying the maximum size about twice (unless you get lucky and reallocate in place, rather unlikely in my experience). So it isn't surprising that the performance difference between those two choices is minimal.

        But note that pre-allocating will likely reduce a bit of memory fragmentation by avoiding leaving those power-of-two-sized buffers of now-free space in its wake.
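        To put rough numbers on the doubling argument, here is a toy model. It is not perl's actual allocator: it assumes a buffer that starts at 2**10 bytes, doubles every time it fills up, and has to copy its full contents on each doubling (no lucky in-place reallocs).

```perl
use strict;
use warnings;

my $final    = 2**19;   # target size from the question
my $cap      = 2**10;   # assumed starting capacity
my $copied   = 0;       # bytes moved by reallocations so far
my $reallocs = 0;

while ( $cap < $final ) {
    $copied += $cap;    # the old contents are copied into the new buffer
    $cap    *= 2;       # exponential growth: double each time
    $reallocs++;
}

printf "%d reallocations, %d bytes copied (%.2f x final size)\n",
    $reallocs, $copied, $copied / $final;
```

        Under those assumptions the copying adds up to about one times the final size when the string ends exactly on a power of two, and approaches twice the final size when it ends just past one. Either way it is the same order of work as filling and copying a pre-sized string.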

        - tye        

Re: Pre-grow a string
by ikegami (Pope) on Aug 02, 2007 at 16:50 UTC

    For new SVs, macro NEWSV and function newSV.
    For existing SVs, SvGROW and function sv_grow.
    See perlapi.

    Update: I can't find an existing interface to them. You could use Inline::C or write a simple XS module.

Re: Pre-grow a string
by whereiskurt (Friar) on Aug 02, 2007 at 16:56 UTC

    In an effort to answer your question (which I haven't) I learned that:

    keys(%users) = 1000; # allocate 1024 buckets

    Will preallocate for hashes. I know - useless for you.

    Just thought I'd share anyway. :)


Re: Pre-grow a string
by ysth (Canon) on Aug 03, 2007 at 00:22 UTC
    my $str = ""; vec($str, 2**19-1, 8)=0; $str = "";
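    If you want to see whether any of these idioms actually leaves a bigger buffer behind on your perl, the core B module can report a scalar's allocated size. A minimal sketch: capacity() is a made-up helper name, and the numbers it prints depend on perl version and build, so none are quoted here.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: allocated buffer size (SvLEN) of a scalar,
# as opposed to length(), which reports the bytes in use (SvCUR).
sub capacity { B::svref_2object( \$_[0] )->LEN }

my $size = 2**19;

my $plain = '';                    # baseline, no pre-growing

my $via_repeat = ' ' x $size;      # the "somewhat sucky" way
$via_repeat = '';

my $via_vec = '';                  # the vec() way
vec( $via_vec, $size - 1, 8 ) = 0;
$via_vec = '';

printf "%-12s length=%d capacity=%d\n", @$_
    for [ 'plain',      length $plain,      capacity($plain) ],
        [ 'via_repeat', length $via_repeat, capacity($via_repeat) ],
        [ 'via_vec',    length $via_vec,    capacity($via_vec) ];
```

    All three strings are empty afterwards; only the capacity column tells you whether the pre-grow stuck.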
Re: Pre-grow a string
by krishnoid (Novice) on Aug 02, 2007 at 23:04 UTC
    Convert::Scalar has such a grow() function, and I suspect 'Packing and Unpacking C Structures' in perlpacktut will give you more info on quickly jumping to the desired offset while appending (I haven't read it in detail myself, but it mentions 'offsets').
