Concatenation with empty string -- good enough to force physical copy?

by vr (Curate)
on Dec 01, 2020 at 21:32 UTC ( #11124486=perlquestion )

vr has asked for the wisdom of the Perl Monks concerning the following question:

Is this side effect (concatenating with an empty string forces a physical copy) a known, if obscure, idiom among Perl's many? Is it "exploited" by any CPAN distribution, so that there's some hope it won't be optimized away in the future? Quick googling reveals this did happen in another language (SO link) and, apparently, was praised as a good thing.

With current Perl, it performs as expected: a no-op with the side effect I'm after, and I'd be content to leave it in my code, with a proper comment, of course. The alternatives I checked either pay the "sub call is expensive in Perl" penalty for small and medium-sized buffers, or introduce a dependency (Storable), or pay an even larger penalty (PDL), where memory for shorter types seems to be allocated as for larger types and then truncated. (Replace "byte" with "double" in the code to see that PDL is not intrinsically slow. This bug is only tangentially related; it just shows, I think, that simple built-in tools are to be preferred. Unless they are in danger of becoming extinct.)

Background: function (1) in a 3rd-party DLL accepts pointers to several buffers (image channels); my Inline::C glue (2) extracts pointers from references and calls (1); my Perl code (3) does PDL arithmetic and then calls (2). For particular inputs, shortcuts are possible that greatly optimize (3), but then the Perl references happen to point to the same data, and (1) misbehaves: it looks like it expects physically separate buffers. I need to supply (2) with references to truly distinct scalars. I understand this is probably better addressed on the C side, but I'm not good with C and just lazy, so I'm assuming it'd be easier to make a physical string copy in pure Perl.

    use strict;
    use warnings;
    use PDL;
    use Storable 'dclone';
    use Benchmark 'cmpthese';

    for my $size ( 4 .. 8 ) {
        print "*** string size is 1E$size bytes\n";
        my $data = byte( 256 * random( 10 ** $size ));
        my $buf = $data-> get_dataref;
        cmpthese( -2, {
            copy   => sub { $data-> copy-> get_dataref },
            dclone => sub { dclone $buf },
            concat => sub { \( ${ $buf } . '' )},
        });
        print "\n";
    }

    __END__

    Perl executable: C:\berrybrew\strawberry-perl-\perl\bin\perl.exe
    Perl version   : 5.32.0 / MSWin32-x64-multi-thread
    PDL version    : 2.021

    *** string size is 1E4 bytes
                Rate   copy dclone concat
    copy     56875/s     --   -72%   -95%
    dclone  201992/s   255%     --   -83%
    concat 1182933/s  1980%   486%     --

    *** string size is 1E5 bytes
               Rate   copy dclone concat
    copy     7703/s     --   -90%   -94%
    dclone  76109/s   888%     --   -43%
    concat 134245/s  1643%    76%     --

    *** string size is 1E6 bytes
             Rate   copy dclone concat
    copy    808/s     --   -82%   -83%
    dclone 4538/s   462%     --    -4%
    concat 4713/s   484%     4%     --

    *** string size is 1E7 bytes
             Rate   copy concat dclone
    copy   67.8/s     --   -70%   -71%
    concat  228/s   236%     --    -1%
    dclone  230/s   240%     1%     --

    *** string size is 1E8 bytes
             Rate   copy dclone concat
    copy   6.84/s     --   -69%   -69%
    dclone 21.8/s   219%     --    -1%
    concat 22.1/s   223%     1%     --

Replies are listed 'Best First'.
Re: Concatenation with empty string -- good enough to force physical copy?
by dave_the_m (Monsignor) on Dec 02, 2020 at 11:11 UTC
    Basically you're trying to defeat perl's copy-on-write optimisation. Concatting an empty string will currently do this, but there are no future guarantees. If you want your code to be robust you really need to handle this at the C level.
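
To illustrate the copy-on-write point, here is a minimal pure-Perl sketch. It assumes that `pack 'p'` returns the address of a plain string scalar's buffer without copying it, and that the Perl in use was built with copy-on-write (the default since 5.20). The helper name `buf_ptr` is made up for this example. It only shows current behaviour and guarantees nothing about future releases.

```perl
use strict;
use warnings;

# Raw bytes of the pointer to a scalar's string buffer.
# $_[0] aliases the caller's variable, so no copy is made on the way in.
# Assumption: for a plain string scalar, pack 'p' does not modify the buffer.
sub buf_ptr { return pack 'p', $_[0] }

# A 10 kB string built at run time.
my $src = join '', map { chr( $_ % 256 ) } 1 .. 10_000;

my $alias  = $src;         # plain assignment: may share the buffer (COW)
my $forced = $src . '';    # concatenation: currently builds a new string

printf "assignment shares buffer with source: %s\n",
    buf_ptr($alias) eq buf_ptr($src) ? 'yes' : 'no';
printf "concat shares buffer with source:     %s\n",
    buf_ptr($forced) eq buf_ptr($src) ? 'yes' : 'no';
```

Whether the first line prints "yes" depends on the Perl build and on string length; the second line reflects the behaviour described above for current Perl.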


      Exactly, the word "CoW" was on my mind but somehow got omitted when I rephrased the question. Thank you for sharing your insight. As there's apparently no "idiom" (but wouldn't it be nice... concatenation with a compile-time constant treated as such), the side-effect trick is better avoided.

Re: Concatenation with empty string -- good enough to force physical copy?
by ikegami (Patriarch) on Dec 03, 2020 at 07:05 UTC

    This is forward-compatible:

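        use Inline C => <<'__EOS__';

        /* Make sure sv owns an ordinary, writable string buffer
           (un-shares a copy-on-write buffer if needed). */
        void force_ordinary_string(SV *sv) {
            /* SvPV_force takes a length argument; the _nolen variant omits it. */
            SvPV_force_nolen(sv);
        }

        __EOS__

        force_ordinary_string($$buf);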

      I just noticed the footnote in the OP. Sounds like you have C/XS code that modifies the string buffer but obtains it with SvPV, SvPVbyte or SvPVutf8 instead of SvPV_force, SvPVbyte_force or SvPVutf8_force. Bad!

        Can you please explain why it is bad? The pointer to the string buffer goes "as is" to the 3rd-party function, and even with an explicit flag (there is one) not to make a copy in the data structures created somewhere in the 3rd-party code: "if FALSE, wrap the user's pixel buffer, otherwise, make a deep copy". I was actually proud of this "no copying, low memory, fast code" thing. All works as expected: no leaks with long-running code, no failures, etc. After the images are saved, the piddles themselves are not needed (nor did I check what happened to them). The only issue was when identical piddles occurred in the original list: then every other channel was mirrored, but I don't know whether that means the buffer itself gets modified or the image coder gets confused.

        And thank you for C recipe for forcing a copy.
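
For the "identical piddles in the list" case described above, a pure-Perl sketch that copies only the duplicates might look like the following. The helper `distinct_buffers` is hypothetical and not from this thread. It assumes that shared buffers can be detected by comparing the pointer bytes from `pack 'p'`, and it relies on the concatenation trick, which, as noted in the replies, works on current Perl but is not guaranteed.

```perl
use strict;
use warnings;

# Hypothetical helper: given references to string scalars, return
# references whose string buffers are pairwise distinct. Only scalars
# whose buffer was already seen are copied; the rest pass through.
sub distinct_buffers {
    my @refs = @_;
    my %seen;    # keyed by raw pointer bytes of each buffer
    for my $ref (@refs) {
        if ( $seen{ pack 'p', $$ref }++ ) {
            # Buffer already used by an earlier entry: substitute a copy.
            # Relies on concatenation building a new string (current Perl).
            my $copy = $$ref . '';
            $ref = \$copy;
            $seen{ pack 'p', $copy }++;
        }
    }
    return @refs;
}

# Usage: three "channels", two of which start out as the same data.
my $red   = join '', map { chr( $_ % 256 ) } 1 .. 5_000;
my $green = $red;    # may share $red's buffer under COW
my @channels = distinct_buffers( \$red, \$green, \$red );
```

The copy costs one allocation per duplicate, so the "no copying" fast path is kept whenever the inputs are already distinct.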

Re: Concatenation with empty string -- good enough to force physical copy?
by etj (Chaplain) on Apr 19, 2022 at 22:19 UTC
