PerlMonks  

[XS] sv_setpv change in behaviour with perl-5.42.0 and later

by syphilis (Archbishop)
on Jan 27, 2026 at 08:42 UTC ( [id://11167243]=perlquestion )

syphilis has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

On Windows 11, perl-5.42.0 and later, a string assignment (in XS) to an SV's PV buffer using sv_setpv can drastically reduce the value of SvLEN.

Demo:
    use strict;
    use warnings;
    use Devel::Peek;

    use Inline C => <<'EOC';
    void foo(SV * buffer) {
      char *data = "Hello there";
      sv_setpv(buffer, data);
    }
    EOC

    my $buffer = 'z' x 60;
    Dump $buffer;
    foo($buffer);
    Dump $buffer;
Output, running on perl-5.42.0 and perl-5.43.7:
    SV = PV(0x2352069fd80) at 0x235206db678
      REFCNT = 1
      FLAGS = (POK,IsCOW,pPOK)
      PV = 0x2352275d050 "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"\0
      CUR = 60
      LEN = 64
      COW_REFCNT = 1
    SV = PV(0x2352069fd80) at 0x235206db678
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x2352279a8a0 "Hello there"\0
      CUR = 11
      LEN = 16
What, if anything, should I deduce from the fact that LEN has been reduced from 64 to 16?
Has the size of the PV buffer actually been reduced?

On perl-5.40.0 and earlier, when running the same script, LEN retains its original value (which is the behaviour that I expected) :
    SV = PV(0x2076e35a3b0) at 0x2076e41b338
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x2077084ae90 "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"\0
      CUR = 60
      LEN = 62
    SV = PV(0x2076e35a3b0) at 0x2076e41b338
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x2077084ae90 "Hello there"\0
      CUR = 11
      LEN = 62
Cheers,
Rob

Replies are listed 'Best First'.
Re: [XS] sv_setpv change in behaviour with perl-5.42.0 and later
by dave_the_m (Monsignor) on Jan 27, 2026 at 09:06 UTC
    You're seeing Copy-on-Write (COW) effects. Post 5.40, $buffer's string buffer is shared with the buffer of the constant SV (constant-folded at compile time) that holds the value of 'z' x 60. When $buffer is assigned to, the COW mechanism leaps into action, and $buffer is given its own string buffer (only as big as it needs to be) to store the changed value.

    Dave.

Re: [XS] sv_setpv change in behaviour with perl-5.42.0 and later
by ikegami (Patriarch) on Jan 27, 2026 at 16:23 UTC

    The difference is that 5.42 made constants produced by constant folding eligible for buffer sharing ("COW").[1]

    Use SvGROW if you want the buffer to have a minimum size.
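
To illustrate what a minimum-size guarantee buys you, here is a minimal sketch in plain C. It is a toy model, not XS: toy_sv, toy_grow and toy_bar are hypothetical names invented for this example, and toy_grow only imitates the contract of SvGROW (make sure at least n bytes are allocated, never shrink). In real XS code you would call SvGROW(sv, n) on the SV before writing into its buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string scalar; NOT Perl's real SV layout. */
typedef struct {
    char  *pv;   /* string buffer (heap allocated) */
    size_t cur;  /* length of the string, excluding the trailing NUL */
    size_t len;  /* number of bytes allocated for pv */
} toy_sv;

/* Ensure at least `want` bytes are allocated. Never shrinks.
 * Returns the (possibly moved) buffer, as SvGROW does for a real SV. */
static char *toy_grow(toy_sv *sv, size_t want)
{
    if (sv->len < want) {
        char *p = realloc(sv->pv, want);
        if (p == NULL)
            abort();
        /* Zero the new tail so the toy is deterministic (toy-only choice). */
        memset(p + sv->len, 0, want - sv->len);
        sv->pv  = p;
        sv->len = want;
    }
    return sv->pv;
}

/* A writer that needs byte 49 to exist: it asks for 50 bytes first
 * instead of trusting whatever size the buffer happens to have. */
static void toy_bar(toy_sv *sv)
{
    char *p = toy_grow(sv, 50);
    p[49] = 65;  /* 'A' */
}
```

Starting from a 16-byte buffer holding "Hello there", calling toy_bar grows the allocation to 50 bytes before the write, and a later toy_grow with a smaller request leaves the size alone.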


    In the 5.42 run, $buffer initially shares a buffer with the constant created by 'z' x 60. This is evident from the IsCOW flag, which indicates the buffer is shared with another scalar.

    This means that sv_setpv must create a new buffer. Notice how the address of the buffer changed from 0x2352275d050 to 0x2352279a8a0.

    There's no reason for sv_setpv to create a new buffer whose length is based on the old string buffer's length. The new buffer's length will be based on the length of the string being assigned.

        SV = PV(0x2352069fd80) at 0x235206db678
          REFCNT = 1
          FLAGS = (POK,IsCOW,pPOK)             <- IsCOW = Shared buffer
          PV = 0x2352275d050 "zzz...zzz"\0
          CUR = 60
          LEN = 64
          COW_REFCNT = 1
        SV = PV(0x2352069fd80) at 0x235206db678
          REFCNT = 1
          FLAGS = (POK,pPOK)                   <- No longer sharing a buffer
          PV = 0x2352279a8a0 "Hello there"\0   <- New buffer at new address
          CUR = 11
          LEN = 16

    In the 5.40 run, $buffer doesn't share a buffer with another scalar, as noted by the lack of the IsCOW flag.

    This means that sv_setpv can reuse the existing buffer if it's large enough. And it is. Notice how the address of the buffer remains 0x2077084ae90.

        SV = PV(0x2076e35a3b0) at 0x2076e41b338
          REFCNT = 1
          FLAGS = (POK,pPOK)                   <- Not sharing a buffer
          PV = 0x2077084ae90 "zzz...zzz"\0
          CUR = 60
          LEN = 62
        SV = PV(0x2076e35a3b0) at 0x2076e41b338
          REFCNT = 1
          FLAGS = (POK,pPOK)
          PV = 0x2077084ae90 "Hello there"\0   <- Same address. Same buffer
          CUR = 11
          LEN = 62
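
The two runs above can be mimicked with a small toy model in plain C. This is only a sketch of the behaviour described in this reply, not Perl's implementation: toy_sv, toy_round_up and toy_setpv are hypothetical names, and the rounding to a multiple of 16 is an assumption chosen so the numbers line up with the dumps, not a statement about Perl's allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string scalar; NOT Perl's real SV layout. */
typedef struct {
    char  *pv;      /* string buffer */
    size_t cur;     /* length of the string, excluding the trailing NUL */
    size_t len;     /* number of bytes allocated for pv */
    int    is_cow;  /* nonzero if pv is shared with another scalar */
} toy_sv;

/* Assumed allocation granularity for this sketch: multiples of 16 bytes. */
static size_t toy_round_up(size_t n)
{
    return (n + 15) / 16 * 16;
}

/* Assign a new string to the scalar.
 * - Shared buffer: it must not be modified, so a fresh buffer sized for
 *   the NEW string is allocated; the old length plays no part.
 * - Private buffer: it is reused when big enough and is never shrunk. */
static void toy_setpv(toy_sv *sv, const char *s)
{
    size_t need = strlen(s) + 1;  /* string plus trailing NUL */

    if (sv->is_cow) {
        /* Let go of the shared buffer; its other user keeps it. */
        sv->len    = toy_round_up(need);
        sv->pv     = malloc(sv->len);
        sv->is_cow = 0;
    } else if (sv->len < need) {
        sv->len = toy_round_up(need);
        sv->pv  = realloc(sv->pv, sv->len);
    }
    if (sv->pv == NULL)
        abort();

    memcpy(sv->pv, s, need);
    sv->cur = need - 1;
}
```

With a shared 64-byte buffer, assigning "Hello there" in this model yields a new 16-byte buffer at a different address; with a private 62-byte buffer, the same assignment keeps both the address and the 62-byte length.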

    So why is the buffer shared with 'z' x 60 in one version and not the other?

    5.42 fixed a bug that prevented the buffer of constants created by constant folding from being shared. An in-depth explanation of the bug follows.

    When a string buffer is shared, the IsCOW flag of both scalars is set, and a share count is placed in the unused portion of the buffer.[2] This means that for COW to be used, there must be free space at the end of the string buffer, and the string buffer must be modifiable.

    When Perl encounters a literal, it produces a read-only scalar in memory.[3] Being read-only would make it ineligible for COW, which would be dumb. So Perl marks the scalar as already being shared with zero scalars.

        $ perl -MDevel::Peek -e'Dump( "zzzzzz" )'
        SV = PV(0x57f44e969f20) at 0x57f44e9980a8
          REFCNT = 1
          FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
          PV = 0x57f44e9e8140 "zzzzzz"\0
          CUR = 6
          LEN = 16
          COW_REFCNT = 0

    One wouldn't normally encounter a scalar shared with zero other scalars. But since Perl doesn't need to check if a scalar's buffer can be shared if it's already shared, this permits the read-only scalar's buffer to be shared.
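
The bookkeeping described above can be sketched as a toy model in plain C. Everything here is hypothetical and for illustration only: toy_buf and the toy_* functions are invented names, and placing the count in the very last byte of the buffer is an assumption of this sketch. The post only says the count lives in the unused portion of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a string buffer; NOT Perl's real SV layout. */
typedef struct {
    char  *pv;      /* buffer: string, trailing NUL, then unused space */
    size_t cur;     /* string length, excluding the trailing NUL */
    size_t len;     /* total bytes allocated */
    int    is_cow;  /* set once the share count in the buffer is valid */
} toy_buf;

/* Assumption of this sketch: the share count is the buffer's last byte. */
static unsigned char *toy_cow_count(toy_buf *b)
{
    return (unsigned char *)b->pv + b->len - 1;
}

/* The count needs one spare byte after the string and its NUL. */
static int toy_has_room(const toy_buf *b)
{
    return b->len >= b->cur + 2;
}

/* Mark a literal's buffer as "shared with zero other scalars". */
static void toy_mark_literal(toy_buf *b)
{
    assert(toy_has_room(b));
    b->is_cow = 1;
    *toy_cow_count(b) = 0;
}

/* Add one sharer. Returns 1 on success, 0 if the buffer cannot be shared. */
static int toy_share(toy_buf *b)
{
    if (!b->is_cow) {
        if (!toy_has_room(b))
            return 0;            /* no free space for the count */
        b->is_cow = 1;
        *toy_cow_count(b) = 0;
    }
    if (*toy_cow_count(b) == 255)
        return 0;                /* one-byte counter is saturated */
    ++*toy_cow_count(b);
    return 1;
}
```

In this model a 6-character string in a 16-byte buffer can be shared, while the same string in a 7-byte buffer cannot, because no byte is left over for the count. A buffer passed through toy_mark_literal starts out flagged as shared with a count of zero, so a later toy_share skips the eligibility check.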

    For literals, this was true of both 5.42 and earlier versions.

        $ 5.42t/bin/perl -MDevel::Peek -e'Dump( "zzzzzz" )'
        SV = PV(0x582cd0b72f20) at 0x582cd0ba1098
          REFCNT = 1
          FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
          PV = 0x582cd0ba4a40 "zzzzzz"\0
          CUR = 6
          LEN = 16
          COW_REFCNT = 0

        $ 5.40t/bin/perl -MDevel::Peek -e'Dump( "zzzzzz" )'
        SV = PV(0x6381a4328f20) at 0x6381a4357028
          REFCNT = 1
          FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK)
          PV = 0x6381a43a7d10 "zzzzzz"\0
          CUR = 6
          LEN = 16
          COW_REFCNT = 0

    But before 5.42, constants produced by constant folding weren't being set up this way, so they weren't eligible for COW.

        $ 5.42t/bin/perl -MDevel::Peek -e'Dump( "zzz" . "zzz" )'
        SV = PV(0x6110ea0263a0) at 0x6110ea0540a0
          REFCNT = 1
          FLAGS = (PADTMP,POK,IsCOW,READONLY,PROTECT,pPOK)
          PV = 0x6110ea080fc0 "zzzzzz"\0
          CUR = 6
          LEN = 16
          COW_REFCNT = 0

        $ 5.40t/bin/perl -MDevel::Peek -e'Dump( "zzz" . "zzz" )'
        SV = PV(0x5be570a973a0) at 0x5be570ac50c0
          REFCNT = 1
          FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK)
          PV = 0x5be570af23e0 "zzzzzz"\0
          CUR = 6
          LEN = 16

        $ 5.42t/bin/perl -MDevel::Peek -e'Dump( "z" x 6 )'
        SV = PV(0x607fa35c4200) at 0x607fa35f2138
          REFCNT = 1
          FLAGS = (PADTMP,POK,IsCOW,READONLY,PROTECT,pPOK)
          PV = 0x607fa3603d10 "zzzzzz"\0
          CUR = 6
          LEN = 16
          COW_REFCNT = 0

        $ 5.40t/bin/perl -MDevel::Peek -e'Dump( "z" x 6 )'
        SV = PV(0x5d8bda39d200) at 0x5d8bda3cafb8
          REFCNT = 1
          FLAGS = (PADTMP,POK,READONLY,PROTECT,pPOK)
          PV = 0x5d8bda3d5340 "zzzzzz"\0
          CUR = 6
          LEN = 16

    1. From perl5420delta,

      Constant-folded strings are now shareable via the Copy-on-Write mechanism. [GH #22163]

      The following code would previously have allocated eleven string buffers, each containing one million "A"s:

      my @scalars; push @scalars, ("A" x 1_000_000) for 0..9;

      Now a single buffer is allocated and shared between a CONST OP and the ten scalar elements of @scalars.

      Note that any code using this sort of constant to simulate memory leaks (perhaps in test files) must now permute the string in order to trigger a string copy and the allocation of separate buffers. For example, ("A" x 1_000_000).time might be a suitable small change.

    2. It must be somewhere all users of the buffer can find, and this is a very efficient solution in terms of speed and memory. But it means it can't be used for every scalar.

    3. So you don't do stupid things like

          for ( 1 .. 2 ) {
            my $r = \"abc";
            say $$r;
            $$r = "def";
          }

      "There's no reason for sv_setpv to create a new buffer whose length is based on the old string buffer's length."

      Except that, with earlier perls, the new buffer is the same length as the old buffer - so this is a change in behaviour, and one that I did not expect.
      Let's say there's another XSub to which I want to subsequently pass that buffer, and it's an XSub that requires the buffer to have at least (say) 50 bytes available. For example:
      void bar(unsigned char * buffer) { buffer[49] = 65; }
      On perl 5.40.0 I could re-use that PV that I created in the demo and pass it to bar(), because its SvLEN is still guaranteed to be at least 60.
      But on perl-5.42.0, SvLEN has been reduced to 16, so passing that PV to bar() will result in the buffer being overflowed.
      At least, that's the way it looks to me. (And I'm assuming that such buffer overflow is something to be avoided.)
      Nothing that can't be dealt with, of course - but nonetheless surprising.

      Here's a second script that demonstrates that change in behaviour:
          use strict;
          use warnings;
          use Devel::Peek;

          use Inline C => <<'EOC';
          void foo(SV * buffer) {
            char *data = "Hello there";
            sv_setpv(buffer, data);
          }
          void bar(unsigned char * buffer) {
            buffer[49] = 65;
          }
          void _set_CUR(SV * buffer, int bytes) {
            SvCUR_set(buffer, bytes);
          }
          EOC

          my $buffer = 'z' x 60;
          Dump $buffer;
          foo($buffer);
          Dump $buffer;
          bar($buffer);
          _set_CUR($buffer, 60); # Ensure that Devel::Peek::Dump will display all 60 bytes.
          Dump $buffer;
      On perl-5.40.0 and earlier, the final Devel::Peek::Dump reveals exactly what I expect:
          SV = PV(0x254ba8dbf08) at 0x254ba920660
            REFCNT = 1
            FLAGS = (POK,pPOK)
            PV = 0x254bce10dc8 "Hello there\x00zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzAzzzzzzzzzz"\0
            CUR = 60
            LEN = 62
      On perl-5.42.0, the final Dump appears as:
          SV = PV(0x224fe4c0aa0) at 0x224fe4f4cd0
            REFCNT = 1
            FLAGS = (POK,pPOK)
            PV = 0x22480c0e170 "Hello there\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\r\x8B7\xBC\x00\xF8\x00\x88\xC0\x82\x9F\x80$\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10A\x8D\x80$\x02\x00\x00\x00\xB5K\xFE"
            CUR = 60
            LEN = 16
      (The 'A' at index 49 can be seen if you look closely.)
      However, no-one else has been bothered by this - so I guess I just deal with it appropriately.
      I do have a working solution to my issue that avoids sv_setpv and avoids re-using the same PV. (I might try improving it, but I think it's good enough as it already stands. And it's probably the same as the solution I would have used even if this change of behaviour in 5.42.0 did not exist.)

      Thank you for all of the detail, BTW - much appreciated.
      In fact, thank you to all respondents.

      Cheers,
      Rob

        "Except that, with earlier perls, the new buffer is the same length as the old buffer"

        That's not true. When older Perls create a new buffer, it is just a bit larger than necessary, just like in 5.42.

        In all the examples where you claim a new buffer was allocated based on the size of the old one, you are mistaken. As I explained, no new buffer was allocated in those cases. sv_setpv is simply modifying the existing buffer, something you can't do with a shared buffer. And since Perl never shrinks a buffer, modifying the buffer does not shrink it.[1]


        1. It can free it, e.g. using undef $s; (as opposed to $s = undef;), which could eventually result in a shorter buffer in $s. But I don't know of any circumstances in which it directly shrinks a buffer.

      And updated my explanation.

      Added a lot to parent.

Re: [XS] sv_setpv change in behaviour with perl-5.42.0 and later
by Corion (Patriarch) on Jan 27, 2026 at 09:11 UTC

    In a way, this is plausible, as assigning a fresh string is a good situation to release memory back to the OS.

    I'm not sure if pre-/resizing the buffer for a scalar still survives the various machinations. Obviously it breaks for sv_setpv, but I can see that breaking (fictional) code like the following:

        SV *mysv = newSVpv("", 0);
        SvGROW(mysv, 1024); // preallocate 1024 bytes as buffer for the response
        sv_setpv(mysv, "Hello World");
        // writes the response into mysv and expects 1024 bytes to be writable
        some_systemcall_with_the_message_buffer(SvPV_nolen(mysv), 1024);

    ... but I'm not sure if sv_setpv is supposed to copy the string or just copy the pointer to it.

    Whether the threshold of 64 -> 16 bytes is sensible, or whether the resizing should only happen for larger decrements, remains to be seen.

      The sv_setpv*() APIs copy the string.
