PerlMonks  

5.42: Does m// toss a string around?

by Anonymous Monk
on Jan 21, 2026 at 17:46 UTC ( [id://11167202]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

$s = '0123456789';
printf "%x\n", unpack 'Q', pack 'P', $s while $s =~ /\G./g;

# 55fd180fcc60
# 55fd18107f30
# 55fd18107ed0
# 55fd180e6700
# 55fd180fcc60
# 55fd18107f30
# 55fd18107ed0
# 55fd180e6700
# 55fd180fcc60
# 55fd18107f30

It looks like the patterns don't matter: neither the one between the slashes (the one above is there for the illusion of being useful, the sort of thing supposedly performed daily by thousands) nor the pattern in which the addresses alternate; perl just picks whatever is available.

use strict;
use warnings;
use feature 'say';
use Config;

say "$^V / $Config{ archname }";

for my $len ( 1e4, 1e5, 1e6, 1e7 ) {
    my $s = 'a' x $len;
    my %h;
    my @before = times;
    $s =~ /\G./gs and ++ $h{ pack 'P', $s }
        for 1 .. $len / 100; # do just 1% !!!
    my @after = times;
    printf "Length: %8d, addresses: %6d, user: %6g, system: %6g\n",
        $len, scalar( keys %h ),
        $after[ 0 ] - $before[ 0 ],
        $after[ 1 ] - $before[ 1 ]
}

__END__

v5.40.3 / x86_64-linux-thread-multi
Length: 10000, addresses: 1, user: 0, system: 0
Length: 100000, addresses: 1, user: 0, system: 0
Length: 1000000, addresses: 1, user: 0.01, system: 0
Length: 10000000, addresses: 1, user: 0.02, system: 0

v5.42.0 / x86_64-linux-thread-multi
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 3, user: 0.01, system: 0
Length: 1000000, addresses: 3, user: 0.39, system: 0
Length: 10000000, addresses: 3, user: 139.87, system: 0

v5.43.8 / x86_64-linux
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 3, user: 0.01, system: 0
Length: 1000000, addresses: 3, user: 0.38, system: 0
Length: 10000000, addresses: 3, user: 138.75, system: 0.01

v5.42.0 / MSWin32-x64-multi-thread
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 2, user: 0, system: 0
Length: 1000000, addresses: 2, user: 0.844, system: 1.063
Length: 10000000, addresses: 70, user: 154.453, system: 253.812

It's much worse on Windows, with an absurdly grotesque "system" time (I understand it's only an approximation on that OS). I can only assume that buffers, once long enough (?), are repeatedly released to and re-requested from the OS, which is not the case with Perl on Linux.

If this is a bug, it's a shock that it has gone unnoticed; I'm just a no-one. In fact, I'm grateful that it wasn't possible to post yesterday: I now see the above is separate from another, similar issue present from 5.20 on, and it's better not to mix them together.

Replies are listed 'Best First'.
Re: 5.42: Does m// toss a string around?
by ikegami (Patriarch) on Jan 21, 2026 at 20:54 UTC

    It's pack 'P' that's making a copy of the string buffer.


    While matching creates a copy of the string matched, matching uses the COW mechanism to avoid making a copy of the string buffer when possible.[1] The COW mechanism *is* being used here, so no copy is being made by matching.

    However, pack 'P' forces the scalar to have a string buffer and for it to be modifiable.[2] Since you are starting with a buffer that's shared with the literal constant and the scalar associated with $& and other regex vars, this means pack 'P' forces a copy to be made.
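To put a rough number on that copy, here is a minimal sketch. It assumes (this is an assumption, not something measured in the thread) that each iteration of the benchmark loop in the question copies the whole buffer exactly once; `bytes_copied` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Bytes copied over one benchmark run, ASSUMING one full copy of the
# buffer per loop iteration (an assumption, not a measurement).
sub bytes_copied {
    my ( $len ) = @_;
    my $iterations = int( $len / 100 );    # the benchmark's "do just 1%"
    return $iterations * $len;
}

for my $len ( 1e4, 1e5, 1e6, 1e7 ) {
    printf "Length: %8d, iterations: %6d, bytes copied: %.0e\n",
        $len, int( $len / 100 ), bytes_copied( $len );
}
```

Under that assumption the work grows with the square of the length: the 10,000,000-character case comes to 100,000 x 10,000,000 = 10^12 bytes, which would be in line with user time exploding only at the largest size.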

    use Config qw( %Config );
    use Devel::Peek qw( Dump );

    my $ptr_size = $Config{ ptrsize };
    my $ptr_format =
        $ptr_size == 4 ? "L" :
        $ptr_size == 8 ? "Q" :
        die( "Unsupported pointer size $ptr_size\n" );

    # https://perldoc.perl.org/perlapi#SvPV_force
    sub SvPV_force { unpack $ptr_format, pack "P", $_[0] }

    my $s = "abc";
    Dump $s;
    printf "%x\n", SvPV_force( $s );
    Dump $s;
    SV = PV(0x57565c72fee0) at 0x57565c75e0d0
      REFCNT = 1
      FLAGS = (POK,IsCOW,pPOK)
      PV = 0x57565c761a90 "abc"\0
      CUR = 3
      LEN = 16
      COW_REFCNT = 1
    00.00.57.56.5c.78.5b.c0
    SV = PV(0x57565c72fee0) at 0x57565c75e0d0
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x57565c785bc0 "abc"\0
      CUR = 3
      LEN = 16

    Notice how Dump showed a shared buffer (IsCOW) before pack 'P', but one that isn't shared afterwards.
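The same before/after check can be done without reading Dump output by eye. This is only a sketch: `is_cow` is a hypothetical helper, and it assumes the B module exposes the `SVf_IsCOW` flag constant, which should hold on perls built with the COW scheme but is not confirmed anywhere in this thread.

```perl
use strict;
use warnings;
use B qw( svref_2object );

# Returns 1 if the scalar's string buffer is currently flagged as
# copy-on-write, 0 otherwise.
# ASSUMPTION: B::SVf_IsCOW is available on the perl in use.
sub is_cow {
    return ( svref_2object( \$_[0] )->FLAGS & B::SVf_IsCOW() ) ? 1 : 0;
}

my $s = "abc";
printf "before pack 'P': IsCOW=%d\n", is_cow( $s );

my $packed = pack 'P', $s;    # per perl5420delta, 5.42+ un-shares the buffer here

printf "after  pack 'P': IsCOW=%d\n", is_cow( $s );
```

What gets printed depends on the perl version and build; on 5.42 and later the second line should report 0, since pack 'P' removes COW from the scalar.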


    You can use the following to get the address of the string buffer without forcing it to exist and to be modifiable:

    use B qw( svref_2object );

    # https://perldoc.perl.org/perlapi#SvPVX
    sub SvPVX { svref_2object( \$_[0] )->PV }

    my $s = "0123456789";
    printf "%x\n", SvPVX( $s ) while $s =~ /\G./g;

    1. Matching will make a copy of the string buffer if it can't use COW. Example.

    2. From perl5420delta,

      pack("p", ...) and pack("P", ...) now SvPV_force() the supplied SV unless it is read only. This will remove CoW from the SV and prevents code that writes through the generated pointer from modifying the value of other SVs that happen the share the same CoWed string buffer.

      Note: this does not make pack("p",... ) safe, if the SV is magical then any writes to the buffer will likely be discarded on the next read. [GH #22380]


    Update: Renamed SvPV to SvPVX for accuracy.

      Ah, thanks. If I understand correctly, it was 5.42 which fixed the long-standing issue with "pack 'P',".

      (I think the PV method constructs new string rather than returns numeric value. I'll use "pack 'P'," for the next example, for brevity of output over Devel::Peek::Dump; there's no 5.42 below)

      A wild goose chase, then, being sidetracked by an imagined issue with 5.42; but initially I investigated s///, not m//. I now suspect similar reasons, related to COW, can you explain please? 5.18 re-uses the string buffer when the new length is either the same as or less than the original. 5.20 loses the second ability (a shorter result), and 5.26 loses the first as well.

      print "$^V\n";
      my $x = '0123456789';
      {
          my $s = "$x$x";
          $s =~ s/(.)$/a/;
          printf "%x\n", unpack "Q", pack "P", $s;
          $s =~ s/(.)$/b/;
          printf "%x\n", unpack "Q", pack "P", $s;
      }
      {
          my $s = "$x$x";
          $s =~ s/.$//;
          printf "%x\n", unpack "Q", pack "P", $s;
          $s =~ s/.$//;
          printf "%x\n", unpack "Q", pack "P", $s;
      }

      __END__

      v5.18.4
      29cdda8
      29cdda8
      29ce1e8
      29ce1e8

      v5.20.3
      2b774f8
      2b774f8
      2b76b78
      2b76f38

      v5.26.3
      2c9e118
      2c9d7b8
      2c9da88
      2b4ae88

        I think the PV method constructs new string rather than returns numeric value

        No.

        SvPVX creates an object that provides information about a scalar ($_[0] aka $s), then uses that object's PV method to obtain the address of the string buffer of $s.
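The two readings of the PV method can be checked directly. The B documentation describes PV as constructing a string from the buffer's length and offset information, so this sketch simply prints what the method returns for the thread's test string and compares it with the scalar's value. `pv_of` is a hypothetical name used here for illustration.

```perl
use strict;
use warnings;
use B qw( svref_2object );

# Whatever B's PV method returns for the given scalar.
sub pv_of { svref_2object( \$_[0] )->PV }

my $s  = "0123456789";
my $pv = pv_of( $s );

print  "PV method returned: $pv\n";
print  "equal to the scalar's value: ", ( $pv eq $s ? "yes" : "no" ), "\n";
printf "formatted with %%x: %x\n", $pv;
```

If the two are equal, the `%x` in the SvPVX one-liner is formatting the numeric value of the string rather than a buffer address.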

        I now suspect similar reasons, related to COW, can you explain please?

        COW was introduced in 5.20. Before 5.20, hacks which malfunctioned and/or a more expensive alternative had to be used.

Re: 5.42: Does m// toss a string around?
by Anonymous Monk on Jan 21, 2026 at 22:41 UTC
    M3 Macbook Air
    v5.38.2 / darwin-2level
    Length: 10000, addresses: 1, user: 0, system: 0
    Length: 100000, addresses: 1, user: 0, system: 0
    Length: 1000000, addresses: 1, user: 0, system: 0
    Length: 10000000, addresses: 1, user: 0.02, system: 0
    Run 4 times under 5.42.0:
    v5.42.0 / darwin-2level
    Length: 10000, addresses: 2, user: 0, system: 0
    Length: 100000, addresses: 4, user: 0, system: 0
    Length: 1000000, addresses: 11, user: 0.2, system: 0
    Length: 10000000, addresses: 2, user: 23.86, system: 0.03

    v5.42.0 / darwin-2level
    Length: 10000, addresses: 2, user: 0, system: 0
    Length: 100000, addresses: 2, user: 0.01, system: 0
    Length: 1000000, addresses: 7, user: 0.18, system: 0
    Length: 10000000, addresses: 2, user: 23.15, system: 0.03

    v5.42.0 / darwin-2level
    Length: 10000, addresses: 2, user: 0, system: 0
    Length: 100000, addresses: 2, user: 0.01, system: 0
    Length: 1000000, addresses: 8, user: 0.21, system: 0
    Length: 10000000, addresses: 2, user: 23.23, system: 0.02

    v5.42.0 / darwin-2level
    Length: 10000, addresses: 2, user: 0, system: 0
    Length: 100000, addresses: 2, user: 0.01, system: 0
    Length: 1000000, addresses: 9, user: 0.21, system: 0
    Length: 10000000, addresses: 2, user: 23.26, system: 0.03

Node Type: perlquestion [id://11167202]
Approved by hippo
Front-paged by Corion