Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
$s = '0123456789';
printf "%x\n", unpack 'Q', pack 'P', $s
    while $s =~ /\G./g;
# 55fd180fcc60
# 55fd18107f30
# 55fd18107ed0
# 55fd180e6700
# 55fd180fcc60
# 55fd18107f30
# 55fd18107ed0
# 55fd180e6700
# 55fd180fcc60
# 55fd18107f30
It looks like patterns don't matter: neither the one between the slashes (the example above only gives the illusion of doing something useful, the kind of loop presumably run daily by thousands) nor the pattern in which the addresses alternate. Perl just picks whatever buffer is available.
use strict;
use warnings;
use feature 'say';
use Config;
say "$^V / $Config{ archname }";
for my $len ( 1e4, 1e5, 1e6, 1e7 ) {
    my $s = 'a' x $len;
    my %h;
    my @before = times;
    $s =~ /\G./gs and ++ $h{ pack 'P', $s }
        for 1 .. $len / 100; # do just 1% !!!
    my @after = times;
    printf "Length: %8d, addresses: %6d, user: %6g, system: %6g\n",
        $len, scalar( keys %h ),
        $after[ 0 ] - $before[ 0 ],
        $after[ 1 ] - $before[ 1 ]
}
__END__
v5.40.3 / x86_64-linux-thread-multi
Length: 10000, addresses: 1, user: 0, system: 0
Length: 100000, addresses: 1, user: 0, system: 0
Length: 1000000, addresses: 1, user: 0.01, system: 0
Length: 10000000, addresses: 1, user: 0.02, system: 0
v5.42.0 / x86_64-linux-thread-multi
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 3, user: 0.01, system: 0
Length: 1000000, addresses: 3, user: 0.39, system: 0
Length: 10000000, addresses: 3, user: 139.87, system: 0
v5.43.8 / x86_64-linux
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 3, user: 0.01, system: 0
Length: 1000000, addresses: 3, user: 0.38, system: 0
Length: 10000000, addresses: 3, user: 138.75, system: 0.01
v5.42.0 / MSWin32-x64-multi-thread
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 2, user: 0, system: 0
Length: 1000000, addresses: 2, user: 0.844, system: 1.063
Length: 10000000, addresses: 70, user: 154.453, system: 253.812
It's much worse on Windows, with an absurdly large "system" time (I understand it's only an approximation on that OS). I can only assume that buffers, once long enough (?), are repeatedly freed and re-requested from the OS, which is not the case with Perl on Linux.
If this is a bug, it's a shock that it has gone unnoticed; I'm just a no-one. In fact, I'm grateful that it wasn't possible to post yesterday: I now see that the above is separate from another, similar issue present from 5.20 on, and it's better not to mix them together.
Re: 5.42: Does m// toss a string around?
by ikegami (Patriarch) on Jan 21, 2026 at 20:54 UTC
It's pack 'P' that's making a copy of the string buffer.
While matching creates a copy of the string matched, matching uses the COW mechanism to avoid making a copy of the string buffer when possible.[1] The COW mechanism *is* being used here, so no copy is being made by matching.
However, pack 'P' forces the scalar to have a string buffer and for it to be modifiable.[2] Since you are starting with a buffer that's shared with the literal constant and the scalar associated with $& and other regex vars, this means pack 'P' forces a copy to be made.
use Config qw( %Config );
use Devel::Peek qw( Dump );
my $ptr_size = $Config{ ptrsize };
my $ptr_format =
    $ptr_size == 4 ? "L" :
    $ptr_size == 8 ? "Q" :
    die( "Unsupported pointer size $ptr_size\n" );
# https://perldoc.perl.org/perlapi#SvPV_force
sub SvPV_force { unpack $ptr_format, pack "P", $_[0] }
my $s = "abc";
Dump $s;
printf "%x\n", SvPV_force( $s );
Dump $s;
SV = PV(0x57565c72fee0) at 0x57565c75e0d0
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK)
PV = 0x57565c761a90 "abc"\0
CUR = 3
LEN = 16
COW_REFCNT = 1
00.00.57.56.5c.78.5b.c0
SV = PV(0x57565c72fee0) at 0x57565c75e0d0
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x57565c785bc0 "abc"\0
CUR = 3
LEN = 16
Notice how Dump showed a shared buffer (IsCOW) before pack 'P', but one that isn't shared afterwards.
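As a rough cross-check of this explanation, the original benchmark can be re-run with pack 'P' taken out of the loop, so that only the matching is timed. This is a minimal sketch, not code from the thread: count_matches is a hypothetical helper, and the lengths are capped at 1e6 only to keep the run short. If pack 'P' is indeed the cause, the user times should stay near zero on 5.42 as well.

```perl
use strict;
use warnings;

# Hypothetical helper: performs the same 1% of \G. matches as the
# original benchmark, but never calls pack 'P' on the string.
sub count_matches {
    my ($len) = @_;
    my $s = 'a' x $len;
    my $matched = 0;
    $s =~ /\G./gs and ++$matched
        for 1 .. $len / 100;
    return $matched;
}

for my $len ( 1e4, 1e5, 1e6 ) {
    my @before  = times;
    my $matched = count_matches($len);
    my @after   = times;
    printf "Length: %8d, matches: %6d, user: %6g\n",
        $len, $matched, $after[0] - $before[0];
}
```

The match count is returned only so that the loop visibly does its work; the timing column is the part to compare against the numbers posted above.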
You can use the following to get the address of the string buffer without forcing it to exist and to be modifiable:
use B qw( svref_2object );
# https://perldoc.perl.org/perlapi#SvPVX
sub SvPVX { svref_2object( \$_[0] )->PV }
my $s = "0123456789";
printf "%x\n", SvPVX( $s ) while $s =~ /\G./g;
[1] Matching will make a copy of the string buffer if it can't use COW.
[2] From perl5420delta,
pack("p", ...) and pack("P", ...) now SvPV_force() the supplied SV unless it is read only. This will remove CoW from the SV and prevents code that writes through the generated pointer from modifying the value of other SVs that happen to share the same CoWed string buffer.
Note: this does not make pack("p",... ) safe, if the SV is magical then any writes to the buffer will likely be discarded on the next read. [GH #22380]
Update: Renamed SvPV to SvPVX for accuracy.
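A small probe along the following lines might show which of the two behaviours described in the perl5420delta note a given perl has. It is only a sketch: pack_p_unshares is a hypothetical name, and whether $t really shares a buffer with $s after the assignment is an assumption (COW sharing is an optimisation, not a guarantee, and does not exist before 5.20). A "distinct" result is therefore only meaningful where the buffers were shared to begin with.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns 1 if pack 'P'
# reports different buffer addresses for a scalar and its copy,
# and 0 if it reports the same address for both.
sub pack_p_unshares {
    my $s = 'a' x 1000;
    my $t = $s;    # assumed to share $s's buffer via COW; not guaranteed
    return pack( 'P', $s ) ne pack( 'P', $t ) ? 1 : 0;
}

printf "%s: pack 'P' returned %s\n", $^V,
    pack_p_unshares() ? "distinct buffers" : "the same buffer";
```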
Ah, thanks. If I understand correctly, it was 5.42 that fixed the long-standing issue with pack 'P'.
(I think the PV method constructs a new string rather than returning a numeric value. I'll use pack 'P' for the next example, since its output is briefer than Devel::Peek::Dump's; there's no 5.42 below.)
So it was a wild goose chase, sidetracked by an imagined issue with 5.42. Initially, though, I was investigating s///, not m//. I now suspect similar COW-related reasons; can you explain, please? 5.18 re-uses the string buffer when the new length is the same as or less than the original. 5.20 and then 5.26 lose both abilities.
print "$^V\n";
my $x = '0123456789';
{
my $s = "$x$x";
$s =~ s/(.)$/a/;
printf "%x\n", unpack "Q", pack "P", $s;
$s =~ s/(.)$/b/;
printf "%x\n", unpack "Q", pack "P", $s;
}
{
my $s = "$x$x";
$s =~ s/.$//;
printf "%x\n", unpack "Q", pack "P", $s;
$s =~ s/.$//;
printf "%x\n", unpack "Q", pack "P", $s;
}
__END__
v5.18.4
29cdda8
29cdda8
29ce1e8
29ce1e8
v5.20.3
2b774f8
2b774f8
2b76b78
2b76f38
v5.26.3
2c9e118
2c9d7b8
2c9da88
2b4ae88
I think the PV method constructs new string rather than returns numeric value
No.
SvPVX creates an object that provides information about a scalar ($_[0] aka $s), then uses that object's PV method to obtain the address of the string buffer of $s.
I now suspect similar reasons, related to COW, can you explain please?
COW was introduced in 5.20. Before 5.20, Perl had to rely on hacks that malfunctioned and/or on a more expensive alternative.
Re: 5.42: Does m// toss a string around?
by Anonymous Monk on Jan 21, 2026 at 22:41 UTC
v5.38.2 / darwin-2level
Length: 10000, addresses: 1, user: 0, system: 0
Length: 100000, addresses: 1, user: 0, system: 0
Length: 1000000, addresses: 1, user: 0, system: 0
Length: 10000000, addresses: 1, user: 0.02, system: 0
Run 4 times under 5.42.0:
v5.42.0 / darwin-2level
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 4, user: 0, system: 0
Length: 1000000, addresses: 11, user: 0.2, system: 0
Length: 10000000, addresses: 2, user: 23.86, system: 0.03
v5.42.0 / darwin-2level
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 2, user: 0.01, system: 0
Length: 1000000, addresses: 7, user: 0.18, system: 0
Length: 10000000, addresses: 2, user: 23.15, system: 0.03
v5.42.0 / darwin-2level
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 2, user: 0.01, system: 0
Length: 1000000, addresses: 8, user: 0.21, system: 0
Length: 10000000, addresses: 2, user: 23.23, system: 0.02
v5.42.0 / darwin-2level
Length: 10000, addresses: 2, user: 0, system: 0
Length: 100000, addresses: 2, user: 0.01, system: 0
Length: 1000000, addresses: 9, user: 0.21, system: 0
Length: 10000000, addresses: 2, user: 23.26, system: 0.03