
Are beheaded strings known to be slow?

by Anonymous Monk
on Oct 09, 2025 at 17:57 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I know Perl maintains a pointer (offset) within an SV to where the actual data in the PV buffer starts. Is this "beheading" the reason for the slowdown demonstrated below?

    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    my $str = ' ' x 1e5;

    sub test1 {
        my $s = shift;
        my $copy = substr $str, 1;
        1 while $copy =~ /./g;
        _:
    }

    sub test2 {
        my $s = shift;
        substr $s, 0, 1, '';
        1 while $s =~ /./g;
        _:
    }

    cmpthese -1, {
        test1 => sub { test1( $str )},
        test2 => sub { test2( $str )},
    };

    # (warning: too few iterations for a reliable count)
    #         Rate test2 test1
    # test2 2.79/s    --  -93%
    # test1 41.0/s 1372%    --

Replies are listed 'Best First'.
Re: Are beheaded strings known to be slow?
by tonyc (Friar) on Oct 10, 2025 at 00:24 UTC

    I suspect it's because OOK SVs aren't Copy-On-Write-able.

    I did a quick profile, and with test2() the code spends most of its time in memcpy() called from reg_set_capture_string().

      That's what it is. Regex matching makes a copy of the scalar for use by $&, $1, etc. It uses the COW mechanism if possible. It doesn't for OOK scalars.

      After one pass of test1's loop:

          SV = PVMG(0x5bb22b8c3b80) at 0x5bb22b9beae8
            REFCNT = 1
            FLAGS = (SMG,POK,IsCOW,pPOK)   <-- IsCOW: String buffer is shared
            IV = 0
            NV = 0
            PV = 0x5bb22b9ea230 "..."
            CUR = 99999
            LEN = 100001
            COW_REFCNT = 1
            MAGIC = 0x5bb22b9c48f0
              MG_VIRTUAL = &PL_vtbl_mglob
              MG_TYPE = PERL_MAGIC_regex_global(g)
              MG_FLAGS = 0x40
                BYTES
              MG_LEN = -1

      After one pass of test2's loop:

          SV = PVMG(0x5bb22b8c3bb0) at 0x5bb22b9cc1f0
            REFCNT = 1
            FLAGS = (SMG,POK,OOK,pPOK)     <-- !IsCOW: String buffer isn't shared
            IV = 0
            NV = 0
            OFFSET = 1
            PV = 0x5bb22ba028e1 ( "\x01" . ) "..."
            CUR = 99999
            LEN = 100001
            MAGIC = 0x5bb22b9c48f0
              MG_VIRTUAL = &PL_vtbl_mglob
              MG_TYPE = PERL_MAGIC_regex_global(g)
              MG_FLAGS = 0x40
                BYTES
              MG_LEN = -1

      That means a copy of the string buffer must have been made. By the end of the loop, a copy has been done 99,999 times (once per successful regex match).

      You can run into the same issue with foreign string buffers (e.g. a memory-mapped file). See "why is Perl File::Map so slow compared to File::Slurp?"
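One way to sidestep the per-match copy, sketched here under the assumption that the rvalue form of substr behaves as it does in test1 above (the result is a fresh scalar without an OOK offset): take the tail as a copy once instead of chopping the head off in place. The helper name count_after_head is made up for illustration, and the sketch only checks that the matching result is the same, not the timing.

```perl
use strict;
use warnings;

# Hypothetical helper: count /./g matches on everything after the first
# character. The rvalue form of substr returns a new scalar, so the
# argument is left untouched and no head is chopped off in place.
sub count_after_head {
    my ($s) = @_;
    my $rest = substr $s, 1;    # one copy up front, instead of substr $s, 0, 1, ''
    my $count = 0;
    $count++ while $rest =~ /./g;
    return $count;
}

print count_after_head(' ' x 1e5), "\n";    # 99999
```

Whether this is actually faster on a given perl is worth confirming with Benchmark and Devel::Peek, as done elsewhere in this thread.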

      Thanks for the clarification. I.e., we'd better be careful with the 4-arg (or LHS) variant of substr?

      It's funny: Perl rightly decides (case "0") that it's cheaper to move the 499_998 initial bytes _up_ than to move the trailing 500_000 bytes _down_ (and so the scalar gets OOK-"contaminated"); and it's the opposite (rightly) for case "2", where the scalar stays OOK-free and works fine with the regex engine. But when (case "1") both are 499_999, then... ??? It moves the first half UP and actually _complicates_ things (doing the additional work of setting the OFFSET?).

          use Devel::Peek;
          $Devel::Peek::pv_limit = 1;

          for ( 0 .. 2 ) {
              my $s = 'a' x 1e6;
              substr $s, 499_998 + $_, 2, 'b';
              Dump $s;
          }
          __END__
          SV = PV(0xdbb198) at 0x26f8ae0
            REFCNT = 1
            FLAGS = (POK,OOK,pPOK)
            OFFSET = 1
            PV = 0x2954309 ( ""... . ) "a"...\0
            CUR = 999999
            LEN = 1000001
          SV = PV(0xdbb198) at 0x26f8ae0
            REFCNT = 1
            FLAGS = (POK,OOK,pPOK)
            OFFSET = 1
            PV = 0x2954309 ( ""... . ) "a"...\0
            CUR = 999999
            LEN = 1000001
          SV = PV(0xdbb198) at 0x26f8ae0
            REFCNT = 1
            FLAGS = (POK,pPOK)
            PV = 0x2954308 "a"...\0
            CUR = 999999
            LEN = 1000002
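The three dumps above are consistent with a simple rule: when a 4-arg substr shrinks the string, the shorter side is moved, and on a tie the head is moved. The sketch below is only a toy model of that observed behaviour, not Perl's source; the function name side_moved and its arguments are invented for illustration.

```perl
use strict;
use warnings;

# Toy model (an assumption drawn from the dumps, not Perl's actual code):
# which side of the buffer is moved when
# substr($s, $offset, $removed, $shorter_replacement) shrinks the string.
sub side_moved {
    my ($total, $offset, $removed) = @_;
    my $head = $offset;                        # bytes before the replaced span
    my $tail = $total - $offset - $removed;    # bytes after the replaced span
    # Moving the head means the string start shifts, which is when OOK appears.
    return $head <= $tail ? 'head' : 'tail';
}

for my $case ( 0 .. 2 ) {
    printf "case %d: %s moved\n", $case, side_moved( 1e6, 499_998 + $case, 2 );
}
```

Under this model cases 0 and 1 move the head (OOK set) and case 2 moves the tail (no OOK), matching the FLAGS lines in the dumps.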
      That's unfortunate... I have code that uses regexes on File::Map buffers under the assumption that it was an efficient way to scan through a file. Maybe there's room for one more optimization that checks the length of what needs to be copied vs. the length of the source scalar? In the example, each iteration only needs to put a single character in $&.

        You're forgetting about $` and $'. Between $`, $' and $&, the entire string is covered.
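A small illustration of that point: after a successful match, the prematch, match and postmatch variables together reproduce the whole target string, which is why the whole string has to be kept available. The helper name match_parts is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return prematch, match and postmatch for the first
# match of $re in $string, or an empty list if there is no match.
sub match_parts {
    my ($string, $re) = @_;
    return unless $string =~ $re;
    my ($pre, $hit, $post) = ($`, $&, $');    # copy before anything else matches
    return ($pre, $hit, $post);
}

my @parts = match_parts( "hello world", qr/o w/ );
print join( "|", @parts ), "\n";    # hell|o w|orld
print join( "",  @parts ), "\n";    # hello world
```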

Re: Are beheaded strings known to be slow?
by LanX (Saint) on Oct 09, 2025 at 18:05 UTC
    first observation test1 operates directly on $str not $s

    then you should measure how fast those substrs operate without the regex and if they really always deal with strings of the same length.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      first observation test1 operates directly on $str not $s

      ouch, thanks, I blundered when making SSCCE, s/str/s/ doesn't make any difference

Re: Are beheaded strings known to be slow?
by ikegami (Patriarch) on Oct 09, 2025 at 18:16 UTC

    Weird.

    There shouldn't be any performance penalty to reading an OOK ("beheaded") scalar. Rather than adding an offset each time the string is accessed, the pointer to the string is changed to point beyond the "head" when the string is "beheaded".

        use Devel::Peek qw( Dump );

        my $s = "abcdef";
        $s .= "g";  # Trigger COW
        Dump( $s );
        substr( $s, 0, 1, "" );
        Dump( $s );

        SV = PV(0x59d752a54ee0) at 0x59d752a830d0
          REFCNT = 1
          FLAGS = (POK,pPOK)
          PV = 0x59d752a6b2e0 "abcdefg"\0
          CUR = 7
          LEN = 16
        SV = PV(0x59d752a54ee0) at 0x59d752a830d0
          REFCNT = 1
          FLAGS = (POK,OOK,pPOK)
          OFFSET = 1
          PV = 0x59d752a6b2e1 ( "\x01" . ) "bcdefg"\0
          CUR = 6
          LEN = 15

    Before the beheading, the string started at 0x59d752a6b2e0 and used 7 bytes of 16.

    After the beheading, the string started at 0x59d752a6b2e1 and used 6 bytes of 15.

    In both cases, the same memory block starting at 0x59d752a6b2e0 is used. But because the PV, CUR and LEN fields were adjusted, reading from the scalar is identical whether it's an OOK ("beheaded") scalar or not.
