PerlMonks  

Re: Are beheaded strings known to be slow?

by tonyc (Friar)
on Oct 10, 2025 at 00:24 UTC ( [id://11166454]=note )


in reply to Are beheaded strings known to be slow?

I suspect it's because OOK SVs aren't Copy-On-Write-able.

I did a quick profile, and with test2() the code spends most of its time in memcpy() called from reg_set_capture_string().


Replies are listed 'Best First'.
Re^2: Are beheaded strings known to be slow?
by ikegami (Patriarch) on Oct 10, 2025 at 15:22 UTC

    That's what it is. Regex matching makes a copy of the scalar for use by $&, $1, etc. It uses the COW mechanism if possible. It doesn't for OOK scalars.

    After one pass of test1's loop:

        SV = PVMG(0x5bb22b8c3b80) at 0x5bb22b9beae8
          REFCNT = 1
          FLAGS = (SMG,POK,IsCOW,pPOK)    <-- IsCOW: String buffer is shared
          IV = 0
          NV = 0
          PV = 0x5bb22b9ea230 "..."
          CUR = 99999
          LEN = 100001
          COW_REFCNT = 1
          MAGIC = 0x5bb22b9c48f0
            MG_VIRTUAL = &PL_vtbl_mglob
            MG_TYPE = PERL_MAGIC_regex_global(g)
            MG_FLAGS = 0x40
              BYTES
            MG_LEN = -1

    After one pass of test2's loop:

        SV = PVMG(0x5bb22b8c3bb0) at 0x5bb22b9cc1f0
          REFCNT = 1
          FLAGS = (SMG,POK,OOK,pPOK)      <-- !IsCOW: String buffer isn't shared
          IV = 0
          NV = 0
          OFFSET = 1
          PV = 0x5bb22ba028e1 ( "\x01" . ) "..."
          CUR = 99999
          LEN = 100001
          MAGIC = 0x5bb22b9c48f0
            MG_VIRTUAL = &PL_vtbl_mglob
            MG_TYPE = PERL_MAGIC_regex_global(g)
            MG_FLAGS = 0x40
              BYTES
            MG_LEN = -1

    That means a copy of the string buffer must have been made. By the end of the loop, a copy has been done 99,999 times (once per successful regex match).
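    For anyone who wants to check this on their own perl, here is a minimal sketch. It is not the original test1/test2 code (which isn't reproduced in this reply); `flags_of` is a hypothetical helper that captures what Devel::Peek prints to STDERR, and the 100_000-byte strings are just illustrative. Going by the dumps above, one would expect OOK to show up only for the beheaded scalar, but the exact flags may vary between perl versions.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);
use File::Temp qw(tempfile);

# Hypothetical helper: redirect STDERR to a temp file while Dump runs,
# then return the contents of the first FLAGS = (...) line.
sub flags_of {
    my ($sv_ref) = @_;
    my ( $fh, $file ) = tempfile( UNLINK => 1 );
    open my $saved, '>&', \*STDERR or die "dup STDERR: $!";
    open STDERR, '>', $file        or die "redirect STDERR: $!";
    Dump($$sv_ref);                       # Dump writes to STDERR
    open STDERR, '>&', $saved      or die "restore STDERR: $!";
    open my $in, '<', $file        or die "read $file: $!";
    local $/;                             # slurp the whole dump
    my $dump = <$in>;
    my ($flags) = $dump =~ /FLAGS = \(([^)]*)\)/;
    return $flags;
}

my $whole    = 'a' x 100_000;
my $beheaded = 'a' x 100_000;
substr( $beheaded, 0, 1, '' );            # 4-arg substr chops the head in place

# One successful /g match on each, as in one pass of the loops discussed above.
my $m1 = $whole    =~ /a/g;
my $m2 = $beheaded =~ /a/g;

printf "whole:    %s\n", flags_of( \$whole );
printf "beheaded: %s\n", flags_of( \$beheaded );
```

    If the beheaded scalar shows OOK and no IsCOW, each further match on it has to copy the buffer, which is the memcpy() cost seen in the profile.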

    You can run into the same issue with foreign string buffers (e.g. a memory-mapped file). See "why is Perl File::Map so slow compared to File::Slurp?"

Re^2: Are beheaded strings known to be slow?
by Anonymous Monk on Oct 10, 2025 at 17:58 UTC

    Thanks for the clarification. So we'd better be careful with the 4-argument (or lvalue) form of substr?

    It's funny: in case "0", Perl rightly decides it's cheaper to move the 499_998 leading bytes _up_ than to move the 500_000 trailing bytes _down_, and so the scalar gets OOK-"contaminated". In case "2" it's the opposite (again rightly): the scalar stays OOK-free and works fine with the regex engine. But in case "1", when both halves are 499_999 bytes, it moves the first half UP and actually _complicates_ things (doing the additional work of setting the OFFSET?).

        use Devel::Peek;
        $Devel::Peek::pv_limit = 1;

        for ( 0 .. 2 ) {
            my $s = 'a' x 1e6;
            substr $s, 499_998 + $_, 2, 'b';
            Dump $s;
        }

        __END__
        SV = PV(0xdbb198) at 0x26f8ae0
          REFCNT = 1
          FLAGS = (POK,OOK,pPOK)
          OFFSET = 1
          PV = 0x2954309 ( ""... . ) "a"...\0
          CUR = 999999
          LEN = 1000001
        SV = PV(0xdbb198) at 0x26f8ae0
          REFCNT = 1
          FLAGS = (POK,OOK,pPOK)
          OFFSET = 1
          PV = 0x2954309 ( ""... . ) "a"...\0
          CUR = 999999
          LEN = 1000001
        SV = PV(0xdbb198) at 0x26f8ae0
          REFCNT = 1
          FLAGS = (POK,pPOK)
          PV = 0x2954308 "a"...\0
          CUR = 999999
          LEN = 1000002
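    The arithmetic behind the three cases can be written out as a small model. This is only a sketch of the behaviour seen in the dumps above, not perl's actual implementation; `side_moved` is a hypothetical name, and the tie-breaking rule (a tie goes to the head) is an assumption read off case "1".

```perl
use strict;
use warnings;

# Hypothetical model: for a string of $len bytes that shrinks at $offset
# (where $removed bytes are replaced by fewer bytes), report which side
# would be shifted to close the gap.
sub side_moved {
    my ( $len, $offset, $removed ) = @_;
    my $head = $offset;                      # bytes before the replaced span
    my $tail = $len - $offset - $removed;    # bytes after the replaced span
    # Assumption from the observed output: the head is moved (leaving the
    # scalar OOK) unless it is strictly longer than the tail.
    return $head > $tail ? 'tail' : 'head';
}

for my $case ( 0 .. 2 ) {
    my $offset = 499_998 + $case;
    printf "case %d: head=%d tail=%d -> move %s\n",
        $case, $offset, 1_000_000 - $offset - 2,
        side_moved( 1_000_000, $offset, 2 );
}
```

    Under this model, "head" corresponds to the dumps that show OOK with OFFSET = 1, and "tail" to the one that stays OOK-free.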
Re^2: Are beheaded strings known to be slow?
by NERDVANA (Priest) on Oct 11, 2025 at 14:24 UTC

    That's unfortunate... I have code that is using regexes on File::Map buffers under the assumption it was an efficient way to scan through a file. Maybe there's room for one more optimization that checks the length of what needs to be copied vs. the length of the source scalar? In the example, each iteration only needs to put a single character in $&.

      You're forgetting about $` and $'. Between $`, $' and $&, the entire string is covered.
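      A tiny illustration of that point, using a made-up subject string and pattern: after a successful match, the three variables concatenate back to the original string, so the saved copy has to span all of it.

```perl
use strict;
use warnings;

my $subject = 'hello world';              # illustrative input only
$subject =~ /o w/ or die "no match";

# Prematch, match and postmatch of the last successful match.
my ( $pre, $match, $post ) = ( $`, $&, $' );

print "pre='$pre' match='$match' post='$post'\n";
print "covers the whole string\n" if $pre . $match . $post eq $subject;
```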

        Also unfortunate...

        Maybe it would be neat if there was a new feature that turned those off for any match in the scope where it was disabled... to maybe be added to some future "use v5.46".

        Or, maybe Regexp refs should have methods, so the regex engine can be used without affecting *any* global variables.

        my $match = qr/(\d+)/->match($subject);
        say $match->captures->[0] if $match;
