Re: Memory use/leak with large number of (?{}) patterns in regex

by dave_the_m (Monsignor)
on Nov 24, 2019 at 15:10 UTC (#11109148)

in reply to Memory use/leak with large number of (?{}) patterns in regex

Well, the lack of freeing is unremarkable. Running the regex is likely to malloc() and finally free() lots of small chunks of memory. These will be reused if you run a similar regex again, but a subsequent malloc() of a single 1GB string is unlikely to be able to make use of all those recently freed little blocks.

However, what *is* worrying is that memory usage grows quadratically with the number of code blocks in the pattern. I'll try to have a look at it when I have the time.


Replies are listed 'Best First'.
Re^2: Memory use/leak with large number of (?{}) patterns in regex
by dave_the_m (Monsignor) on Nov 24, 2019 at 20:00 UTC
    It's the combination of captures and code blocks. Each time the regex engine is about to execute a code block, it saves the indices of all the captures done so far, so they can be restored at the end. It does this on the pessimistic assumption that code within the block can do anything, including recursively executing the same regex again, overwriting the existing capture indices.

    This is why quadratic memory behaviour is being seen.
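To make the arithmetic concrete, here is a minimal back-of-the-envelope model, not the engine's actual bookkeeping. It assumes a pattern in which every capture group is immediately followed by one code block, so the k-th block has k completed captures to save. The helper name `saved_entries` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical model: total capture entries saved across a pattern of
# $n capture groups, each followed by one (?{ }) code block.
sub saved_entries {
    my ($n) = @_;
    my $total = 0;
    # The k-th code block sees k completed captures and saves all of them.
    $total += $_ for 1 .. $n;
    return $total;    # equals n * (n + 1) / 2
}

for my $n (10, 100, 1000) {
    printf "%5d code blocks: %7d saved capture entries\n",
        $n, saved_entries($n);
}
```

Ten times as many blocks means roughly a hundred times as many saved entries, which is the quadratic growth described above. The real per-entry cost depends on the engine's internals and is not modelled here.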

    Not ideal, but it can be avoided if you use non-capturing parentheses, (?:...).
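As a rough sketch of that workaround, the script below builds the same pattern twice, once with capturing groups and once with non-capturing ones, and runs both. The helper names `build_pattern` and `run_match`, the counter `$hits`, and the group count are all made up for illustration; memory use is not measured here and would have to be observed with an external tool.

```perl
use strict;
use warnings;
use re 'eval';    # needed because the code blocks arrive via interpolation

our $hits = 0;    # package variable, incremented by each code block

# Build $n groups, each followed by a code block.
# $capturing selects (a) versus (?:a).
sub build_pattern {
    my ($n, $capturing) = @_;
    my $open = $capturing ? '(' : '(?:';
    return join '', map { $open . 'a)(?{ $hits++ })' } 1 .. $n;
}

# Match a string of $n 'a' characters against the generated pattern.
# Returns (matched?, number of code blocks executed).
sub run_match {
    my ($n, $capturing) = @_;
    my $pat  = build_pattern($n, $capturing);
    my $text = 'a' x $n;
    $hits = 0;
    my $ok = $text =~ /\A$pat\z/ ? 1 : 0;
    return ($ok, $hits);
}

for my $capturing (1, 0) {
    my ($ok, $count) = run_match(200, $capturing);
    printf "%-13s matched=%d code blocks run=%d\n",
        ($capturing ? 'capturing' : 'non-capturing'), $ok, $count;
}
```

Both variants match the same text and run the same code blocks; only the capturing one gives the engine capture indices to save before each block.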


      Could saving the capture indices be lazily done, with some kind of "regex in use" flag set on the regex, such that recursively executing the same regex causes the capture indices to be preserved, but only if really needed?

      This would slightly add to the general regex overhead, from needing to check the "regex in use" flag on every pattern match, but perhaps that could be folded into the existing logic that handles compiling patterns when needed?

        There's a lot that *could* be done. It's a complete mess at the moment and needs an overhaul - it affects lots of things, not just this issue, e.g. unnecessary slowness using a regex object for a match compared with a literal pattern. Really, the capture state needs splitting off into a separate data structure from the main regex data structure, so that it can be swapped in and out easily, and so a qr// object can be used in multiple places without internally having to clone the whole thing each time.
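For anyone who wants to check the regex-object-versus-literal difference on their own build, a minimal sketch using the core Benchmark module follows. The sample text, the pattern, and the helper names `match_literal` and `match_qr` are arbitrary choices for illustration; the numbers will vary with Perl version and platform.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $text = 'the quick brown fox ' x 10;
my $qr   = qr/brown\s+(\w+)/;    # precompiled regex object

# Same pattern, once as a literal and once via the qr// object.
sub match_literal { return $_[0] =~ /brown\s+(\w+)/ ? $1 : undef }
sub match_qr      { return $_[0] =~ $qr             ? $1 : undef }

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    literal   => sub { match_literal($text) },
    qr_object => sub { match_qr($text) },
});
```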

        It's on my very long list of things to do, but I'm not likely to do it any time soon.

