PerlMonks  

Re^4: What perl operations will consume C stack space?

by BrowserUk (Pope)
on Feb 27, 2006 at 14:08 UTC (#533022)


in reply to Re^3: What perl operations will consume C stack space?
in thread What perl operations will consume C stack space?

I'm probably missing something, but I don't see the circumstances under which /(ab*){32766}/ couldn't be replaced by /(ab+|a){32766}/ with the same outcome (except the core dump), but much less backtracking?

Likewise, is there anything that /"((?:\\.|[^"])+)"/ would match that /"([^"]+)"/ wouldn't?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^5: What perl operations will consume C stack space?
by Corion (Pope) on Feb 27, 2006 at 14:18 UTC

    Your first factorisation(sp?) is valid - (ab*){32766} and (ab+|a){32766} are equivalent, as they will both match and not match exactly the same strings (modulo any deficiencies in the Perl interpreter). Your second factorization is not equivalent, as the first regex allows for backslashed items, which the second doesn't:

    #!/usr/bin/perl -w
    use strict;

    my @regexen = (
        qr/"((?:\\.|[^"])+)"/,
        qr/"([^"]+)"/,
    );

    for (<DATA>) {
        for my $r (@regexen) {
            print "$_\t";
            print "\t$r";
            if (/$r/) {
                print "\tMatch\t($1)\n";
            } else {
                print "\tNo match.\n";
            };
        };
    };
    __DATA__
    "foo\"bar"
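
The equivalence claimed for the first factorisation can be spot-checked by brute force. The following is only a sketch, not a proof: it anchors both patterns with \A and \z and uses non-capturing groups (both are assumptions added here so that "match" means "the whole string is exactly n repetitions"), and it only tries small n on short strings over the alphabet {a, b}. The helper names all_strings and count_disagreements are made up for illustration.

```perl
use strict;
use warnings;

# Return every string over the alphabet {a, b} of length 0..$max_len.
sub all_strings {
    my ($max_len) = @_;
    my @all  = ('');
    my @prev = ('');
    for (1 .. $max_len) {
        @prev = map { ("${_}a", "${_}b") } @prev;
        push @all, @prev;
    }
    return @all;
}

# Count the strings on which the two anchored patterns disagree about
# whether the whole string consists of exactly $n repetitions.
sub count_disagreements {
    my ($n, $max_len) = @_;
    my $re1 = qr/\A(?:ab*){$n}\z/;
    my $re2 = qr/\A(?:ab+|a){$n}\z/;
    my $bad = 0;
    for my $s (all_strings($max_len)) {
        my $m1 = $s =~ $re1 ? 1 : 0;
        my $m2 = $s =~ $re2 ? 1 : 0;
        $bad++ if $m1 != $m2;
    }
    return $bad;
}

print "n=$_: ", count_disagreements($_, 8), " disagreements\n" for 1 .. 3;
```

A count of zero for each n is consistent with the two patterns accepting the same strings; it says nothing about stack usage or backtracking cost.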
Re^5: What perl operations will consume C stack space?
by hv (Parson) on Feb 27, 2006 at 15:33 UTC

    Corion answers your second point; on the first point, refactoring to /(ab+|a)+/ reduces stack usage but does not eliminate it: for me, "a" x $n cores with /(ab*)+/ at n=10080 and with /(ab+|a)+/ at n=20157, so it appears to save exactly half of the stack usage.

    As TimToady mentioned, anything that quantifies "a compound submatch of varying length" will trigger it. (In fact even "compound" does not seem required, as /(a+?)+/ attests.)

    Hugo

      On my system using 5.8.6, /(ab*){$n}/ cores with $n == 21166, whereas /(ab|a){$n}/ completes successfully for all values of $n up to the repetition limit of 32766. If I drop the stack reservation to 8 MB (similar to the default on Linux?), then I get a similar breakpoint of 10582.

      That seems to indicate that, on my system, the regex engine requires about 792 bytes of stack for each repetition (8 MB divided by 10582 repetitions). That seems a lot of state to preserve on the stack, but I know nothing about how the regex engine is implemented, so it's probably not.
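
The 792-byte figure can be reproduced with a line of arithmetic. This is a rough sketch: it assumes the whole stack reservation is available to the regex engine (in practice some is already in use when the match starts), and the 16 MB value for the first measurement is a guess that happens to fit the numbers, not something stated in the post. The function name bytes_per_repetition is made up for illustration.

```perl
use strict;
use warnings;

# Estimate C stack bytes consumed per repetition: stack size divided by
# the repetition count at which the match crashed, rounded down.
sub bytes_per_repetition {
    my ($stack_bytes, $crash_count) = @_;
    return int($stack_bytes / $crash_count);
}

my $mb = 1024 * 1024;

# 8 MB reservation and breakpoint 10582 are the figures from the post.
printf "8 MB stack, crash at 10582:  %d bytes/rep\n",
    bytes_per_repetition(8 * $mb, 10582);

# ASSUMPTION: a 16 MB reservation for the 21166 breakpoint.
printf "16 MB stack, crash at 21166: %d bytes/rep\n",
    bytes_per_repetition(16 * $mb, 21166);
```

Both lines come out at 792 bytes per repetition, which is consistent with the breakpoint scaling linearly with the stack size.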

      It does make me wonder whether repetition counts, at least in these fairly simple cases, couldn't be fulfilled by a tail-recursive routine to alleviate the stack growth?

      If not, isn't there some scope for putting a check of the form die 'Not enough stack' if reps > stacksize / 792?



        The intention is to remove the C-stack recursion altogether and use perl's dynamic stacks instead. But that involves quite major surgery to the regexp engine, and I don't know when it is likely to happen.

        It does make me wonder whether repetition counts, at least in these fairly simple cases, couldn't be fulfilled by a tail-recursive routine to alleviate the stack growth?

        I don't know how you'd implement it to be tail recursive, but feel free to have a go. I suspect you'd need a quite different matching algorithm, in which case you'd probably end up needing rather more surgery than the current plan.

        If not, isn't there some scope for putting a check of the form die 'Not enough stack' if reps > stacksize / 792?

        As far as I know the stacksize isn't available within the perl process at the moment (nor more relevantly the current free stack space), and the cost per iteration may go up or down (depending on the build). If those numbers can be made available then yes, it would be a good idea to put a check in, probably by treating REG_INFTY as min(32766, freestack/stackcost).
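
The suggested check, min(32766, freestack/stackcost), might look something like the sketch below. It is written in Perl purely for illustration; a real check would live in the C source of the regex engine. As noted above, the free stack space and the per-iteration cost are not available inside the perl process, so here the caller must supply both numbers. The names effective_rep_limit and check_reps are hypothetical, and REG_INFTY is hard-coded to the 32766 limit quoted in this thread.

```perl
use strict;
use warnings;
use List::Util qw(min);

use constant REG_INFTY => 32766;    # repetition limit quoted in the thread

# Hypothetical helper: cap the repetition limit by what the free stack
# could hold, given an assumed per-iteration stack cost in bytes.
sub effective_rep_limit {
    my ($free_stack_bytes, $stack_cost_bytes) = @_;
    die "stack cost must be positive\n" unless $stack_cost_bytes > 0;
    return min(REG_INFTY, int($free_stack_bytes / $stack_cost_bytes));
}

# Hypothetical guard in the spirit of
#   die 'Not enough stack' if reps > stacksize / 792
sub check_reps {
    my ($reps, $free_stack_bytes, $stack_cost_bytes) = @_;
    my $limit = effective_rep_limit($free_stack_bytes, $stack_cost_bytes);
    die "Not enough stack\n" if $reps > $limit;
    return $reps;
}

# Example with assumed inputs: 8 MB free, 792 bytes per iteration.
my $limit = effective_rep_limit(8 * 1024 * 1024, 792);
print "effective limit: $limit\n";
```

With plenty of free stack the limit stays at REG_INFTY; with less, the pattern would fail with an error instead of a core dump.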

        Hugo
