PerlMonks  

Re^2: What perl operations will consume C stack space?

by BrowserUk (Pope)
on Feb 27, 2006 at 04:04 UTC (#532958)


in reply to Re: What perl operations will consume C stack space?
in thread What perl operations will consume C stack space?

Thank you. This was extremely helpful.

One further question. Do you know of any legitimate, non-error uses of heavily backtracking regexes that cannot be better expressed using less stack-hungry variants?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^3: What perl operations will consume C stack space?
by hv (Parson) on Feb 27, 2006 at 11:01 UTC

    Sorry, I can't answer that without fuller definitions of "legitimate" and "cannot be better expressed" - which does my previous example fall foul of?

    Any pattern that repeats a "non-simple" expression will consume C-stack space on each repetition. At least 3 of the 4 bugs linked to the metabug #24274 arose from people solving real world tasks, and there are more in the bugs database that should also be linked to the metabug - more recent ones often involve searching for particular structures in a genome.

    Here's another common fragment that invokes the problem:

    $_ = sprintf q{"%s"}, "a" x 32768;
    /"((?:\\.|[^"])+)"/;

    though it consumes stack at only half the rate of the /(ab*)*/ variety.

    Hugo

      I'm probably missing something, but I don't see the circumstances under which /(ab*){32766}/ couldn't be replaced by /(ab+|a){32766}/ with the same outcome (except the core dump), but much less backtracking?

      Likewise, is there anything that /"((?:\\.|[^"])+)"/ would match that /"([^"]+)"/ wouldn't?



        Your first factorisation is valid: /(ab*){32766}/ and /(ab+|a){32766}/ are equivalent, as they match and fail to match exactly the same strings (modulo any deficiencies in the Perl interpreter). Your second factorisation is not equivalent, as the first regex allows for backslashed items, which the second doesn't:

        #!/usr/bin/perl -w
        use strict;

        my @regexen = (
            qr/"((?:\\.|[^"])+)"/,
            qr/"([^"]+)"/,
        );

        for (<DATA>) {
            for my $r (@regexen) {
                print "$_\t";
                print "\t$r";
                if (/$r/) {
                    print "\tMatch\t($1)\n";
                } else {
                    print "\tNo match.\n";
                };
            };
        };

        __DATA__
        "foo\"bar"
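The equivalence claimed for the first pair can also be spot-checked by brute force. The following is only a sketch under stated assumptions: the helper names `all_strings` and `disagreements` are made up for illustration, the repeat count is shrunk from 32766 to 3 so the search stays small, and the patterns are anchored so that only the set of accepted strings is compared (not captures or stack usage).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Enumerate every string over the alphabet {a, b} with length 0..$max_len.
# (Hypothetical helper, not from the thread.)
sub all_strings {
    my ($max_len) = @_;
    my @level = ('');
    my @all   = ('');
    for my $len (1 .. $max_len) {
        @level = map { ("${_}a", "${_}b") } @level;
        push @all, @level;
    }
    return @all;
}

# Return the strings that one pattern accepts in full and the other does not.
# (Hypothetical helper, not from the thread.)
sub disagreements {
    my ($re1, $re2, @strings) = @_;
    return grep {
        my $m1 = /\A(?:$re1)\z/ ? 1 : 0;
        my $m2 = /\A(?:$re2)\z/ ? 1 : 0;
        $m1 != $m2;
    } @strings;
}

my $k    = 3;    # assumption: small stand-in for the 32766 in the thread
my @diff = disagreements(qr/(ab*){$k}/, qr/(ab+|a){$k}/, all_strings(8));
print @diff ? "Disagree on: @diff\n" : "No disagreement found\n";
```

An exhaustive check over short strings is not a proof, but it follows from `ab*` accepting exactly `a` or `ab+` that the two forms should never disagree.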

        Corion answers your second point; on the first point, refactoring to /(ab+|a)+/ reduces stack usage but does not eliminate it: for me, "a" x $n cores with /(ab*)+/ at n=10080 and with /(ab+|a)+/ at n=20157, so it appears to save exactly half of the stack usage.
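Thresholds like these can be probed without taking down the calling script by running each match in a child perl and bisecting on n. This is a rough sketch, not how the figures above were obtained: `survives` and `first_failure` are hypothetical names, the upper bound of 100_000 is arbitrary, and the bisection assumes that once a length fails every longer one fails too. The numbers depend on the perl build and the stack limit, and a perl whose regex engine does not recurse on the C stack may report no failure at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Match "a" x $n against $pattern in a child perl, so that a C-stack
# overflow kills the child rather than this script.
# Returns true if the child exited normally with status 0.
sub survives {
    my ($pattern, $n) = @_;
    my $code = 'my ($p, $n) = @ARGV; my $s = "a" x $n; $s =~ /$p/; exit 0';
    my $status = system($^X, '-e', $code, $pattern, $n);   # list form: no shell
    return $status == 0;
}

# Bisect for the smallest $n in ($lo, $hi] at which the child dies.
# Assumes $lo survives and that failure is monotonic in $n.
# Returns undef if even $hi survives.
sub first_failure {
    my ($pattern, $lo, $hi) = @_;
    return undef if survives($pattern, $hi);
    while ($hi - $lo > 1) {
        my $mid = int(($lo + $hi) / 2);
        if (survives($pattern, $mid)) { $lo = $mid } else { $hi = $mid }
    }
    return $hi;
}

for my $pattern ('(ab*)+', '(ab+|a)+') {
    my $n = first_failure($pattern, 1, 100_000);
    printf "%-10s %s\n", $pattern,
        defined $n ? "first failure at n=$n" : "no failure up to n=100000";
}
```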

        As TimToady mentioned, anything that quantifies "a compound submatch of varying length" will trigger it. (In fact even "compound" does not seem required, as /(a+?)+/ attests.)

        Hugo
