Prevent "Out of Memory" error

by tford (Beadle)
on Nov 23, 2009 at 02:20 UTC
tford has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I have written a "chart parser" for math notation, entirely in Perl.

It works by building tree structures corresponding to every valid substring of the input, and storing them in memory. Then later on, it may combine some of the subtrees to make larger trees, which also need to be stored.

The problem is that a malicious user (or just a very ignorant one) can enter inputs that are so ambiguous that they will cause a great many trees to be built during the course of the parse.

It is possible to run the parser out of memory this way, and indeed it has already happened.

Another thing that makes this a difficult problem is that the "Out of Memory" error can occur at different places in the code: sometimes when the parser tries to construct a new "Node" object, and sometimes, if memory is already running low, when it's merely adding an element to a pre-existing hash.

My question is, "Is there a way to monitor the free memory available from within the program?"

I've looked a little bit at Devel::Peek, and it seems promising, but I figured I'd check with you guys first. Surely someone else has run across this same issue!

Barring that, I'm thinking I may have to use C to rewrite each basic data structure (array, hash, node) as an XSUB. Then each could have a "safety valve" which would use malloc to check whether there's actually enough memory to create or extend the given data. If not, the function could return an error value which would stop the parsing and send a message to the user.

One thing I've already tried is simply putting a limit on the number of "Node" objects that can be created during parsing, but this is not a real solution, since the amount of memory available to the process apparently varies with factors outside my control.

Any help will be greatly appreciated!

~tford

Re: Prevent "Out of Memory" error
by edeca (Acolyte) on Nov 23, 2009 at 12:59 UTC
    You can inspect individual objects with Devel::Peek, but you might find GTop more useful. If it is installed on a system that has libgtop, it can report the total memory for any process. Call it like so:

        use GTop;

        my $gtop = GTop->new();
        my $mem  = $gtop->proc_mem($$);

        # Now you can use $mem->share / $mem->rss / $mem->vsize

    You could monitor these before creating new objects, check whether your script is using more than a "sane" amount, and stop parsing if the threshold is reached.
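    A minimal sketch of that idea (the 512 MB ceiling, the memory_ok name, and the die message are my own inventions, not anything GTop prescribes):

        use GTop;

        my $gtop  = GTop->new();
        my $LIMIT = 512 * 1024 * 1024;    # hypothetical 512 MB ceiling

        # True while the process's virtual size is under the chosen limit
        sub memory_ok {
            my $mem = $gtop->proc_mem($$);
            return $mem->vsize < $LIMIT;
        }

        # e.g. before constructing another Node:
        die "Input too ambiguous - parse aborted\n" unless memory_ok();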
Re: Prevent "Out of Memory" error
by zentara (Archbishop) on Nov 23, 2009 at 13:29 UTC
    ... a simple hack which I would try is to run a thread that monitors top or ps for your pid, then internally throttle back your main app by sleeping for some microseconds, or put some delay in the node-creation loop

    ... see linux memory leak monitor for a Tk memory monitor, from which you can extract the memory code, which reads it right from /proc ... it's just a hack, because it won't adjust for varying system loads, but maybe you could parse the outputs of mem and top, then use some internal logic to introduce more microsecond delays in the main thread ... you may even discover a new algorithm which you could sell to google ...

    ... call it Gyro ... a dynamic top :-) ... a differential equation between mem and system load
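    For the read-it-from-/proc part, an untested Linux-only sketch (the rss_kb name is mine; the VmRSS field is documented in proc(5)):

        # Return this process's resident set size in kB, or undef on failure
        sub rss_kb {
            open my $fh, '<', "/proc/$$/status" or return;
            while (my $line = <$fh>) {
                return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
            }
            return;
        }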


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku
Re: Prevent "Out of Memory" error
by biohisham (Priest) on Nov 23, 2009 at 13:38 UTC
    "a malicious user (or just a very ignorant one) can enter inputs that are so ambiguous"

    Have you considered restricting what and how a user can enter input, by defining which input is considered valid and rejecting the rest?

    "the error will happen when it's just adding an element to a pre-existing hash"

    Show a more detailed code example that represents this problem; without it, it is not easy to pinpoint where the issues lie, and we'd only be making informed guesses, which remain 'guesses' until proven.



    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
Re: Prevent "Out of Memory" error
by JadeNB (Chaplain) on Nov 23, 2009 at 16:16 UTC
    It works by building tree structures corresponding to every valid substring of the input, and storing them in memory. Then later on, it may combine some of the subtrees to make larger trees, which also need to be stored.
    I think that all of us are instinctively terrified by the necessarily exponential behaviour of parsing every substring. Is this really necessary? As biohisham mentioned, limiting the ambiguity even in very small ways can help a lot—for example, think of how much natural-language freedom is allowed in Perl programs, which can still be parsed with relatively little memory overhead (I think!).

      You may be right that the unadulterated algorithm is exponential in memory requirements, but it's classified as O(n^3) for time complexity.

      It sounds like what many people are saying is basically "make it more efficient." Well I did, by using "shared subtrees" to consolidate memory usage (which helped a lot) and also a technique called "top-down filtering" which helps it to avoid the unnecessary work that JadeNB pointed out above.
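      In case it helps, the subtree sharing amounts to something like this (a simplified sketch with made-up names, not my actual code):

          my %pool;    # canonical key => shared subtree

          sub make_node {
              my ($label, @children) = @_;
              # Children are already shared, so their addresses identify them
              my $key = join ',', $label, map { 0 + $_ } @children;
              $pool{$key} ||= { label => $label, children => \@children };
              return $pool{$key};
          }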

      Now it is about 3 times faster and uses about 1/10 as much memory, but the problem theoretically still exists.

      It just doesn't seem right that Perl lets you keep throwing memory at a problem until it crashes, without providing some way to trap it.

      I'll see if I can get a super-simplified model of the grammar and parser, and show an example that we can all look at. Basically, I'm pretty sure there's no good way to tell the complexity of parsing a given input without actually parsing it first.

      Until then, thanks for all the help guys!

      Oh, by the way, I do realize that the parser used in the perl compiler is pretty exceptional in the flexibility it allows and the speed with which it parses. However, Perl is still a programming language, and the Perl parser makes decisions to ensure that, in the end, the language it accepts is unambiguous. Plus, it's written in C.

      My particular application requires that we "never pick the wrong interpretation" for any given input. That's why I think it's necessary to have it come up with all valid parses.

Re: Prevent "Out of Memory" error
by JadeNB (Chaplain) on Nov 23, 2009 at 22:58 UTC
    A bit of Googling turned up $^M, and a bit further Googling turned up Is $^M a leftover April Fool?. Maybe the techniques described in the latter post could be used to give you a chance at least to die gracefully, or possibly even to abort the calculation?
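    An untested sketch of the idea; note it only applies if your perl was built with -DPERL_EMERGENCY_SBRK and uses Perl's own malloc, as the linked thread explains (run_parse is a stand-in for the real parser entry point):

        # Reserve a 64 KB emergency pool that die() can draw on
        $^M = 'a' x (1 << 16);

        eval { run_parse($input) };
        if ($@ && $@ =~ /Out of memory/) {
            warn "Parse abandoned: ran out of memory\n";
        }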
Re: Prevent "Out of Memory" error
by Anonymous Monk on Nov 24, 2009 at 02:53 UTC
    As a debugging tool you could try Data::Structure::Util, which at the very least will ensure you're not accidentally creating circular references.
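    For example, with its has_circular_ref function ($tree standing in for your parse structure):

        use Data::Structure::Util qw(has_circular_ref);

        # During debugging, verify the parse forest contains no cycles
        warn "unexpected circular reference in parse tree\n"
            if has_circular_ref($tree);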

    Otherwise I would continue down the path of input validation and improving your algorithms. Fiddling with XS is unlikely to help you here and sounds hackish.
