
Optimizing Memory consumption

by PerlingTheUK (Hermit)
on Nov 09, 2006 at 20:19 UTC

PerlingTheUK has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have been maintaining a fairly big object-oriented application for three years now. It has probably a person-year of development under its hood. While everything was fine for a long time, the requirements have suddenly changed. The amount of data to be stored has increased considerably: where memory consumption used to close in on the half-gigabyte mark in a few cases, I now have test cases where it exceeds 2 GB and the application subsequently crashes. The data used by this application lives in a text file that is merely 200 MB large.
While I want to keep the overall structure of this application, I would like to revisit my classes and make the data storage more memory efficient. Ideally this does not mean reverting to I/O, but rather storing and accessing the data differently. I seem to have heard that a scalar requires at least 32 bytes in Perl. As a large number of my values are Boolean or very small integers, I am considering using a central scalar with an integer value that can be accessed as a bit field. Similarly, several short strings could be joined into one string and accessed using substr or unpack.
Before I can analyze which way to go and how small or big the possible benefit is, I need to understand how much memory Perl uses for what. For instance, there is no point in optimizing the data storage if the real bottleneck is the number of accessors to the objects' values. I have googled for sources and hints on how to reduce memory consumption for several days now, but so far I have not found any sensible solutions. Any help is greatly appreciated.
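To illustrate what I have in mind for the bit field (just a sketch; the flag names are made up):

    use strict;
    use warnings;

    # Several Boolean attributes packed into one integer scalar.
    use constant {
        IS_ACTIVE => 0,    # bit positions, invented for illustration
        IS_CACHED => 1,
        IS_DIRTY  => 2,
    };

    my $flags = 0;
    $flags |=  (1 << IS_ACTIVE);             # set a flag
    $flags &= ~(1 << IS_DIRTY);              # clear a flag
    my $cached = ($flags >> IS_CACHED) & 1;  # read a flag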

Cheers,
PerlingTheUK

Replies are listed 'Best First'.
Re: Optimizing Memory consumption
by Corion (Patriarch) on Nov 09, 2006 at 20:29 UTC

    Devel::Size will tell you the size of the Perl variables.
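    A minimal example:

        use Devel::Size qw(size total_size);

        my %record = ( id => 42, name => "foo", flags => [ 0, 1, 1 ] );

        print size(\%record), "\n";        # the hash structure itself
        print total_size(\%record), "\n";  # hash plus everything it references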

    I think your idea of storing the boolean values or integer values of an object together in a string is sensible. If the size gains make it worthwhile and you still want fast access, you can also stuff all your objects into one string/array, for example by using Tie::Array::PackedC, but modifying these values becomes ugly.

    If you have a large part of your needed memory in a hash, you can always "just" tie that hash to DB_File or one of the other btree implementations. Maybe you can also lazy-load some data, or unload data when it's not needed, but as all of that needs modification of the program, I'd go for one of the tie solutions, which allow your program to run slower but otherwise unchanged.
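    A tie along these lines (the filename is made up):

        use DB_File;
        use Fcntl qw(O_CREAT O_RDWR);

        # The hash keeps its interface, but the data lives on disk in a btree.
        tie my %data, 'DB_File', 'data.dbm', O_CREAT|O_RDWR, 0644, $DB_BTREE
            or die "Cannot tie: $!";

        $data{some_key} = 'some value';    # written to disk, not kept in RAM
        print $data{some_key}, "\n";
        untie %data;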

Re: Optimizing Memory consumption
by xdg (Monsignor) on Nov 09, 2006 at 20:29 UTC
    I need to understand how much memory Perl uses for what

    Look at Devel::Size.

    To the larger question, I've had some luck optimizing a large data structure by storing records using pack/unpack. I.e. rather than a hash-of-hash, I switched to hash-of-packed-array. In this case, all the data elements were just integers, so they packed down pretty well. On any particular run, I only needed some of them, so the cost of unpacking the ones I needed to access was small relative to the savings of keeping the entire data set in memory. (Plus, I could save/load the entire packed structure using Storable, too.)
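    Schematically (a simplified sketch; the pack template and fields are invented):

        use strict;
        use warnings;

        my %record;
        my $id = "abc123";

        # store: four small integers packed into one compact string
        $record{$id} = pack "NnnC", 1_163_000_000, 640, 480, 1;

        # fetch: unpack only the record you actually need
        my ($time, $x, $y, $flag) = unpack "NnnC", $record{$id};
        print "$time $x $y $flag\n";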

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: Optimizing Memory consumption
by davido (Cardinal) on Nov 10, 2006 at 01:00 UTC

    Using bit fields instead of individual scalars to represent boolean data will save you a measure of memory consumption, but the same architecture that is getting you into trouble now will get you into trouble in the future as your needs grow again.

    I consider moving from individual scalars to bit fields a micro-optimization. But you're not micro-challenged; you've got a design issue.

    It may be easier to deal with all this than you think. Perhaps you could create helper classes to handle the behind-the-scenes storage and retrieval, so that the majority of your script can remain unchanged. Of course, without seeing the details, my suggestion has to stay vague.
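    For instance, something like this (necessarily vague as well; all names are invented):

        package CompactFlags;
        use strict;
        use warnings;

        # All Boolean attributes of an object live in one bit string
        # instead of one scalar each.
        my %BIT = ( active => 0, cached => 1, dirty => 2 );

        sub new { my $class = shift; my $flags = ""; return bless \$flags, $class }

        sub get { my ($self, $name) = @_; return vec($$self, $BIT{$name}, 1) }
        sub set { my ($self, $name, $val) = @_; vec($$self, $BIT{$name}, 1) = $val ? 1 : 0 }

        package main;
        my $obj = CompactFlags->new;
        $obj->set( dirty => 1 );
        print $obj->get("dirty"), "\n";    # 1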


    Dave

Re: Optimizing Memory consumption
by brig (Scribe) on Nov 10, 2006 at 00:02 UTC

    This isn't answering the question you asked, mostly because I'm not convinced you are asking the right question.

    My Rule of Thumb is that anytime there is a significant change to requirements, there needs to be a significant review of the architecture. I would say that a quadrupling of your memory requirements and the fact that you are using a 200MB text file confirm this.

    There isn't really enough information about your situation to come to any realistic conclusion, because you have already decided to optimize (fair enough). However, when I see a monstrous text file, I always wonder whether a relational DB and clever queries would make for a more efficient solution. You mention that the data to be stored has increased considerably; that is often the point at which the data abstraction needs to be reviewed.

    Finally, please forgive me if I am off base.

    (update: recomposed 1 sentence.)

    Love,
    Brig

Re: Optimizing Memory consumption
by perrin (Chancellor) on Nov 09, 2006 at 21:25 UTC
    In a similar situation, I saved memory by avoiding unintentional auto-vivification of nested hash structures. That helped a lot, since checking if $foo->{bar}->{baz}->[0] is true will create a hash and an array if they didn't exist already. Careful use of exists() can avoid this. Ultimately though, I had to switch to a more sensible system that didn't try to load all the data into RAM at once.
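    For example:

        use strict;
        use warnings;

        my %foo;
        # This innocent-looking check creates $foo{bar} and $foo{bar}{baz}:
        if ( $foo{bar}{baz}[0] ) { }
        print scalar keys %foo, "\n";    # 1 -- 'bar' was autovivified

        my %clean;
        # Short-circuiting on exists() leaves the structure untouched:
        if ( exists $clean{bar} && $clean{bar}{baz}[0] ) { }
        print scalar keys %clean, "\n";  # 0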
Re: Optimizing Memory consumption
by dave_the_m (Monsignor) on Nov 10, 2006 at 00:47 UTC
    I am considering using a central scalar with an integer value that can be accessed as a bit field
    In that case, you may find vec useful.
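    For example:

        use strict;
        use warnings;

        my $bits = "";                   # one string holds all the flags

        vec($bits, 1000, 1) = 1;         # set bit 1000
        print vec($bits, 1000, 1), "\n"; # 1
        print vec($bits, 999, 1), "\n";  # 0

        # over a thousand flags in a ~126-byte string, not 1000+ scalars
        print length($bits), "\n";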

    In perl 5.8.x on a 32-bit platform, a scalar holding an integer will typically use 16 bytes; a float 20; a string 28 + length of string; an array 52; an array slot 4; a hash 60; a hash slot (24 to 48) + length of key.

    Dave.

Re: Optimizing Memory consumption
by jfroebe (Parson) on Nov 09, 2006 at 20:29 UTC

    Hi,

    That's definitely pretty vague. How much memory perl uses varies a great deal (like any other language) depending on what you do with it, what platform you're running it on, and which version of perl and supporting libraries (OS libraries, for example) you use.

    I don't think the problem is with perl itself, though I could be wrong. The overzealous memory consumption probably comes from how you are accessing the data files. Are you reading the file entirely into memory, or are you accessing it piecemeal?
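    The difference I mean is roughly this (a sketch; the filename is made up):

        open my $fh, '<', 'data.txt' or die "open: $!";

        # Slurping: the whole 200 MB file lands in memory at once
        my @all_lines = <$fh>;

        # Reading piecemeal: only one line is in memory at a time
        seek $fh, 0, 0;
        while ( my $line = <$fh> ) {
            # process $line, keep only what you need
        }
        close $fh;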

    Can you provide some example code of how you're accessing the file?

    Jason L. Froebe

    Team Sybase member

    No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

Re: Optimizing Memory consumption
by talexb (Chancellor) on Nov 10, 2006 at 01:34 UTC

    I haven't seen any mention of putting this 200M of data into a database. Is that a possibility?
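    With DBD::SQLite, for instance, it could be as simple as this (a sketch; the table layout is invented):

        use DBI;

        my $dbh = DBI->connect( "dbi:SQLite:dbname=data.db", "", "",
                                { RaiseError => 1 } );

        $dbh->do("CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, value TEXT)");

        # Pull in only the rows you need instead of holding all 200 MB in RAM
        my $sth = $dbh->prepare("SELECT value FROM records WHERE id = ?");
        $sth->execute(42);
        my ($value) = $sth->fetchrow_array;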

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: Optimizing Memory consumption
by zentara (Archbishop) on Nov 10, 2006 at 12:45 UTC
    DBM::Deep has something in its README:
    REAL-TIME COMPRESSION EXAMPLE

    Here is a working example that uses the Compress::Zlib module to do real-time compression / decompression of keys & values with DBM::Deep Filters. Please visit <http://search.cpan.org/search?module=Compress::Zlib> for more on Compress::Zlib.

        use DBM::Deep;
        use Compress::Zlib;

        my $db = new DBM::Deep(
            file               => "foo-compress.db",
            filter_store_key   => \&my_compress,
            filter_store_value => \&my_compress,
            filter_fetch_key   => \&my_decompress,
            filter_fetch_value => \&my_decompress,
        );
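    The my_compress / my_decompress filters referenced above are defined in that same example, along these lines:

        sub my_compress {
            return Compress::Zlib::memGzip( $_[0] );
        }

        sub my_decompress {
            return Compress::Zlib::memGunzip( $_[0] );
        }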

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: Optimizing Memory consumption
by mkirank (Chaplain) on Nov 11, 2006 at 15:53 UTC
    If your data structures are of a similar kind, you could use Data::Reuse to reduce the memory footprint.
Re: Optimizing Memory consumption
by sandfly (Beadle) on Nov 13, 2006 at 23:29 UTC
    I agree with most of the earlier posts that you may have a design issue. But there's a nice, crude alternative which may be of interest: where I work, we use an ActiveState build of Perl on Solaris. I assume it's a 64-bit build, because we can and do run scripts with memory usage over 2 GB; the same scripts run out of memory on Linux.
