PerlMonks  

How to Accurately Determine Amount of RAM

by hepcat72 (Sexton)
on Jul 08, 2014 at 16:47 UTC ( [id://1092755]=perlquestion )

hepcat72 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a Perl script that breaks up an input file for optimal performance of a system call I'm making (to a multiple sequence aligner called Muscle). I've structured it to take advantage of multiple cores and it works well. However, some files are so large that, even with the division, the system runs out of memory. So in addition to breaking up the data per core, I need to break it up and run it in batches. I want this script to work on any computer, so I set out to find a module that would return the amount of physical RAM. I found Sys::MemInfo, but when I tested it out, it told me I have 4 gigs when in fact I have 16:
>perl -e 'use Sys::MemInfo qw(totalmem);print "total memory: ".(&totalmem / 1024 / 1024 / 1024)."\n";'
total memory: 3.99408721923828
>system_profiler | grep " Memory:"
Memory: 16 GB
Is there another method out there that is system independent and more accurate?

Thanks,
Rob

Replies are listed 'Best First'.
Re: How to Accurately Determine Amount of RAM
by perlfan (Vicar) on Jul 08, 2014 at 17:09 UTC
    uname -a ? Mac OS/system info?

    I ask because there are 32-bit OSes/kernels that have what is called "Physical Address Extension" (PAE - http://en.wikipedia.org/wiki/Physical_Address_Extension), which allows 32-bit machines to support more than 4 GB system wide; but each 32-bit process is still bound to 4 GB.
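
    One quick way to answer the 32-bit question from inside Perl is to look at the build configuration in the core Config module. This is only a sketch: it reports how the perl binary itself was built, which says nothing definite about the kernel or about how a given XS module reads memory.

```perl
use strict;
use warnings;
use Config;

# Pointer size in bytes of the perl binary itself: 4 means a 32-bit build.
my $ptr_bytes = $Config{ptrsize};

# Size in bytes of perl's integer values (IV); this can be 8 even when
# the pointers are only 4 bytes wide.
my $iv_bytes = $Config{ivsize};

printf "perl build: %s, %d-bit pointers, %d-bit integers\n",
    $Config{archname}, $ptr_bytes * 8, $iv_bytes * 8;
```

    The same values are available from the shell with `perl -V:ptrsize` and `perl -V:archname`.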

Re: How to Accurately Determine Amount of RAM
by AppleFritter (Vicar) on Jul 08, 2014 at 18:08 UTC

    I've just tried it on the Windows box I'm currently on -- Sys::MemInfo works for me:

    C:\strawberry>perl -MSys::MemInfo -e "print Sys::MemInfo::totalmem"
    17118961664
    C:\strawberry>

    You're using a Mac, I take it? I have a gut feeling that the way the module is gathering its information (in darwin.xs) is wrong; Darwin/OS X has a sysctl for determining physical memory (CTL_HW and HW_MEMSIZE).

    I see you've already filed a bug for this issue, but given that the module's last release is from 2006, who knows if or when it will be fixed. If you don't mind getting into XS programming, you could perhaps do it yourself, and maybe the fixed version could then be put on CPAN, too. (I don't know what the process is for "taking over" abandoned modules.)

    That said, I agree with talexb: perhaps the proper question to ask is not "how do I reliably determine the actual amount of memory on this machine" as much as "how do I best accomplish this task". Do you even need to split your data into ~16 GB chunks for processing? Will using e.g. 4 GB chunks be significantly slower? For that matter, maybe there's a better way of doing what you want to do in the first place.

      Well, I didn't want to get into XS stuff, and guessing that Mac is the only exception, I just decided to check $ENV{OSTYPE} and if it's Darwin, call `sysctl -n hw.memsize`. Otherwise, I just use the module.
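
      A minimal sketch of that workaround follows. It swaps in Perl's built-in `$^O` for `$ENV{OSTYPE}`, on the assumption that OSTYPE is a shell variable that may not be exported to the script's environment. `total_ram_bytes` is a hypothetical helper name, and the code assumes Sys::MemInfo is installed on the non-Mac systems.

```perl
use strict;
use warnings;

# Return total physical RAM in bytes, or nothing if it cannot be determined.
sub total_ram_bytes {
    if ($^O eq 'darwin') {
        # On OS X, ask the kernel directly with the command from the post.
        my $out = `sysctl -n hw.memsize 2>/dev/null`;
        return $1 if defined $out && $out =~ /^(\d+)/;
        return;
    }
    # Everywhere else, defer to Sys::MemInfo if it can be loaded.
    if (eval { require Sys::MemInfo; 1 }) {
        return Sys::MemInfo::totalmem();
    }
    return;
}

my $bytes = total_ram_bytes();
if (defined $bytes) {
    printf "total memory: %.2f GB\n", $bytes / 1024**3;
}
else {
    print "total memory: unknown\n";
}
```

      Returning nothing when the lookup fails lets the caller fall back to a conservative default instead of dying.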

      Now I just have to write some code to run my concurrent child processes in batches.
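
      One possible shape for that code is sketched below: it caps how many children run at once and starts the next job as soon as a slot frees up, rather than waiting for a whole batch to finish. `run_in_batches` is a hypothetical helper, the jobs are placeholder commands standing in for the real muscle calls, and a Unix-style fork is assumed.

```perl
use strict;
use warnings;

# Run each command in its own child process, at most $max_children at a time.
# Returns a hash ref mapping each command's index to its exit code.
sub run_in_batches {
    my ($max_children, @commands) = @_;
    my (%index_of_pid, %status);

    # Wait for any one child to finish and record its exit code.
    my $reap_one = sub {
        my $pid = wait();
        die "no child processes left to wait for\n" if $pid == -1;
        return unless exists $index_of_pid{$pid};
        $status{ delete $index_of_pid{$pid} } = $? >> 8;
    };

    for my $i (0 .. $#commands) {
        # Block until a slot is free before starting the next child.
        $reap_one->() while keys(%index_of_pid) >= $max_children;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            exec @{ $commands[$i] } or exit 127;
        }
        $index_of_pid{$pid} = $i;
    }

    # Collect the children that are still running.
    $reap_one->() while %index_of_pid;
    return \%status;
}

# Placeholder jobs: each one is just perl exiting with its own index.
my @jobs   = map { [ $^X, '-e', "exit $_" ] } 0 .. 4;
my $status = run_in_batches(2, @jobs);
print "job $_ exited with $status->{$_}\n" for sort { $a <=> $b } keys %$status;
```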

      Thanks Guys.
Re: How to Accurately Determine Amount of RAM
by talexb (Chancellor) on Jul 08, 2014 at 17:28 UTC

    Hi Rob,

    This sounds like an XY Problem to me. Rather than looking into how much memory there is, why not look into the issue of why you're using so much memory?

    At a guess, I'd say you are reading the entire file into memory -- and when that exhausts available memory, your script fails. A better approach would be to read the file in a line (or chunk) at a time. Does that help at all?
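
    As a rough sketch of the line-at-a-time approach: the loop below never holds more than one line in memory. It assumes FASTA-style input in which each record begins with a `>` header line (the thread does not state the file format), and `count_records` is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Count records without holding the whole file in memory.
sub count_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $records = 0;
    while (my $line = <$fh>) {    # only one line is in memory at a time
        $records++ if $line =~ /^>/;
    }
    close $fh;
    return $records;
}

# Small made-up input file, for demonstration only.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} ">seq1\nACGT\nACGA\n>seq2\nTTGA\n";
close $tmp_fh;

print count_records($tmp_path), " records\n";
```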

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      Hi Alex,

      It's not my program/code that is consuming the memory. It's a system call to a C program called muscle. My script comes nowhere near the memory requirements that muscle does. My script uses a simple hash strategy to find sequences that are similar to one another (and so likely to yield positive results) and to decide which sets of sequences to send together in a call to muscle. An exhaustive approach which takes very little memory would be to run muscle for every likely pair. I have code that does this, but it takes a few orders of magnitude longer than running them all together and parsing the results, because muscle runs faster on large groups of sequences than it does for a full ("likely") pairwise comparison. But if I group too many of the sequences together, I hit memory limitations and things start crashing. So I would like to be able to compute a maximum number of sequences to run per gig of memory and break up my analysis as much as it takes to meet that limit so that I get the best performance.
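
      The arithmetic described above might look something like the sketch below. Everything numeric in it is an assumption: `$SEQS_PER_GB` is a placeholder that would have to be measured against muscle, and treating memory use as linear in the number of sequences is a simplification. `max_seqs_per_run` and `split_into_batches` are hypothetical helper names.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Placeholder tuning value: sequences muscle can handle per GB of RAM.
# It must be measured empirically; 500 is not a real figure.
my $SEQS_PER_GB = 500;

# Largest group to hand to one muscle call, given total RAM in bytes
# and the number of muscle processes running concurrently.
sub max_seqs_per_run {
    my ($ram_bytes, $n_procs) = @_;
    my $gb_per_proc = $ram_bytes / 1024**3 / $n_procs;
    my $max         = floor($gb_per_proc * $SEQS_PER_GB);
    return $max < 2 ? 2 : $max;    # an alignment needs at least two sequences
}

# Split a list into consecutive groups of at most $batch_size items.
sub split_into_batches {
    my ($batch_size, @items) = @_;
    my @batches;
    push @batches, [ splice @items, 0, $batch_size ] while @items;
    return @batches;
}

my @seq_ids = map { "seq$_" } 1 .. 10_000;
my $limit   = max_seqs_per_run(16 * 1024**3, 4);    # 16 GB shared by 4 processes
my @batches = split_into_batches($limit, @seq_ids);
printf "%d sequences per run, %d runs\n", $limit, scalar @batches;
```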

      Rob

        OK -- I wasn't sure about your situation.

        So then I guess you have to see about maximizing the number of pairs without running out of memory. That sounds like an interesting challenge.

        Alex / talexb / Toronto

Re: How to Accurately Determine Amount of RAM
by sundialsvc4 (Abbot) on Jul 08, 2014 at 19:55 UTC

    Honestly, it might be preferable to design your script to take a command-line parameter, e.g. -m integerG, which specifies at runtime the amount of memory that you want the program to take as its maximum.

    In addition to being a convenient cop-out ... ;-) ... this simple-minded-sounding approach actually adds flexibility: If, say, at some point you want to run four instances of your program concurrently on a 16-gig machine, you would want to tell each of them, say, -m 3G. Absent the parameter, the program would assume a fixed limit of your arbitrary choosing. (“Assume that you can have it all” might or might not be a great assumption to make or to have made, down the line, even if it looks prescient now.) Instead of making a guess that might turn out to be wrong, using logic that might have to be programmatically tweaked at an inconvenient moment in the wee hours of the morning, it would make a conservative assumption that you know would be okay.
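
    A minimal sketch of such a switch, using the core Getopt::Long module. The long option name `--max-memory`, the accepted size suffixes, the `2G` default, and the helper names `parse_size` and `memory_limit` are all illustrative assumptions.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Convert a size such as "3G", "512M" or "2048" (plain bytes) into bytes.
sub parse_size {
    my ($text) = @_;
    my %mult = (K => 1024, M => 1024**2, G => 1024**3, T => 1024**4);
    $text =~ /^(\d+(?:\.\d+)?)([KMGT])?B?$/i
        or die "Unrecognised memory size '$text'\n";
    return $1 * ($2 ? $mult{ uc $2 } : 1);
}

# Read -m/--max-memory from an argument list, falling back to a
# conservative default when the option is absent.
sub memory_limit {
    my ($argv, $default) = @_;
    my $size = $default;
    GetOptionsFromArray($argv, 'm|max-memory=s' => \$size)
        or die "usage: $0 [-m SIZE]\n";
    return parse_size($size);
}

my $limit = memory_limit(\@ARGV, '2G');
printf "memory limit: %.1f GB\n", $limit / 1024**3;
```

    Run as `script.pl -m 3G`, this would cap the script at 3 GB; with no option it falls back to the default.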

Front-paged by Arunbear