Hashing Memory Usage

by awkmonk (Monk)
on Jul 12, 2006 at 15:44 UTC (#560721=perlquestion)
awkmonk has asked for the wisdom of the Perl Monks concerning the following question:

I'll apologise now for the length of this one, but I'm in need of some serious monkery.

I've created a script that reads in several files, splits each line into separate fields, and stores them all in a hash. It then loops over these hashes to produce an output file. All works well (for once).

The problem comes when I try and implement this on a new box. Under AIX 5.1 it all works fine, under AIX 5.3 it kills the process with 'out of memory'. Both boxes have 1GB of RAM and 2GB swap space - should be more than enough.

Using a cut down set of input files, the working box uses about 53MB of memory to run this job, the new one uses just over 1GB.

This points to altered memory usage, and indeed the build options are different between the two boxes. The trouble is that I have no idea what might cause this.

Any thoughts welcome.

UPDATE: upgrading to 5.8.8 did indeed cure the problem.

On the old box, perl -V gives:

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=aix, osvers=5.0.0.0, archname=aix
    uname='aix shaq 1 5 006044854c00 '
    config_args='-de'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undf
    useperlio=undef d_sfio=undef uselargefiles=define
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
  Compiler:
    cc='cc', optimize='-O', gccversion=
    cppflags='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384'
    ccflags ='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384 -q3'
    stdchar='unsigned char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksiz8
    alignbytes=8, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='ld', ldflags ='-b32'
    libpth=/lib /usr/lib /usr/ccs/lib
    libs=-lbind -lnsl -lgdbm -ldbm -ldb -ldl -lld -lm -lC -lC_r -lc -lcrypt -lbv
    libc=/lib/libc.a, so=a, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_aix.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -bE:/usr/opt/perl5'
    cccdlflags=' ', lddlflags='-bhalt:4 -bM:SRE -bI:$(PERL_INC)/perl.exp -bE:$('

Characteristics of this binary (from libperl):
  Compile-time options: USE_LARGE_FILES
  Built under aix
  Compiled at Nov 22 2000 08:49:49
  @INC:
    /usr/opt/perl5/lib/5.6.0/aix
    /usr/opt/perl5/lib/5.6.0
    /usr/opt/perl5/lib/site_perl/5.6.0/aix
    /usr/opt/perl5/lib/site_perl/5.6.0
    /usr/opt/perl5/lib/site_perl
    .

On the new box:

Summary of my perl5 (revision 5.0 version 8 subversion 2) configuration:
  Platform:
    osname=aix, osvers=5.2.0.0, archname=aix-thread-multi
    uname='aix perlfly 2 5 000ad7df4c00 '
    config_args=''
    hint=previous, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc_r', ccflags ='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES -qlonglong',
    optimize='-O',
    cppflags='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES -qlonglong -D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES -qlonglong -D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES -qlonglong -D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES -qlonglong -D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES -qlonglong -D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=16384 -qnoansialias -DUSE_NATIVE_DLOPEN -DNEED_PTHREAD_INIT -q32 -D_LARGE_FILES -qlonglong'
    ccversion='', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -brtl -b32 -bmaxdata:0x80000000'
    libpth=/lib /usr/lib /usr/ccs/lib
    libs=-lbind -lnsl -ldbm -ldl -lld -lm -lpthreads -lc_r -lcrypt -lbsd -lPW
    perllibs=-lbind -lnsl -ldl -lld -lm -lpthreads -lc_r -lcrypt -lbsd -lPW
    libc=/lib/libc.a, so=a, useshrplib=true, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_aix.xs, dlext=so, d_dlsymun=undef, ccdlflags='-bE:/usr/opt/perl5/lib/5.8.2/aix-thread-multi/CORE/perl.exp -bE:/usr/opt/perl5/lib/5.8.2/aix-thread-multi/CORE/perl.exp -bE:/usr/opt/perl5/lib/5.8.2/aix-thread-multi/CORE/perl.exp -bE:/usr/opt/perl5/lib/5.8.2/aix-thread-multi/CORE/perl.exp'
    cccdlflags=' ', lddlflags='-bhalt:4 -bM:SRE -bI:$(PERL_INC)/perl.exp -bE:$(BASEEXT).exp -bnoentry -lpthreads -lc_r'

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
  Built under aix
  Compiled at Feb 13 2004 13:18:17
  @INC:
    /usr/opt/perl5/lib/5.8.2/aix-thread-multi
    /usr/opt/perl5/lib/5.8.2
    /usr/opt/perl5/lib/site_perl/5.8.2/aix-thread-multi
    /usr/opt/perl5/lib/site_perl/5.8.2
    /usr/opt/perl5/lib/site_perl
    .

'I think the problem lies in the fact that your data doesn't fit my program'.

Re: Hashing Memory Usage
by Fletch (Chancellor) on Jul 12, 2006 at 15:50 UTC

    Without seeing the code in question (or at least something stripped down that produces similar bloat under the different perls) I don't know if you're going to get a good response.

    Having said that, one alternative when you start running out of RAM when processing a hash is to start tossing the data into something on disk using BerkeleyDB or the like. You'll lose some speed but you shouldn't hit the same memory wall.
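
A minimal sketch of the on-disk approach suggested above. It assumes SDBM_File (bundled with Perl) as a stand-in for BerkeleyDB, a temporary directory as the storage location, and a flattened "line, field" key, since a tied DBM hash stores flat strings rather than nested hashes. The separator and the loop sizes are illustrative only.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # ships with core Perl
use File::Temp qw(tempdir);

# Hypothetical location for the on-disk hash; any writable path would do.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/lines";

# Tie the hash to disk: entries live in the DBM files, not in process memory.
tie my %on_disk, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $file: $!";

# DBM files hold flat strings only, so a two-level structure such as
# $h{$line}{$field} is flattened into a single "line<SEP>field" key.
my $sep = "\x1C";
for my $line ( 1 .. 100 ) {
    for my $field ( 'AA' .. 'AZ' ) {
        $on_disk{ join $sep, $line, "$field$line" } = $line;
    }
}

my $stored = $on_disk{ join $sep, 42, 'AB42' };
my $count  = keys %on_disk;     # walks the file, so this is slow on big data
print "value for (42, AB42): $stored; entries: $count\n";

untie %on_disk;
```

SDBM limits the size of each key/value pair, so larger records would need DB_File, BerkeleyDB, or a similar module instead.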

      Ah, good point. This massively cut down version still produces the same bloat.

      Rewriting the code is an option, but not one I'd like to go into unless there is no other hope.

      #!/usr/bin/perl -w
      use strict;

      my %a = ();
      my $res = `ps v $$`;
      print "$res\n";
      for my $line ( 1 .. 19000 ){
          for ( "AA" .. "DZ" ){
              $a{$line}{"$_$line"} = $line;
          }
      }
      $res = `ps v $$`;
      print "$res\n";
      exit 0;

      'I think the problem lies in the fact that your data doesn't fit my program'.

        Maybe with your cut down script, Devel::Size could provide some insight (at least give you a diff on the two architectures between the structure size and the structure+data size).

        -derby
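
A rough sketch of the Devel::Size comparison suggested above, run against a much smaller version of the nested hash. Devel::Size is a CPAN module rather than part of core Perl, so the sketch checks for it first; the loop sizes and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Devel::Size comes from CPAN; load it only if it is installed.
my $have_devel_size = eval { require Devel::Size; 1 };

# A scaled-down version of the nested hash from the test script.
my %data;
for my $line ( 1 .. 100 ) {
    for my $field ( 'AA' .. 'DZ' ) {
        $data{$line}{"$field$line"} = $line;
    }
}

if ($have_devel_size) {
    # size() measures only the top-level hash structure;
    # total_size() also follows references into the inner hashes.
    my $shallow = Devel::Size::size( \%data );
    my $deep    = Devel::Size::total_size( \%data );
    printf "top-level hash: %d bytes, with contents: %d bytes\n",
        $shallow, $deep;
}
else {
    print "Devel::Size is not installed; install it from CPAN to run this check\n";
}
```

Running the same script under both perls and comparing the two figures would show whether the difference sits in the hash structure or in the stored data.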
Re: Hashing Memory Usage
by kwaping (Priest) on Jul 12, 2006 at 15:53 UTC
    I notice that your new box is using threaded perl while the old box isn't. That might be something to explore.

    Otherwise, please post some code, as Fletch noted. Is the code identical between the two boxes, or did you make any changes going from old to new?

    ---
    It's all fine and dandy until someone has to look at the code.
      The code is identical. We're trying to prove that everything works exactly as it did, before we migrate over.

      Any thoughts on this being to do with 32bit v 64bit architecture?


      'I think the problem lies in the fact that your data doesn't fit my program'.

        64 bit is certainly going to use more memory. Your integers will be bigger. Also, you're using 5.8.2 on the new box and 5.6 on the old. That is likely to make a difference.
Re: Hashing Memory Usage
by traveler (Parson) on Jul 12, 2006 at 16:25 UTC
    It would be most helpful if either (or both) were compiled with PERL_DEBUG_MSTATS or DEBUGGING. The former gives access to Devel::Peek's mstat(); while the latter enables -DL and the warn("!") stuff. Both allow in-program examination of memory usage. If you can rebuild, that might help.

    HTH, --traveler

Re: Hashing Memory Usage
by nothingmuch (Priest) on Jul 12, 2006 at 16:32 UTC
    IIRC perl 5.8 changed the hash function... It may be that it's performing differently for your key set in such a way that more buckets are needed to store the values (arguably a good thing), to the point where Perl cannot allocate a contiguous chunk of memory for the bucket array (it probably needs to do that but I'm not sure).

    Another problem could be that the new machine is shipping with soft/hard ulimits set differently by default. Check ulimit on the command line before running. Tweaking the hard limits may require a kernel recompilation.

    Good luck!

    -nuffin
    zz zZ Z Z #!perl
Re: Hashing Memory Usage
by mrd (Beadle) on Jul 12, 2006 at 18:49 UTC
    I think this has nothing to do with perl.

    You might want to compare your users' memory quotas on those machines (you might need a sysadmin for that).

    Also, you might check what applications are running on those machines. It just might be that there is another app that eats a lot of memory on the AIX 5.3 box.

    HTH.

      Also take a look at any ulimit settings, or anything else that may limit how much memory is available to a process. Can you see if the new process is actually using more memory or running into other limitations?
        I've hijacked a friendly sysadmin bloke - the ulimits are set the same way, there was no other process running on either box during these tests. I've just stumbled into a big problem though - I don't think I'm going to be able to recompile the Perl without major internal wranglings over support on the box. Arrrrrrgh.

        'I think the problem lies in the fact that your data doesn't fit my program'.

Re: Hashing Memory Usage
by vhold (Beadle) on Jul 12, 2006 at 19:56 UTC
    I think I ran into this a long time ago on AIX 32-bit compiled programs.

    Check out this page: Large Program Support

    Basically, give this a shot: make a copy of your 32-bit perl and run: /usr/ccs/bin/ldedit -bmaxdata:0x80000000/dsa perl
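
A small worked calculation of what the -bmaxdata value in the ldedit command above asks for, assuming the 256MB AIX segment size mentioned later in this thread. The variable names are illustrative.

```perl
use strict;
use warnings;

# Assumption: 32-bit AIX divides the address space into 256MB segments.
my $segment_bytes = 256 * 1024 * 1024;

# The value passed to -bmaxdata in the ldedit command.
my $maxdata = hex '0x80000000';

my $segments = $maxdata / $segment_bytes;
printf "maxdata 0x%X = %d segments = %d MB for the data heap\n",
    $maxdata, $segments, $maxdata / ( 1024 * 1024 );
```

On that assumption, 0x80000000 requests eight segments (2048 MB) for program data.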
Re: Hashing Memory Usage
by shmem (Canon) on Jul 13, 2006 at 06:49 UTC

    As an Irish joke goes: "Could you tell me the way to Tipperary?" - "Well, I wouldn't start from here".

    I don't have an idea either, but I see too many changes to identify "the guilty party": OS release change, hardware change, perl release change. To make sure it isn't (or is) perl's fault I would install the same version of perl (5.8.2) on the old box, run the program and start from there.

    --shmem

    _($_=" "x(1<<5)."?\n".q/)Oo.  G\        /
                                  /\_/(q    /
    ----------------------------  \__(m.====.(_("always off the crowd"))."
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      The problem is that the old box is the current production machine. No-one would let me go anywhere near changing that. It's beginning to look like I either need new Perl binaries, which they just might let me load, or I re-write my routine.

      'I think the problem lies in the fact that your data doesn't fit my program'.

        Then do it the other way round. Build a 5.6.0 perl on the new box, and test that.

        --shmem

        Compile perl with a prefix, and store it in your home directory, where nobody else can touch it. It should be safe.
        -nuffin
        zz zZ Z Z #!perl
Re: Hashing Memory Usage
by freakingwildchild (Scribe) on Jul 14, 2006 at 08:36 UTC
    I've had similar problems, in my case between 5.6.1 and 5.8.7.

    My program was using twice as much memory, purely because of the difference in Perl versions.
    Same compilation flags, same way of installing.

      I've come to that conclusion myself. The keys to my main hash are stupidly big, so as a quick fix I'm writing them to disk as individual files, containing just a unique number. I'm then using that number as the key. It has reduced the memory usage, but not by as much as I'd hoped.

      Thanks one and all for your help on this.


      'I think the problem lies in the fact that your data doesn't fit my program'.

        If you're dropping 256MB core files (the segment size in AIX), then you're essentially growing past the segment size and the -bmaxdata flag should fix the issue. I've been experiencing this recently and have not yet recompiled, but I am making many efficiency changes (true iterators have come in very handy) to combat the problem right now.
Re: Hashing Memory Usage
by skyknight (Hermit) on Jul 16, 2006 at 18:09 UTC
    It sounds like you might want something called a "database". On another random note, maybe the Perl garbage collector never runs in the case where it gets to 1GB. For the one that stays at 53MB, do you cycle through 1GB worth of data but perhaps manage to reuse memory? Sometimes garbage collectors are sloppy when they aren't pressed up against the wall.
      Ah, a red herring - it's not hashes at all! I'm changing values in a large hash as I loop through files. The problem occurs when I come to create the key for the hash. The following code works properly on 5.6 but grows on 5.8. Delete the substitution line, and hey presto, no more leak.
      use strict;

      my $ret = `ps v $$`;
      print "$ret\n";
      for ( 1 .. 1000 ){
          my $key = ':PERSON-NUM(1:*):NI-NUM(1:*):';
          for my $bb ( "PERSON-NUM", "NI-NUM" ){
              if ( $key =~ /:$bb *\( *([0-9]+) *: *([0-9\*]+) *\) *:/ ){
                  $key =~ s/:$bb *\( *[0-9]+ *: *[0-9\*]+ *\) *:/:$bb=1:/;
              }
          }
      }
      $ret = `ps v $$`;
      print "$ret\n";
      Now I'm confused!
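
One possible variation on the key-building loop above, offered as a sketch rather than a confirmed fix for the growth seen on 5.8.2. It assumes the field names are fixed in advance, so each pattern can be compiled once with qr//, and it drops the separate match because s/// already does nothing when the pattern is absent. The %pattern hash is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Compile one pattern per field name up front; \Q...\E quotes the name
# so that the hyphen and any other punctuation are taken literally.
my %pattern;
for my $name ( 'PERSON-NUM', 'NI-NUM' ) {
    $pattern{$name} = qr/:\Q$name\E *\( *[0-9]+ *: *[0-9*]+ *\) *:/;
}

my $key;
for ( 1 .. 1000 ) {
    $key = ':PERSON-NUM(1:*):NI-NUM(1:*):';
    for my $bb ( 'PERSON-NUM', 'NI-NUM' ) {
        my $re = $pattern{$bb};
        # s/// leaves $key untouched when nothing matches,
        # so no separate "if" test is needed.
        $key =~ s/$re/:$bb=1:/;
    }
}
print "$key\n";    # prints ":PERSON-NUM=1:NI-NUM=1:"
```

Whether this avoids the memory growth on the affected build would need to be checked with the same ps measurements used in the original test.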

      'I think the problem lies in the fact that your data doesn't fit my program'.

        I ran your code with the loop iterator set to 1e6, and see no memory growth under either 5.8.6 or 5.8.8.

        I seem to recall that 5.8.2 (the version shown in your OP) was a particularly buggy, short-lived release. I strongly suggest you try upgrading to something newer. My memory tells me that 5.8.3 was a pretty good build, as is 5.8.6, which I still use as my default install. I encountered a few problems with 5.8.8, but I think they may have been limited to the AS-Win distribution I use.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
