PerlMonks  

Re: Re: Re: Optimising processing for large data files.

by tilly (Archbishop)
on Apr 10, 2004 at 19:51 UTC (#344162=note)


in reply to Re: Re: Optimising processing for large data files.
in thread Optimising processing for large data files.

I knew this, and no, I don't consider it true GC. (Circular references cause you to leak memory at runtime.)

It certainly wasn't a GC algorithm in the sense that BrowserUk was referring to: something that would randomly halt your program while it went through a garbage collection phase.
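A minimal sketch of the circular-reference leak, assuming a hypothetical Node class invented for illustration: two objects that point at each other never reach a reference count of zero, so neither is freed when its lexical goes out of scope. Weakening one link with the core Scalar::Util module's weaken breaks the cycle.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken );

package Node;                 # hypothetical class, for illustration only
our $destroyed = 0;           # how many Node objects have been freed so far

sub new     { my ( $class ) = @_; return bless {}, $class }
sub DESTROY { $destroyed++ }

package main;

# Case 1: a strong cycle. Each node keeps the other's count above zero,
# so neither is freed when the lexicals go out of scope.
{
    my $left  = Node->new;
    my $right = Node->new;
    $left->{peer}  = $right;
    $right->{peer} = $left;
}
my $after_cycle = $Node::destroyed;

# Case 2: the same cycle with one link weakened. A weak reference does
# not count, so both nodes are freed as soon as the block exits.
{
    my $left  = Node->new;
    my $right = Node->new;
    $left->{peer}  = $right;
    $right->{peer} = $left;
    weaken( $right->{peer} );
}
my $after_weak = $Node::destroyed;

print "freed after strong cycle:   $after_cycle\n";
print "freed after weakened cycle: $after_weak\n";
```

The pair leaked in the first block is only reclaimed during global destruction, when the program exits.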


Re: Re: Re: Re: Optimising processing for large data files.
by BrowserUk (Pope) on Apr 10, 2004 at 23:09 UTC

    Okay. If anyone with perl v5.8.2 (AS 808) running under XP (or similar configuration) is following this discussion, could they please run the following code under these conditions.

    1. Download the code below and save as "buk.pl".
    2. Create a datafile of, say, 30MB. It doesn't matter what it contains.
    3. Start the task manager and configure it with:
      1. Click the "Performance" tab and note how much Available Physical Memory your system has.
      2. Click the "Processes" tab.
      3. View->select columns...

        Ensure that "Memory usage", "Memory Usage Delta" & "Virtual Memory Size" columns are all checked.

      4. Ensure that all 3 columns are visible (preferably next to each other, by temporarily unchecking any intermediate ones).
      5. Check View->Update speed->High.
      6. Check Options->Always on top.
      7. Adjust the task manager window to a convenient size and position so that you can monitor it whilst running the code.
      8. Click the "CPU" column header a couple of times to ensure that the display is sorted by cpu usage in descending order.
    4. Switch to a command line and run the program.

      buk datafile
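For step 2, any file of roughly the right size will do. Below is a small sketch that writes about 30MB of filler lines. The script name mkdata.pl and the make_datafile helper are hypothetical, not part of the original instructions.

```perl
use strict;
use warnings;

# Hypothetical helper: write about $size_mb megabytes of filler to $path.
# The content is irrelevant to the test; only the size matters.
sub make_datafile {
    my ( $path, $size_mb ) = @_;
    my $line   = ( 'x' x 79 ) . "\n";        # 80 bytes per line
    my $target = $size_mb * 1024 * 1024;
    # :raw stops "\n" becoming CRLF on Windows, so the byte count is exact.
    open( my $out, '>:raw', $path ) or die "Cannot write $path: $!";
    my $written = 0;
    while ( $written < $target ) {
        print {$out} $line;
        $written += length $line;
    }
    close( $out ) or die "Cannot close $path: $!";
    return $written;
}

my $path  = @ARGV ? shift @ARGV : 'datafile';
my $mb    = @ARGV ? shift @ARGV : 30;
my $bytes = make_datafile( $path, $mb );
print "Wrote $bytes bytes to $path\n";
```

Run it as `perl mkdata.pl datafile 30`, then `buk datafile` as in step 4.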

    Watch the 3 memory columns for perl.exe (it should become the top item if you followed the above directions and don't have any other CPU-intensive processes running) as the program runs.

    Watch carefully, and note how the "Mem Usage" figure steadily rises for a short period before suddenly dropping back.

    The "Mem Delta" figure will become negative (the value is displayed in parentheses) each time the "Mem Usage" figure falls back.

    Note that the "VM Size" value tracks the "Mem Usage" closely whilst being slightly larger, and grows steadily for a short period before falling back in step with "Mem Usage".

    Note that each time it falls back it doesn't fall as far as it grew, resulting in an overall steady increase in the memory usage.

    Note that the fallbacks seem to grow larger and more frequent with time.

    Once you have seen enough, ^C the program.

    Don't allow the "Mem Usage" value to approach the "Available Physical Memory" figure. By then the system will have started swapping, and the picture becomes confused: the OS starts swapping memory from other processes to disk, and all the "Mem Delta" figures start showing (negative) decreases.

    I'd be really grateful if at least one other person could confirm that they too see the behaviour described.

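      #! perl -slw
      use strict;
      my @cache;
      open( FH, '< :raw', $ARGV[ 0 ]) or die $!;
      while( <FH> ) {
          # Append each character of the line to the cache ...
          push @cache, split '', $_;
          # ... then shift characters off the front until only 500 remain,
          # pairing each removed character with the one 500 places after it.
          my $pair = shift( @cache ) . $cache[ 499 ] for 0 .. $#cache - 500;
      }
      close FH;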

    Assuming that this behaviour isn't a figment of my imagination and is confirmed by others: if anyone has a better explanation for the (temporary, but often substantial) reductions in perl.exe's memory usage than Perl periodically freeing heap memory back to the OS as part of some "garbage collection like" process, I'm ready to eat my hat and apologise for misleading the monks.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      I guarantee the following statements.
      1. With Perl 5.8.2 running on Linux 2.4, I saw no such behaviour.
      2. Perl's documentation is readily available, and clearly documents that Perl does reference counting.
      3. Perl has reliable destructor behaviour, which no efficient true GC algorithm that I have seen offers.
      4. Parrot does use true GC, and there have been more than a few debates about what to do about reliable destructor behaviour with that switch.
      5. Perl's source-code is readily available, the SV struct has a field known as sv_refcnt, with macros like SvREFCNT_inc and SvREFCNT_dec to manipulate it. This is the reference counting mechanism.
      6. Perl has support to actually try to return unused memory to the OS where the OS supports that. I don't know the details of that support, but I've seen chip point it out, and I trust that he knows what he is talking about.
      7. I do not know how reliable Windows profiling tools are, or how the above-mentioned support for returning memory to the OS actually works. It is not impossible that what you describe has something to do with how Windows handles that interaction, or with how the profiling tools track stuff. I venture no guesses on how likely that is.
      As a result of them I maintain my stated position. For good or bad, Perl 5 does NOT have a garbage collector that stops the program in the middle of running and proceeds to detect garbage.
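Points 3 and 5 can be observed from Perl itself. A minimal sketch, assuming the core B module's svref_2object and its REFCNT accessor, plus a hypothetical Guard class: copying a reference raises the referent's count by one, dropping it lowers the count again, and DESTROY runs at the exact point where the last reference disappears rather than during some later collection pass. Absolute counts may differ between perl builds, so only the differences are meaningful.

```perl
use strict;
use warnings;
use B ();

package Guard;                # hypothetical class, for illustration only
our @log;                     # records the order in which events happen

sub new     { my ( $class, $name ) = @_; return bless { name => $name }, $class }
sub DESTROY { my ( $self ) = @_; push @log, "DESTROY $self->{name}" }

package main;

# Reference count of whatever $_[0] points at, read through the B module.
sub refcnt { return B::svref_2object( $_[0] )->REFCNT }

my $obj    = Guard->new( 'outer' );
my $before = refcnt( $obj );
my $copy   = $obj;            # second reference: the referent's count goes up by one
my $during = refcnt( $obj );
undef $copy;                  # drop it: the count goes back down
my $after  = refcnt( $obj );

{
    my $scoped = Guard->new( 'scoped' );
    push @Guard::log, 'leaving block';
}                             # last reference gone: DESTROY runs right here
push @Guard::log, 'after block';

print "refcnt before/during/after: $before/$during/$after\n";
print "$_\n" for @Guard::log;
```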

        There you go again. Take two of my words, add 2 dozen of your own, mix them together into a statement I never made, and then attack.

        You didn't see it. The rest is totally irrelevant.


      It would be interesting to try a few variations on this code, such as not using lexical variables, to isolate the cause of this behavior. Another source of memory strangeness could be the file system.
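One such variation, sketched under assumptions: the buk.pl loop with @cache and $pair as package globals (our) instead of lexicals. The churn helper and the in-memory fallback input, used when no file is given, are hypothetical conveniences and not part of BrowserUk's original test.

```perl
use strict;
use warnings;

# Package globals instead of the original lexicals.
our ( @cache, $pair );

# Hypothetical helper: the buk.pl loop, reading from any filehandle.
sub churn {
    my ( $fh ) = @_;
    my $chars = 0;
    while ( <$fh> ) {
        $chars += length $_;
        push @cache, split '', $_;
        # Shift characters off the front until only the last 500 remain.
        $pair = shift( @cache ) . $cache[ 499 ] for 0 .. $#cache - 500;
    }
    return $chars;
}

my ( $fh, $text );
if ( @ARGV ) {
    open( $fh, '<:raw', $ARGV[ 0 ] ) or die "Cannot open $ARGV[0]: $!";
}
else {
    # No file given: use about 1MB of in-memory text instead.
    $text = ( ( 'x' x 99 ) . "\n" ) x 10_000;
    open( $fh, '<', \$text ) or die "Cannot open in-memory handle: $!";
}
my $chars = churn( $fh );
close $fh;
print "processed $chars characters; window holds ", scalar @cache, "\n";
```

Run it against the same datafile as buk.pl and compare the Task Manager readings; the in-memory input is only there so the sketch runs without arguments.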

      If you look in the win32 directory of the Perl source code, you will see that there are several ways to change the way that perl allocates and frees memory, in 5.8.0, at least. Here's a snippet from vmem.h:

      /* vmem.h
       *
       * (c) 1999 Microsoft Corporation. All rights reserved.
       * Portions (c) 1999 ActiveState Tool Corp, http://www.ActiveState.com/
       *
       * You may distribute under the terms of either the GNU General Public
       * License or the Artistic License, as specified in the README file.
       *
       * Options:
       *
       * Defining _USE_MSVCRT_MEM_ALLOC will cause all memory allocations
       * to be forwarded to MSVCRT.DLL. Defining _USE_LINKED_LIST as well will
       * track all allocations in a doubly linked list, so that the host can
       * free all memory allocated when it goes away.
       * If _USE_MSVCRT_MEM_ALLOC is not defined then Knuth's boundary tag algorithm
       * is used; defining _USE_BUDDY_BLOCKS will use Knuth's algorithm R
       * (Buddy system reservation)
      I don't know how ActiveState perl was built. It's probably in Config.pm somewhere, though. The comment about _USE_LINKED_LIST leads me to believe that there is a memory leak, which can be fixed at the expense of using more memory. My guess, based on your observations, is that ActiveState perl has this leak. Also, you might possibly have a version of MSVCRT.DLL installed which is not the one that ActiveState had in mind for you to use.
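A quick way to look at what Config.pm records about a given build, sketched with the core Config module. Whether the vmem.h macros appear in ccflags at all depends on how ActiveState built the binary, which is an assumption here; if nothing turns up, `perl -V` prints the full configuration summary.

```perl
use strict;
use warnings;
use Config;

# Build settings that bear on memory allocation.
for my $key ( qw( cc ccflags optimize usemymalloc ) ) {
    my $value = defined $Config{$key} ? $Config{$key} : '(undef)';
    print "$key = $value\n";
}

# Were any of the vmem.h options passed on the compiler command line?
my $ccflags = defined $Config{ccflags} ? $Config{ccflags} : '';
my %found;
for my $macro ( qw( _USE_MSVCRT_MEM_ALLOC _USE_LINKED_LIST _USE_BUDDY_BLOCKS ) ) {
    $found{$macro} = index( $ccflags, $macro ) >= 0 ? 1 : 0;
    print "$macro: ", ( $found{$macro} ? 'found' : 'not in ccflags' ), "\n";
}
```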

      It should work perfectly the first time! - toma
