Re: Re: Re: Re: Optimising processing for large data files.

by BrowserUk (Pope)
on Apr 10, 2004 at 23:09 UTC (#344172)

in reply to Re: Re: Re: Optimising processing for large data files.
in thread Optimising processing for large data files.

Okay. If anyone with perl v5.8.2 (AS 808) running under XP (or a similar configuration) is following this discussion, could they please run the following code under these conditions?

  1. Download the code below and save as "".
  2. Create a datafile of, say, 30MB. It doesn't matter what it contains.
  3. Start the task manager and configure it with:
    1. Click the "Performance" tab and note how much Available Physical Memory your system has.
    2. Click the "Processes" tab.
    3. View->select columns...

      Ensure that "Memory usage", "Memory Usage Delta" & "Virtual Memory Size" columns are all checked.

    4. Ensure that all 3 columns are visible (preferably next to each other, by temporarily unchecking any intermediate ones).
    5. Check View->Update speed->High.
    6. Check Options->Always on top.
    7. Adjust the task manager window to a convenient size and position so that you can monitor it whilst running the code.
    8. Click the "CPU" column header a couple of times to ensure that the display is sorted by cpu usage in descending order.
  4. Switch to a command line and run the program.

    buk datafile
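
For step 2, a datafile of roughly the right size could be generated with something like the sketch below. The sub name `make_datafile`, the script name `mkdata.pl` and the 80-byte line length are assumptions made for illustration; the original post only asks for a file of about 30MB with arbitrary contents.

```perl
use strict;
use warnings;

# Hypothetical helper: write roughly $size_mb megabytes of printable
# filler to $path, as 80-byte lines (79 characters plus a newline).
sub make_datafile {
    my ( $path, $size_mb ) = @_;

    # 79 printable ASCII characters followed by "\n".
    my $line  = join( '', map { chr( 33 + $_ % 94 ) } 0 .. 78 ) . "\n";
    my $lines = int( $size_mb * 1024 * 1024 / length $line );

    # :raw avoids newline translation, so the byte count is predictable.
    open( my $out, '>:raw', $path ) or die "Cannot create $path: $!";
    print {$out} $line for 1 .. $lines;
    close $out or die "Cannot close $path: $!";

    return -s $path;    # size in bytes actually written
}

# Usage (assumed script name): perl mkdata.pl datafile 30
make_datafile( $ARGV[0], $ARGV[1] || 30 ) if @ARGV;
```

Short lines keep each `push` in the test program small. A file with no newlines at all would be read as a single line, which is a rather different test.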

Watch the 3 memory columns for perl.exe (it should become the top item if you followed the above directions and don't have any other CPU-intensive processes running) as the program runs.

Watch carefully, and note how the "Mem Usage" figure steadily rises for a short period before suddenly dropping back.

The "Mem Delta" figure will become negative (the value is displayed in parentheses) each time the "Mem Usage" figure falls back.

Note that the "VM Size" value tracks the "Mem Usage" closely whilst being slightly larger, and grows steadily for a short period before falling back in step with "Mem Usage".

Note that each time it falls back it doesn't fall as far as it grew, resulting in an overall steady increase in the memory usage.

Note that the fallbacks seem to grow larger and more frequent with time.

Once you have seen enough, ^C the program.

Don't allow the "Mem Usage" value to approach the "Available Physical Memory" figure. By then you will have moved into swapping, and the picture becomes confused as the OS starts swapping memory from other processes to disk and all the "Mem Delta" figures start showing (negative) decreases.

I'd be really grateful if at least one other person could confirm that they too see the behaviour described.

    #! perl -slw
    use strict;

    my @cache;
    open( FH, '< :raw', $ARGV[ 0 ]) or die $!;

    while( <FH> ) {
        push @cache, split '', $_;
        my $pair = shift( @cache ) . $cache[ 499 ] for 0 .. $#cache - 500;
    }

    close FH;

Assuming that this behaviour isn't a figment of my imagination and is confirmed by others, then if anyone has a better explanation for the (temporary, but often substantial) reductions in perl.exe's memory usage than Perl periodically freeing heap memory back to the OS as part of some "garbage collection like" process, I'm ready to eat my hat and apologise for misleading the monks.

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Replies are listed 'Best First'.
Re: Re: Re: Re: Re: Optimising processing for large data files.
by tilly (Archbishop) on Apr 11, 2004 at 01:54 UTC
    I guarantee the following statements.
    1. With Perl 5.8.2 running on Linux 2.4, I saw no such behaviour.
    2. Perl's documentation is readily available, and clearly documents that Perl does reference counting.
    3. Perl has reliable destructor behaviour, which no efficient true GC algorithm that I have seen offers.
    4. Parrot does use true GC, and there have been more than a few debates about what to do about reliable destructor behaviour with that switch.
    5. Perl's source-code is readily available, the SV struct has a field known as sv_refcnt, with macros like SvREFCNT_inc and SvREFCNT_dec to manipulate it. This is the reference counting mechanism.
    6. Perl has support to actually try to return unused memory to the OS where the OS supports that. I don't know the details of that support, but I've seen chip point it out, and I trust that he knows what he is talking about.
    7. I do not know how reliable Windows profiling tools are, or how the above-mentioned support for returning memory to the OS actually works. It is not impossible that what you describe has something to do with how Windows handles that interaction, or with how the profiling tools track stuff. I venture no guesses on how likely that is.
    As a result of them I maintain my stated position. For good or bad, Perl 5 does NOT have a garbage collector that stops the program in the middle of running and proceeds to detect garbage.
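
    The reliable destructor behaviour mentioned in point 3 can be observed directly. The following is only a minimal sketch: the `Tracked` class, the `demo` sub and the `@events` log are hypothetical names invented for illustration. It shows `DESTROY` running at the moment the last reference goes away, not at some later collection pass.

```perl
use strict;
use warnings;

our @events;    # records the order in which things happen

{
    package Tracked;    # hypothetical class, for illustration only

    sub new {
        my ( $class, $name ) = @_;
        return bless { name => $name }, $class;
    }

    # Called by perl as soon as the object's reference count hits zero.
    sub DESTROY { push @main::events, "destroy $_[0]{name}" }
}

sub demo {
    @events = ();

    {
        my $obj = Tracked->new('a');
        push @events, 'in scope';
    }    # the only reference to 'a' goes away here
    push @events, 'after scope';

    my $first  = Tracked->new('b');
    my $second = $first;    # two references to the same object
    undef $first;           # one reference left: nothing is destroyed
    push @events, 'one ref left';
    undef $second;          # no references left: DESTROY runs now
    push @events, 'no refs left';

    return @events;
}

print "$_\n" for demo();
```

    Each "destroy" entry appears in the log immediately after the statement that dropped the last reference, which is the ordering that reference counting guarantees.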

      There you go again. Take two of my words, add 2 dozen of your own, mix together into a statement I never made, and then attack.

      You didn't see it. The rest is totally irrelevant.

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
        I disagree on the irrelevance of what I said.

        I gave you a data point, on a different OS. I further addressed the question of how I know that Perl does not run GC periodically, giving information about it that you can verify. My information varied from incidental discussions that you can Google for, where knowledgeable people talk about it, to what to look for in the source code. In the process I hopefully showed that we have a better way to resolve the question than running an experiment and guessing at its meaning. I also provided an alternate explanation for the behaviour that you reported.

        Which piece of information is irrelevant to the discussion, and why do you think that it is irrelevant?

Re: Re: Re: Re: Re: Optimising processing for large data files.
by toma (Vicar) on Apr 11, 2004 at 19:55 UTC
    It would be interesting to try a few variations on this code, such as not using lexical variables, to isolate the cause of this behavior. Another source of memory strangeness could be the file system.
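
    One such variation might look like the sketch below, which assumes that "not using lexical variables" means switching to package globals. The sub name `pair_scan` and the `$pairs` counter are additions for illustration and are not part of the original program; the loop itself is meant to do the same work as the posted code.

```perl
use strict;
use warnings;

# Package (global) variables instead of lexicals.
our ( @cache, $pair, $pairs );

# Hypothetical wrapper around the loop from the original post.
# Returns the number of pairs formed, so that runs can be compared.
sub pair_scan {
    @cache = ();
    $pairs = 0;

    open( FH, '<:raw', $_[0] ) or die "Cannot open $_[0]: $!";
    while ( <FH> ) {
        push @cache, split //, $_;

        # The range is computed once, before any element is shifted off.
        for ( 0 .. $#cache - 500 ) {
            $pair = shift( @cache ) . $cache[499];
            $pairs++;
        }
    }
    close FH;

    return $pairs;
}

print pair_scan( $ARGV[0] ), " pairs\n" if @ARGV;
```

    Running this against the same datafile while watching the same Task Manager columns would show whether the pattern depends on `$pair` being a lexical declared inside the loop.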

    If you look in the win32 directory of the Perl source code, you will see that there are several ways to change the way that perl allocates and frees memory, in 5.8.0, at least. Here's a snippet from vmem.h:

        /* vmem.h
         *
         * (c) 1999 Microsoft Corporation. All rights reserved.
         * Portions (c) 1999 ActiveState Tool Corp,
         *
         * You may distribute under the terms of either the GNU General Public
         * License or the Artistic License, as specified in the README file.
         *
         * Options:
         *
         * Defining _USE_MSVCRT_MEM_ALLOC will cause all memory allocations
         * to be forwarded to MSVCRT.DLL. Defining _USE_LINKED_LIST as well will
         * track all allocations in a doubly linked list, so that the host can
         * free all memory allocated when it goes away.
         * If _USE_MSVCRT_MEM_ALLOC is not defined then Knuth's boundary tag algorithm
         * is used; defining _USE_BUDDY_BLOCKS will use Knuth's algorithm R
         * (Buddy system reservation)

    I don't know how ActiveState perl was built. It's probably in somewhere, though. The comment about _USE_LINKED_LIST leads me to believe that there is a memory leak, which can be fixed at the expense of using more memory. My guess, based on your observations, is that ActiveState perl has this leak. Also, you might possibly have a version of MSVCRT.DLL installed which is not the one that ActiveState had in mind for you to use.

    It should work perfectly the first time! - toma
