http://www.perlmonks.org?node_id=827922

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

If I run this:

perl -MTime::HiRes=time -wE"my $t=time; my @a=1..1e6;say time-$t; <>"
0.0623979568481445

It takes 6/100ths sec; consumes 84MB and causes 24,000 page faults.

This:

perl -MTime::HiRes=time -wE"my $t=time; my @a=map $_,1..1e6;say time-$t; <>"
0.216002941131592

2/10ths sec; 108MB; 30,000 page faults.

But this:

perl -MTime::HiRes=time -wE"my $t=time; my @a=map $_,map $_,1..1e6;say time-$t; <>"
31.4807999134064

32 seconds; 140MB; 5.7 million page faults?

But most confusingly:

perl -MTime::HiRes=time -wE"my $t=time; my @a=map $_,map $_,1..5e5;say time-$t; <>"
7.87800002098084

8 seconds; 72MB; 1.5 million page faults.

Any thoughts where to start looking?
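
For anyone who wants to reproduce this without fighting shell quoting, here is a minimal sketch of the same three constructs as a script. The helper name time_case, the labels and the list sizes are my own choices for illustration; only the three expressions being timed come from the one-liners above.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# The three constructs from the one-liners, parameterised by list size.
# Each sub builds the array and returns how many elements it holds.
my @cases = (
    [ 'range only' => sub { my $n = shift; my @a = 1 .. $n;                 scalar @a } ],
    [ 'single map' => sub { my $n = shift; my @a = map $_, 1 .. $n;         scalar @a } ],
    [ 'nested map' => sub { my $n = shift; my @a = map $_, map $_, 1 .. $n; scalar @a } ],
);

# Run one case once; return (elapsed seconds, element count).
sub time_case {
    my ($code, $n) = @_;
    my $start = time;
    my $count = $code->($n);
    return (time - $start, $count);
}

for my $n (1e5, 2e5, 4e5) {
    for my $case (@cases) {
        my ($label, $code) = @$case;
        my ($elapsed, $count) = time_case($code, $n);
        printf "%-10s n=%-8d %.3fs (%d elements)\n", $label, $n, $elapsed, $count;
    }
}
```

The script reports wall-clock time only; page-fault counts still have to be read from Task Manager or a similar tool while it runs.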

This is Perl 5.10.1 (AS1007) 64-bit on Windows. Other platforms may not be affected.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: What could cause excessive page faults? (A fix)
by BrowserUk (Patriarch) on Mar 11, 2010 at 12:10 UTC

    To whom it may concern.

    Commenting out line 25 of win32\VMem.h fixes the problem of wildly excessive page faults, which cause a quadratic slowdown in memory allocations under some circumstances.

    /* vmem.h
     *
     * (c) 1999 Microsoft Corporation. All rights reserved.
     * Portions (c) 1999 ActiveState Tool Corp, http://www.ActiveState.com/
     *
     * You may distribute under the terms of either the GNU General Public
     * License or the Artistic License, as specified in the README file.
     *
     * Options:
     *
     * Defining _USE_MSVCRT_MEM_ALLOC will cause all memory allocations
     * to be forwarded to MSVCRT.DLL. Defining _USE_LINKED_LIST as well will
     * track all allocations in a doubly linked list, so that the host can
     * free all memory allocated when it goes away.
     * If _USE_MSVCRT_MEM_ALLOC is not defined then Knuth's boundary tag algorithm
     * is used; defining _USE_BUDDY_BLOCKS will use Knuth's algorithm R
     * (Buddy system reservation)
     *
     */

    #ifndef ___VMEM_H_INC___
    #define ___VMEM_H_INC___

    #ifndef UNDER_CE
    //#define _USE_MSVCRT_MEM_ALLOC     // <<<<<<<<<< HERE
    #endif

    With this fix, the OP snippet that takes 32+ seconds to run, now takes just 0.4 seconds:

    Update: Should'a mentioned: 37,000 page faults instead of 5.7 million. Memory consumption is the same in both cases.

    .\perl -MTime::HiRes=time -wE"my $t=time; my @a=map $_,map $_,1..1e6;say time-$t;<>"
    0.392155885696411

    I've tried to reason about the possible consequences of this change, but get lost in the layers upon layers of conditional redefinition, redirection and misdirection in the perl sources. It doesn't appear to cause any additional test suite failures, but then I seriously doubt if the appropriate circumstances are being tested anywhere.



      Somewhat related (or not) to using a different memory allocator than MSVCRT: see "Patch to make string-append on Win32 100 times faster". This change makes Perl grow/realloc strings geometrically instead of by a fixed amount and hence showed a speedup, at least on Windows, by avoiding calls to realloc(). On Linux, at least with certain allocators, a slowdown was found. On BSD, Perl already uses its own allocator.
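
To make the difference between the two growth policies concrete, here is a toy simulation that counts how many times a buffer would have to be enlarged to reach one million bytes. The starting capacity of 16, the step of 16 bytes and the doubling factor are assumptions picked for illustration; this is not Perl's actual string-growth code.

```perl
use strict;
use warnings;

# Count how many times a buffer must be enlarged to reach $target bytes,
# given a policy ($grow) that maps the current capacity to the next one.
sub count_growth_steps {
    my ($target, $grow) = @_;
    my ($cap, $steps) = (16, 0);    # hypothetical starting capacity
    while ($cap < $target) {
        $cap = $grow->($cap);
        $steps++;
    }
    return $steps;
}

my $fixed     = count_growth_steps(1_000_000, sub { $_[0] + 16 });  # fixed increment
my $geometric = count_growth_steps(1_000_000, sub { $_[0] * 2 });   # doubling

print "fixed increment: $fixed enlargements\n";      # 62499
print "geometric:       $geometric enlargements\n";  # 16
```

Each enlargement is a potential realloc() call, so the cost of the fixed-increment policy depends heavily on how cheap the underlying allocator makes realloc().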

        Many thanks for that. It looks like I'll be upgrading from 5.10 to 5.14 when it comes, in the hope that this patch addresses the OP problem.

        Though I do wonder if my one-line patch wouldn't have fixed that too?


Re: What could cause excessive page faults?
by GrandFather (Saint) on Mar 11, 2010 at 08:26 UTC

    From outside the box it looks like Perl is creating an intermediate list between the two maps and somewhere before 5e5 elements the system starts swapping. Before the swapping starts the page faults happen when more memory is needed to extend the lists giving about 3500 bytes per fault - close enough to a 4K page size perhaps. At some point the lists get too big to stay resident and the system starts swapping with the resultant increase in page faults (now against non-resident pages) and consequent increase in time (due to the page fetches from disk).
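
The bytes-per-fault figure can be checked against all four runs in the question. This is a small sketch that assumes 1MB means 1e6 bytes and that the reported memory figure is the peak; the labels are mine.

```perl
use strict;
use warnings;

# Memory (MB) and page-fault figures quoted in the original question.
my @runs = (
    [ '1..1e6',             84,  24_000    ],
    [ 'map, 1..1e6',        108, 30_000    ],
    [ 'map of map, 1..1e6', 140, 5_700_000 ],
    [ 'map of map, 1..5e5', 72,  1_500_000 ],
);

# Bytes of memory per page fault, taking 1MB as 1e6 bytes.
sub bytes_per_fault {
    my ($mb, $faults) = @_;
    return $mb * 1e6 / $faults;
}

for my $run (@runs) {
    my ($label, $mb, $faults) = @$run;
    printf "%-20s %8.1f bytes/fault\n", $label, bytes_per_fault($mb, $faults);
}
```

The first two runs come out near one fault per 4K page. The two nested-map runs come out at a few tens of bytes per fault, so most of those faults cannot be first touches of fresh pages.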

    What seems very odd is that it is happening on a 64 bit system with (I presume) plenty of ram! However I get very similar results with a 32 bit Windows system and 5.10.1 btw.


    True laziness is hard work

      There is no swapping involved. I have 4GB of ram and well over 2 of that was available at the point the scripts achieved maximum usage.

      Page faults can also occur due to Windows' two-stage virtual memory allocation scheme. Virtual memory can be 'reserved' by a process without being 'committed'. Reservation means that space is reserved within the process address space and page tables within the OS internal structures, but no actual physical memory is yet assigned to back that reservation up. When access is first attempted to a 'reserved' page of virtual memory, a page fault occurs, and the page (or many) must be 'committed' before the memory access completes.

      The way this normally works is that the executable header contains two values that are used to reserve stack and heap space for the process when it is loaded. Two other values define how much of the reservation gets committed each time a page fault occurs on a reserved page:

      C:\test>dumpbin /headers \perl64\bin\perl.exe
      Microsoft (R) COFF/PE Dumper Version 9.00.21022.08
      Copyright (C) Microsoft Corporation. All rights reserved.

      Dump of file \perl64\bin\perl.exe

      PE signature found

      File Type: EXECUTABLE IMAGE

      FILE HEADER VALUES
                  8664 machine (x64)
                     5 number of sections
              4B60BA96 time date stamp Wed Jan 27 22:13:42 2010
                     0 file pointer to symbol table
                     0 number of symbols
                    F0 size of optional header
                    23 characteristics
                         Relocations stripped
                         Executable
                         Application can handle large (>2GB) addresses

      OPTIONAL HEADER VALUES
      ...
               1000000 size of stack reserve
                100000 size of stack commit
               1000000 size of heap reserve
                100000 size of heap commit
      ...

      Note: the above values are non-standard, as I have changed them in an attempt to track this down. The problem is, the above settings have no effect upon the outcome.

      I think I've tracked the problem to <perlsources>\win32\vmem.h. Specifically,

      # line 134
      VMem::VMem()
      {
          m_lRefCount = 1;
          InitializeCriticalSection(&m_cs);
      #ifdef _USE_LINKED_LIST
          m_Dummy.pNext = m_Dummy.pPrev = &m_Dummy;
          m_Dummy.owner = this;
      #endif
          m_hLib = LoadLibrary("msvcrt.dll");
          if (m_hLib) {
              m_pfree = (LPFREE)GetProcAddress(m_hLib, "free");
              m_pmalloc = (LPMALLOC)GetProcAddress(m_hLib, "malloc");
              m_prealloc = (LPREALLOC)GetProcAddress(m_hLib, "realloc");
          }
      }

      And I think this in the makefile is a contributory factor:

      LIBC = msvcrt.lib

      No proof yet. (I'm on my 3rd perl build; and geez they take a long time!) Just gut feel at this point. What I can say is that the problem still manifests itself when Perl and all its libraries are built with the same compiler (use the same CRT). And that using PERL_MALLOC doesn't change the situation.

      However I get very similar results with a 32 bit Windows system and 5.10.1 btw.

      Thanks for that. I don't suppose you have a AS 5.8.something install kicking around that you could try this on?


Re: What could cause excessive page faults?
by Marshall (Canon) on Mar 11, 2010 at 03:11 UTC
    I find this syntax for map confusing and always use the curly braces syntax. map{...}

    perl -MTime::HiRes=time -wE"my $t=time; my @a=1..1e6;say time-$t;
    is way faster than:
    perl -MTime::HiRes=time -wE"my $t=time; my @a=map $_,1..1e6;say time-$t;
    because the map generates an anon array that is an "extra step" and that array is copied to @a.

    In your later code, you have a map within a map which is similar to a foreach within a foreach. So it is going to run like 1 million times slower.

    On my machine...

    C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=(1..1e6);say time-$t;"
    0.0974979400634766

    C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map $_,1..1e6;say time-$t;"
    0.337559938430786

    C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map {$_}1..1e6;say time-$t;"
    0.339046001434326

      because the map generates an anon array that is an "extra step" and that array is copied to @a.

      No array is created. One million scalars are created, but the question wasn't about that snippet. It was provided as a baseline.

      In your later code, you have a map within a map which is similar to a foreach within a foreach. So it is going to run like 1 million times slower.

      No, it's not multiplicative like a foreach in a foreach. It's additive like a foreach after a foreach.

      my @array = map A, map B, LIST;

      is functionally similar to

      my @list1;
      for (LIST) {
          push @list1, B;
      }
      my @list2;
      for (@list1) {
          push @list2, A;
      }
      my @array = @list2;
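
A small runnable check of the "additive, not multiplicative" point, with concrete stand-ins for A and B ($_ * 2 and $_ + 1 are arbitrary choices of mine) and a counter that records how often the map blocks are evaluated:

```perl
use strict;
use warnings;

my @list  = 1 .. 1000;
my $calls = 0;

# Nested maps: each block is evaluated once per element of its own input list.
my @nested = map { $calls++; $_ * 2 } map { $calls++; $_ + 1 } @list;
my $nested_calls = $calls;

# The same work written as two loops run one after the other.
my @list1;
for (@list) { push @list1, $_ + 1; }
my @list2;
for (@list1) { push @list2, $_ * 2; }

print "block evaluations: $nested_calls\n";    # 2000, i.e. 2 * 1000, not 1000 * 1000
print "same result: ", ("@nested" eq "@list2" ? "yes" : "no"), "\n";
```

So the amount of work is twice that of a single map, which is why the observed slowdown needs some other explanation.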

      Snippet three should take about 0.21 + (0.21-0.06) = 0.36, but it's taking 31.48 due to excessive paging. Why is it paging so much?

        on my machine for this, I get:
        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..1e6;say time-$t;"
        12.703125
        which I think is similar to: my @array = map A, map B, LIST; I am actually surprised that there are any page faults at all on a large 64-bit machine running some kind of *nix. I mean, why does the "simple" version page fault? Sorry, I don't know.

        Update: oh I see that this is Windows, I presume Win 7 instead of Vista? There are a bunch of versions of this OS, that might matter.

        more tests on my 32 bit Win XP Pro machine 2 GB memory, AS 5.10.1:

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..100e3;say time-$t;"
        0.193822860717773

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..200e3;say time-$t;"
        0.609375

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..300e3;say time-$t;"
        1.28125

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..100e3;say time-$t;"
        0.18930196762085

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..250e3;say time-$t;"
        0.921875

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..500e3;say time-$t;"
        3.375

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..1000e3;say time-$t;"
        13.15625

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..2000e3;say time-$t;"
        50.140625

        Another "benchmark update":
        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=1..16000e3;say time-$t;"
        1.5

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}1..16000e3;say time-$t;"
        5.546875

        C:\Projects>perl -MTime::HiRes=time -wE"my $t=time; my @a=map{$_}map{$_}1..16000e3;say time-$t;"
        3173.625
        The maps above essentially don't do anything useful at all, but this shows the faster-than-linear (roughly quadratic) increase in execution time on my 32-bit Windows machine: doubling the list size roughly quadruples the run time. So I don't think this is specific to 64-bit machines. Apparently map uses more memory than one would think for a "do nothing" operation, and Perl winds up accessing this extra memory in a way that causes a lot of page faults, which would indicate that Perl is not cycling through sequential memory locations. Why that is and how that works, I don't know yet. But at least I can say this happens on 32-bit machines also.
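
One way to characterise the growth in the nested-map timings above is to estimate the exponent k in t ~ n**k from consecutive measurements. This is a rough sketch; the helper name growth_exponent is mine and the data points are the five distinct sizes from the 32-bit Windows XP runs listed above.

```perl
use strict;
use warnings;

# Nested-map timings reported above: [elements, seconds].
my @timings = (
    [ 100e3,  0.193822860717773 ],
    [ 200e3,  0.609375 ],
    [ 500e3,  3.375 ],
    [ 1000e3, 13.15625 ],
    [ 2000e3, 50.140625 ],
);

# Exponent k in t ~ n**k, estimated from two (size, time) measurements.
sub growth_exponent {
    my ($n1, $t1, $n2, $t2) = @_;
    return log($t2 / $t1) / log($n2 / $n1);
}

for my $i (1 .. $#timings) {
    my ($n1, $t1) = @{ $timings[$i - 1] };
    my ($n2, $t2) = @{ $timings[$i] };
    printf "%8d -> %8d elements: exponent %.2f\n",
        $n1, $n2, growth_exponent($n1, $t1, $n2, $t2);
}
```

An exponent of 1 would mean linear growth and 2 quadratic growth. For the larger sizes the estimate sits a little under 2, which matches the "quadratic slowdown" described earlier in the thread.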