PerlMonks
Why is Windows 100 times slower than Linux when growing a large scalar?

by Anonymous Monk
on Nov 30, 2009 at 22:44 UTC ( #810276=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have tracked down a performance issue in my application that can be highlighted with the example below. Although the example is trivial, I want to understand why Windows (Vista) is so much slower than Linux (typically 70-100 times, depending on the Perl version) when growing a scalar. I have tested this under Perl 5.8.x and Perl 5.10.x. Could this be a bug?

The original thread for this question is here http://www.perlmonks.org/?node_id=810049

Regards, Red.

use strict;
use warnings;

my $iter = 1000000;          #number of items
my $string = '';             #our string
my $teststring = 'abcdefghijklmnopqrstuvwxyz1234567890';  #what to grow the string

#Comment out this line to speed things up under windows.
#$string = 1 x ($iter * length($teststring));

Time();
$string = '';
for (1..$iter) {
    $string .= $teststring;
}
Time();
print "Finished\n";
sleep(2000);

sub Time {
    my ($user, $system, $cuser, $csystem) = times;
    print "$user,$system\n";
}

Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by explorer (Chaplain) on Nov 30, 2009 at 22:51 UTC

    Another example:

    for ( 1 .. 100_000 ) { $x .= ('x' x 1_000); }
    Linux: 0.4 seconds. Windows: 20 minutes.

    Update: this behaviour was first observed in August 2007.

Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by Burak (Chaplain) on Nov 30, 2009 at 23:43 UTC
    I can confirm this with ActivePerl 5.10.1.1006 on 32-bit Vista. There seems to be a problem with concatenation somehow. However, the "x" operator seems to be fast:
    $string .= $teststring x $iter;
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by Anonymous Monk on Dec 01, 2009 at 00:02 UTC

    More anecdotal data, with a one-liner test:


    Activestate Perl 5.10.0 on XP+SP2:

    >perl -v
    This is perl, v5.10.0 built for MSWin32-x86-multi-thread
    [snip]
    Binary build 1004 [287188] provided by ActiveState http://www.ActiveState.com
    Built Sep  3 2008 13:16:37
    [snip]
    >perl -MBenchmark -lwe "my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 1000);}));" 10_000
    64 wallclock secs (37.05 usr + 25.88 sys = 62.92 CPU) @ 158.93/s (n=10000)
    (100,000 was obviously taking forever, so I skipped it.)


    Cygwin Perl 5.10.0 on the same system:

    # perl -v
    This is perl, v5.10.0 built for cygwin-thread-multi-64int
    [snip]
    # perl -MBenchmark -lwe 'my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 1000);}));' 10_000
     1 wallclock secs ( 0.06 usr +  0.00 sys =  0.06 CPU) @ 158730.16/s (n=10000)
    # perl -MBenchmark -lwe 'my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 1000);}));' 100_000
     1 wallclock secs ( 0.61 usr +  0.03 sys =  0.64 CPU) @ 156006.24/s (n=100000)

      Also XP, MinGW built
      This is perl, v5.10.1 (perl-5.10.1*) built for MSWin32-x86-multi-thread
      $ perl -MBenchmark -lwe "my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 1000);}));" 10_000
      32 wallclock secs (20.41 usr +  9.34 sys = 29.75 CPU) @ 336.13/s (n=10000)
      ActivePerl
      >perl -v
      This is perl, v5.8.9 built for MSWin32-x86-multi-thread
      (with 9 registered patches, see perl -V for more detail)
      >perl -MBenchmark -lwe "my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 100);}));" 10_000
      33 wallclock secs (20.53 usr +  9.47 sys = 30.00 CPU) @ 333.32/s (n=10000)
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by ikegami (Pope) on Dec 01, 2009 at 00:50 UTC
    Perl has more than one allocation scheme. One of them appears to be horrible on Windows. Given that I don't know
    • The options available.
    • The options available on Windows.
    • The pluses and minuses of each.

    it could be any of the following:

    • Bad build settings due to operator error.
    • Bad build settings due to unclear descriptions. (Bug)
    • Bad build settings due to bad defaults. (Bug)
    • Performance issues for the allocators available on Windows. (Bug)
      Perl has more than one allocation scheme

      I've just built perl (with mingw) using perl's malloc and that takes care of the problem.
      However, according to comments in the makefile.mk, if you use perl's malloc you have to build without USE_IMP_SYS (which I also did). This means that the perl that has been built with perl's malloc has no threads or fork emulation - which would be unsatisfactory for many people. It also means that the ppm packages available from the various repos are unusable with this build of perl.

      It was perl-5.11.2 that I built to check this out, having first established that perl-5.11.2 exhibits the crap behaviour when built with "normal" options (and it does).

      Cheers,
      Rob

        The problem seems pretty definitely to lie within the MS CRT.

        If I compile and run this using gcc under Ubuntu running inside a VirtualBox emulator (which ought to carry some overheads):

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        int main( int argc, char** argv ) {
            long long i, n = 1000000;
            char *p = (char*)malloc( 1000 );
            time_t start, finish;
            double elapsed;

            time( &start );
            for( i = 2; i < n; ++i ) {
        //        printf( "\r %lld\t", i * 1000 );
                if( ! ( p = (char*)realloc( p, 1000 * i ) ) ) {
                    printf( "\nFailed to realloc to %lld\n", i * 1000 );
                    exit( 1 );
                }
            }
            time( &finish );
            elapsed = difftime( finish, start );
            printf( "\nfinal size: %lld; took %.3f seconds\n", n * 1000, elapsed );
            exit( 0 );
        }

        It takes less than a second to realloc a buffer to 1GB in 1000 byte increments:

        mehere@mehere-desktop:~$ gcc mem.c -o memtest
        mehere@mehere-desktop:~$ ./memtest
        final size: 1000000000; took 0.000 seconds

        However, if I compile and run this using MS VC++:

        #include <stdio.h>
        #include <stdlib.h>   /* for malloc/realloc/exit */
        #include <time.h>

        int main( int argc, char** argv ) {
            __int64 i, n = 1000000;
            char *p = (char*)malloc( 1000 );
            time_t start, finish;
            double elapsed;

            time( &start );
            for( i = 2; i < n; ++i ) {
        //        printf( "\r %I64d\t", i * 1000 );
                if( ! ( p = (char*)realloc( p, i * 1000 ) ) ) {
                    printf( "\nFailed to realloc to %I64d\n", i * 1000 );
                    exit( 1 );
                }
            }
            time( &finish );
            elapsed = difftime( finish, start );
            printf( "\nfinal size: %I64d; took %.3f seconds\n", n * 1000, elapsed );
            exit( 0 );
        }

        it takes over an hour to run. I haven't had the patience to let it complete yet!

        I also compiled it with MinGW (which appears to also use the MSCRT?), and it has taken 1hr 20 mins so far and appears to be only 1/4 of the way there.

        The problem seems to lie with the CRT realloc() which grows the heap in iddy-biddy chunks each time.

        In addition, it may be

        • walking the heap attempting to coalesce free space prior to allocating extra virtual memory.
        • zeroing new commits in the virtual memory allocator prior to copying the existing data into the reallocated space.

        Bottom line: The MSCRT heap management routines are crap!


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by BrowserUk (Pope) on Dec 01, 2009 at 10:45 UTC

    Compiling perl 5.10.1 without USE_IMP_SYS and with USE_PERL_MALLOC makes a huge difference. The following script grows two strings, each in a separate thread, to 1/2GB in 1000-byte increments in a little under 2 1/2 seconds:

    #! perl -slw
    use strict;
    use Time::HiRes qw[ time ];
    use threads;

    <>;
    my $start = time;
    async {
        my $s = chr(0) x 1000;
        for( 1 .. 5e5 ) { $s .= chr( 0 ) x 1000; }
    }->detach;
    my $s = chr(0) x 1000;
    for( 1 .. 5e5 ) { $s .= chr( 0 ) x 1000; }
    printf "Re-allocated 2 x 500MB (in 1000 byte increments) on two threads in %.3f\n",
        time() - $start;
    <>;
    __END__
    C:\perl\5.10.1\bin>.\perl.exe \test\mem.pl
    Re-allocated 2 x 500MB (in 1000 byte increments) on two threads in 2.416

    Now the question is: why is USE_IMP_SYS required for fork emulation? And can that be corrected?

    Cluebats welcomed. Along with any thoughts on more thorough testing of Perl_malloc and threading.



      Did you use MinGW or VC to compile Perl? Did you happen to run the performance test suite to see whether there were any generic performance improvements with this perl?

      I too could lose fork emulation, as long as threads work:)

      Red.

        I used Microsoft (R) C/C++ Optimizing Compiler Version 15.00.21022.08 for x64. And Syphilis used MinGW.

        There's a Perl performance test suite? :)



        No really. Could you point me to the "performance test suite" please? My searches haven't located anything likely.


Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by cdarke (Prior) on Dec 01, 2009 at 11:01 UTC
    Looking at Dependency Walker, ActiveState 5.10.1 has a Perl510.dll with entry points win32_malloc, win32_calloc and win32_realloc. In the source code these just call the CRT malloc/calloc/realloc; they don't do any magic with the Win32 Heap APIs.

    I noticed that the code is all compiled in Debug. A feature of Windows is that a process can have custom heaps, and MSCRT uses a different heap for malloc/calloc/realloc whilst in Debug. For example it adds sanity markers between each allocated block, keeps track of each allocation, and so on. gcc can do a similar thing but requires environment variables to be set.

    Whether Debug would have such a drastic effect on performance I cannot say, but it's a good place to start.

    Update: For details of the debug overhead, see http://msdn.microsoft.com/en-us/library/bebs9zyz(VS.80).aspx.
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by tallulah (Novice) on Dec 01, 2009 at 13:14 UTC
    As a consequence of this report I am now downloading Ubuntu Linux; I will install it on the second primary partition. This will be my first experience with Linux.
    I hope the installation will not corrupt my first primary partition.

      Ubuntu 9.10 really is quite usable. It's the first Linux distro I've tried that I could say that about--and I've tried quite a few.

      Now, if only Linus would relent and allow pluggable schedulers...


      If you're seriously worried about it eating all of the cheese in the house, just have a play with the Live CD, or slap it inside a VirtualBox (or whatever) VM.

      davis

      You can install Ubuntu into a file on the Windows filesystem (see Wubi). No need to create an extra partition.

      This will also not overwrite the Windows-Bootloader (as GRUB et al. do) but use the Windows-Bootloader to start Ubuntu.

      Btw, does anyone know what happened to the "Linux In a Window" (under Windows) that I recall from the days of Win98?


      holli

      You can lead your users to water, but alas, you cannot drown them.

        There is CoLinux, which is a port of the Linux Kernel (and some userspace) to Windows.

        Btw, does anyone know what happened to the "Linux In a Window"

        I've never heard of that, but VirtualBox runs Ubuntu in a seamless window on a win32 or win64 desktop, which is very convenient (and free).



Node Type: perlquestion [id://810276]
Approved by AnomalousMonk
Front-paged by almut