Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by ikegami (Patriarch) on Dec 01, 2009 at 00:50 UTC
Perl has more than one allocation scheme
I've just built perl (with mingw) using perl's malloc and that takes care of the problem. However, according to comments in the makefile.mk, if you use perl's malloc you have to build without USE_IMP_SYS (which I also did). This means that the perl that has been built with perl's malloc has no threads or fork emulation - which would be unsatisfactory for many people. It also means that the ppm packages available from the various repos are unusable with this build of perl.
It was perl-5.11.2 that I built to check this out, having first established that perl-5.11.2 exhibits the crap behaviour when built with "normal" options (and it does).
Cheers, Rob
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main( int argc, char** argv ) {
    long long i, n = 1000000;
    char *p = (char*)malloc( 1000 );
    time_t start, finish;
    double elapsed;
    time( &start );
    for( i = 2; i < n; ++i ) {
        // printf( "\r %lld\t", i * 1000 );
        if( ! ( p = (char*)realloc( p, 1000 * i ) ) ) {
            printf( "\nFailed to realloc to %lld\n", i * 1000 );
            exit( 1 );
        }
    }
    time( &finish );
    elapsed = difftime( finish, start );
    printf( "\nfinal size: %lld; took %.3f seconds\n", n * 1000, elapsed );
    exit( 0 );
}
It takes less than a second to realloc a buffer to 1GB in 1000-byte increments:
mehere@mehere-desktop:~$ gcc mem.c -o memtest
mehere@mehere-desktop:~$ ./memtest
final size: 1000000000; took 0.000 seconds
However, if I compile and run this using MS VC++:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main( int argc, char** argv ) {
    __int64 i, n = 1000000;
    char *p = (char*)malloc( 1000 );
    time_t start, finish;
    double elapsed;
    time( &start );
    for( i = 2; i < n; ++i ) {
        // printf( "\r %I64d\t", i * 1000 );
        if( ! ( p = (char*)realloc( p, i * 1000 ) ) ) {
            printf( "\nFailed to realloc to %I64d\n", i * 1000 );
            exit( 1 );
        }
    }
    time( &finish );
    elapsed = difftime( finish, start );
    printf( "\nfinal size: %I64d; took %.3f seconds\n", n * 1000, elapsed );
    exit( 0 );
}
it takes over an hour to run. I haven't had the patience to let it complete yet!
I also compiled it with MinGW (which appears to also use the MSCRT?), and it has taken 1hr 20 mins so far, and appears to be only about 1/4 of the way there.
The problem seems to lie with the CRT realloc(), which grows the heap in itty-bitty chunks each time. In addition, it may be:
- walking the heap attempting to coalesce free space prior to allocating extra virtual memory.
- zeroing newly committed virtual memory prior to copying the existing data into the reallocated space.
Bottom line: the MSCRT heap management routines are crap!
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by cdarke (Prior) on Dec 01, 2009 at 11:01 UTC
Looking at Dependency Walker, ActiveState 5.10.1 has a Perl510.dll with entry points win32_malloc, win32_calloc and win32_realloc. In the source code they just call the CRT malloc/calloc/realloc; they don't do any magic with the Win32 Heap APIs.
I noticed that the code is all compiled in Debug. A feature of Windows is that a process can have custom heaps, and the MSCRT uses a different heap for malloc/calloc/realloc whilst in Debug. For example, it adds sanity markers between each allocated block, keeps track of each allocation, and so on. gcc can do a similar thing but requires environment variables to be set.
Whether Debug would have such a drastic effect on performance I cannot say, but it's a good place to start.
Update: For details of the debug overhead, see http://msdn.microsoft.com/en-us/library/bebs9zyz(VS.80).aspx.
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by BrowserUk (Patriarch) on Dec 01, 2009 at 10:45 UTC
Compiling perl 5.10.1 without USE_IMP_SYS and with USE_PERL_MALLOC makes a huge difference. The following script grows two strings, each in a separate thread, to 1/2GB in 1000-byte increments in a little under 2 1/2 seconds:
#! perl -slw
use strict;
use Time::HiRes qw[ time ];
use threads;
<>;
my $start = time;
async {
    my $s = chr(0) x 1000;
    for( 1 .. 5e5 ) {
        $s .= chr( 0 ) x 1000;
    }
}->detach;
my $s = chr(0) x 1000;
for( 1 .. 5e5 ) {
    $s .= chr( 0 ) x 1000;
}
printf "Re-allocated 2 x 500MB (in 1000 byte increments) on two threads in %.3f\n",
    time() - $start;
<>;
__END__
C:\perl\5.10.1\bin>.\perl.exe \test\mem.pl
Re-allocated 2 x 500MB (in 1000 byte increments) on two threads in 2.416
Now the question is: why is USE_IMP_SYS required for fork emulation? And can that be corrected?
Cluebats welcomed. Along with any thoughts on more thorough testing of Perl_malloc and threading.
Did you use MinGW or VC to compile Perl? Did you happen to run the performance test suite to see if there were any generic performance improvements with this perl?
I too could lose fork emulation, as long as threads work. :)
Red.
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by explorer (Chaplain) on Nov 30, 2009 at 22:51 UTC
for ( 1 .. 100_000 ) {
    $x .= ('x' x 1_000);
}
Linux: 0.4 seconds. Windows: 20 minutes.
Updated: this code was discovered in August 2007.
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by Burak (Chaplain) on Nov 30, 2009 at 23:43 UTC
I can confirm this with ActivePerl 5.10.1.1006 on Vista 32-bit. There seems to be a problem with concatenation somehow. However, the "x" operator seems to be fast:
$string.= $teststring x $iter;
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by Anonymous Monk on Dec 01, 2009 at 00:02 UTC
More anecdotal data, with a one-liner test:
ActiveState Perl 5.10.0 on XP+SP2:
>perl -v
This is perl, v5.10.0 built for MSWin32-x86-multi-thread
[snip]
Binary build 1004 [287188] provided by ActiveState http://www.ActiveState.com
Built Sep 3 2008 13:16:37
[snip]
>perl -MBenchmark -lwe "my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 1000);}));" 10_000
64 wallclock secs (37.05 usr + 25.88 sys = 62.92 CPU) @ 158.93/s (n=10000)
(100,000 was obviously taking forever, so I skipped it.)
Cygwin Perl 5.10.0 on the same system:
# perl -v
This is perl, v5.10.0 built for cygwin-thread-multi-64int
[snip]
# perl -MBenchmark -lwe 'my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 1000);}));' 10_000
1 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU) @ 158730.16/s (n=10000)
# perl -MBenchmark -lwe 'my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 1000);}));' 100_000
1 wallclock secs ( 0.61 usr + 0.03 sys = 0.64 CPU) @ 156006.24/s (n=100000)
This is perl, v5.10.1 (perl-5.10.1*) built for MSWin32-x86-multi-thread
$ perl -MBenchmark -lwe "my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 1000);}));" 10_000
32 wallclock secs (20.41 usr + 9.34 sys = 29.75 CPU) @ 336.13/s (n=10000)
ActivePerl:
>perl -v
This is perl, v5.8.9 built for MSWin32-x86-multi-thread (with 9 registered patches, see perl -V for more detail)
>perl -MBenchmark -lwe "my$x=q//; print timestr(timeit(eval($ARGV[0]),sub{$x .= (q/x/ x 100);}));" 10_000
33 wallclock secs (20.53 usr + 9.47 sys = 30.00 CPU) @ 333.32/s (n=10000)
Re: Why is Windows 100 times slower than Linux when growing a large scalar?
by tallulah (Novice) on Dec 01, 2009 at 13:14 UTC
As a consequence of this report I am now downloading Ubuntu Linux; I will install it on the second primary partition. This will be my first experience with Linux.
I hope the installation will not corrupt my first primary partition.
Ubuntu 9.10 really is quite usable. It's the first Linux distro I've tried that I could say that about--and I've tried quite a few.
Now, if only Linus would relent and allow pluggable schedulers...