How to make your Perl 30% faster
by PetaMem (Priest) on Nov 16, 2004 at 12:18 UTC
I have been using the fastest "Perl interpreter ever" (at least in my experience) for quite some time now. It seems stable, so I'd like to share that knowledge with you.
Some time ago, Nicholas gave an excellent talk, "When Perl is not fast enough". He mentions that compiling your own perl may be an option and reports speed gains of between 5% and 14%.
With the recent improvements to GCC and its autovectorization feature, I thought I could spend a Sunday finding out what that would bring me.
I fetched the sources for both GCC 3.4 and Perl 5.8.5, built GCC 3.4, and then used that GCC to compile Perl 5.8.5 with the options -O3 and -msse2 (use -msse instead if your CPU doesn't support SSE2). The "autovectorized" perl is consistently about 30% faster than the plain-vanilla perl that comes with a standard Linux distribution (compiled for a generic Pentium, I suppose). The smallest speedup, about 20%, was for store/retrieval; the largest, about 40%, was for some list manipulations.
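The build step can be sketched as follows, assuming Linux on x86 (the script reads `/proc/cpuinfo`) and the freshly built GCC installed as `/usr/local/bin/gcc`, which is the compiler path shown in the `perl -V` output of the fast build. It applies the rule above (SSE2 if the CPU has it, otherwise SSE) and prints a Configure command line using Perl's standard `-Dcc` and `-Doptimize` settings. The exact Configure arguments used for the fast build are not recorded (its `config_args` is empty), so treat the printed command as an assumption rather than the original recipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Choose -msse2 or -msse from a /proc/cpuinfo-style list of CPU flags.
sub pick_simd_flag {
    my ($flags_line) = @_;
    my %has = map { $_ => 1 } split ' ', $flags_line;
    return '-msse2' if $has{sse2};
    return '-msse'  if $has{sse};
    return '';    # no SSE at all: leave the flag out
}

# Return the first "flags" line of /proc/cpuinfo (Linux/x86), or '' if unavailable.
sub cpu_flags {
    open my $fh, '<', '/proc/cpuinfo' or return '';
    while (my $line = <$fh>) {
        return $1 if $line =~ /^flags\s*:\s*(.*)$/;
    }
    return '';
}

unless (caller) {
    my $optimize = join ' ', grep { length } '-O3', pick_simd_flag(cpu_flags());
    # Assumed path of the freshly built GCC (matches cc= in the fast perl's -V dump).
    my $gcc = '/usr/local/bin/gcc';
    print "sh Configure -des -Dcc=$gcc -Doptimize='$optimize'\n";
}
```

Run it from the unpacked perl-5.8.5 source directory, execute the printed line, and continue with the usual `make`, `make test` and `make install`.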
I can tell you that 30% is significant and makes recompiling worthwhile. Moreover, GCC does not seem to autovectorize every case it could, so further improvements are probably possible. I also suspect that real GCC experts could find more optimizations for the P4 architecture, but neither my time nor my knowledge allowed further experiments.
More details about the environment and the compared interpreters:
As you may (or may not) see from the data below, the environment is SuSE Linux 8.2 on a 1.8 GHz Pentium 4-M. The benchmark was our application for natural language processing/understanding, which performs some heavy operations on N-ary trees (our own list-based implementation, not the one on CPAN). For example, "normalizing" a Swedish lexicon (removing redundant data, sorting trees, etc.) takes 423 seconds with the standard perl and 288 seconds with the optimized one. This is a fairly demanding benchmark, as it shuffles a great deal of data around. We also have results where the speedup is a factor of ten (10!): gathering information about a lexicon, i.e. iterating over about 150,000 lexicon entries, evaluating the number of meanings per entry and adding it to a total. For the Swedish lexicon this takes 20 seconds with the unoptimized version and 2(!!) seconds with the optimized one.
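To reproduce this kind of comparison, the sketch below times a loop shaped like the lexicon statistics described above (iterate over about 150,000 entries and add up the meanings per entry). The lexicon here is a hypothetical hash of entry name => arrayref of meanings, not the actual NLP data structures. Running the same script under both interpreters gives directly comparable timings; `$^X` in the output shows which binary ran it. The `speedup` helper turns two runtimes into a ratio and a fraction of runtime saved: 423 s vs 288 s is a ratio of about 1.47, i.e. roughly 32% less runtime, while 20 s vs 2 s is the factor of ten.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for a lexicon: entry name => arrayref of meanings
# (between 1 and 5 meanings per entry, cycling).
sub build_lexicon {
    my ($n_entries) = @_;
    my %lex;
    for my $i (1 .. $n_entries) {
        $lex{"entry$i"} = [ map { "meaning$_" } 1 .. 1 + $i % 5 ];
    }
    return \%lex;
}

# Iterate over all entries and add up the number of meanings per entry.
sub count_meanings {
    my ($lex) = @_;
    my $total = 0;
    $total += @{ $lex->{$_} } for keys %$lex;
    return $total;
}

# Ratio of old to new runtime, plus the fraction of runtime saved.
sub speedup {
    my ($old_secs, $new_secs) = @_;
    return ($old_secs / $new_secs, ($old_secs - $new_secs) / $old_secs);
}

unless (caller) {
    my $lex     = build_lexicon(150_000);
    my $t0      = time();
    my $total   = count_meanings($lex);
    my $elapsed = time() - $t0;
    printf "%s: %d meanings in %.3f s\n", $^X, $total, $elapsed;

    my ($ratio, $saved) = speedup(423, 288);    # normalization timings quoted above
    printf "normalization: %.2fx, %.0f%% less runtime\n", $ratio, 100 * $saved;
}
```

For example, save it under a hypothetical name such as `lexbench.pl` and run it once with `/usr/local/bin/perl` and once with the distribution perl (e.g. the one under `/opt/perl-5.8.0_t/bin/`), then feed the two timings to `speedup`.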
This is the "fast baby":
    (none):/tmp # /usr/local/bin/perl -V
    Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
      Platform:
        osname=linux, osvers=2.4.26-mh2, archname=i686-linux-thread-multi
        uname='linux sol 2.4.26-mh2 #7 mon aug 23 11:30:25 cest 2004 i686 unknown unknown gnulinux '
        config_args=''
        hint=previous, useposix=true, d_sigaction=define
        usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=define
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='/usr/local/bin/gcc', ccflags ='-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
        optimize='-O3 -msse2',
        cppflags='-fno-strict-aliasing -pipe -I/usr/local/include -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
        ccversion='', gccversion='3.4.2', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, prototype=define
      Linker and Libraries:
        ld='/usr/local/bin/gcc', ldflags =' -L/usr/local/lib'
        libpth=/usr/local/lib /lib /usr/lib
        libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lc
        perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
        libc=, so=so, useshrplib=false, libperl=libperl.a
        gnulibc_version='2.3.2'
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
        cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

    Characteristics of this binary (from libperl):
      Compile-time options: MULTIPLICITY USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
      Built under linux
      Compiled at Oct 26 2004 23:48:45
      @INC:
        /usr/local/lib/perl5/5.8.5/i686-linux
        /usr/local/lib/perl5/5.8.5
        /usr/local/lib/perl5/site_perl/5.8.5/i686-linux
        /usr/local/lib/perl5/site_perl/5.8.5
        /usr/local/lib/perl5/site_perl

This is the perl it was compared against:
    Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
      Platform:
        osname=linux, osvers=2.4.20-4gb-athlon, archname=i686-linux-thread-multi
        uname='linux builder 2.4.20-4gb-athlon #1 mon mar 17 17:56:47 utc 2003 i686 unknown unknown gnulinux '
        config_args='-ds -e -Dprefix=/opt/perl-5.8.0_t -Dman1dir=/opt/perl-5.8.0_t/man/man1 -Dman3dir=/opt/perl-5.8.0_t/man/man3 -Uinstallusrbinperl -Dusethreads -Di_db -Duseshrplib=true'
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
        useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
        optimize='-O3',
        cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing'
        ccversion='', gccversion='3.3 20030226 (prerelease) (SuSE Linux)', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, prototype=define
      Linker and Libraries:
        ld='cc', ldflags =' -L/usr/local/lib'
        libpth=/usr/local/lib /lib /usr/lib
        libs=-lnsl -ldl -lm -lpthread -lc -lcrypt -lutil
        perllibs=-lnsl -ldl -lm -lpthread -lc -lcrypt -lutil
        libc=, so=so, useshrplib=true, libperl=libperl.so
        gnulibc_version='2.3.2'
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/opt/perl-5.8.0_t/lib/5.8.0/i686-linux-thread-multi/CORE'
        cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

    Characteristics of this binary (from libperl):
      Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
      Built under linux
      Compiled at Jan 12 2004 10:13:27
      @INC:
        /opt/perl-5.8.0_t/lib/5.8.0/i686-linux-thread-multi
        /opt/perl-5.8.0_t/lib/5.8.0
        /opt/perl-5.8.0_t/lib/site_perl/5.8.0/i686-linux-thread-multi
        /opt/perl-5.8.0_t/lib/site_perl/5.8.0
        /opt/perl-5.8.0_t/lib/site_perl
As you can see, even the old perl was compiled with -O3, so one cannot say it was not optimized at all.
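To compare two builds without reading two full `-V` dumps, the core `Config` module exposes the same settings (and `perl -V:optimize` prints a single one from the command line). The sketch below prints a handful of keys; the selection is an assumption made for this comparison. Running it under both binaries also makes other differences visible: besides compiler and flags, the dumps above differ in Perl version (5.8.0 vs 5.8.5), thread support (`useithreads=define` vs `undef`) and how libperl is linked (`libperl.so` vs `libperl.a`), any of which may also influence timings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Build settings worth comparing between two perl binaries (chosen for this sketch).
my @KEYS = qw(version cc gccversion optimize ccflags
              usethreads useithreads useshrplib libperl);

# Return (key => value) pairs for the running perl; undefined values become 'undef'.
sub build_summary {
    my %summary;
    for my $key (@KEYS) {
        $summary{$key} = defined $Config{$key} ? $Config{$key} : 'undef';
    }
    return %summary;
}

unless (caller) {
    my %s = build_summary();
    printf "%-12s %s\n", $_, $s{$_} for @KEYS;
}
```

Running it once with each interpreter and diffing the two outputs shows at a glance which settings differ.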
I'd like to reiterate that I saw this as an experiment that would probably fail, because I was reluctant to sacrifice stability for speed. But I now use the optimized Perl on a regular basis, and it has proven to work with only one side effect: it's faster. :-)