PerlMonks  

Compile perl for performance

by learnedbyerror (Monk)
on Aug 14, 2018 at 14:55 UTC (id://1220325)

learnedbyerror has asked for the wisdom of the Perl Monks concerning the following question:

Oh so kind Monks

Last night, I watched Graham TerMarsch's presentation "Red Wunz Go Fasta" from TPC 2018. It inspired me to go out and build perl 5.28.0 using perlbrew, adding the -D usemymalloc and -D optimize="-O3" flags. I have run a few benchmarks for one of my applications, which munges through a large corpus of data files and builds several LMDB databases containing the parsed/analyzed information. The result is that I have shaved almost 25% off this program's runtime compared with perl 5.26.2. This is running in a Debian Jessie LXC container on a Proxmox physical host running Debian Stretch.

In doing the above, I violated one of the cardinal rules of "Optimization Club" - make one change at a time. I changed both perl versions and two compiler flags. I promise to be more disciplined next time.
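Because two things changed at once, one way to attribute the gain is to time the same script under each build several times and compare medians rather than single runs. Below is a minimal Python sketch. It assumes you can invoke each perlbrew build's perl binary directly. The function names are hypothetical helpers, not part of perlbrew.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run `cmd` `runs` times; return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing run raise instead of producing a bogus timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_saved(baseline_s, candidate_s):
    """Percentage of the baseline runtime saved by the candidate build."""
    return 100.0 * (baseline_s - candidate_s) / baseline_s


def compare_builds(builds, runs=5):
    """`builds` maps a label to a command list; the first entry is the baseline.

    Returns {label: (median_seconds, percent_saved_vs_baseline)}.
    """
    results = {}
    baseline = None
    for label, cmd in builds.items():  # dicts keep insertion order
        median = time_command(cmd, runs)
        if baseline is None:
            baseline = median
        results[label] = (median, percent_saved(baseline, median))
    return results
```

Each build would be one entry, for example `"5.28.0-O3": ["/home/you/perl5/perlbrew/perls/perl-5.28.0-O3/bin/perl", "munge.pl"]`. That path and script name are hypothetical. For an 8-hour job, a representative slice of the input keeps repeated runs practical.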

My question to you, oh so wise ones, is: what options should I consider when trying to optimize against my specific code base?

Some of those that come to mind are:

  • Compiler and compiler version - gcc/clang
  • glibc version
  • compiler flags
  • perl features - such as no threads

At this time, my platforms of interest are Linux and macOS. My current plan is to compare 5.26.2 to 5.28.0 only. I have done previous comparisons with earlier versions of perl and have seen significant performance increases relative to versions prior to 5.26.0. This is primarily due to the changes in hash generation, as my applications tend to be hash heavy.
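To keep to "one change at a time", each build can differ from a plain baseline of the same perl version by exactly one option. Below is a minimal Python sketch that prints such a matrix of perlbrew commands. It assumes that perlbrew's `--as` option names the install, and that its `-D` switches are passed through to Configure in the same form used in the post. The variant names and the `cc=clang` variant are hypothetical choices, not a recommendation.

```python
import shlex

VERSIONS = ["perl-5.26.2", "perl-5.28.0"]

# Hypothetical single-option variants; each adds exactly one Configure switch,
# assumed to be passed through to Configure by perlbrew's -D option.
VARIANTS = {
    "mymalloc": ["-D", "usemymalloc"],
    "O3": ["-D", "optimize=-O3"],
    "clang": ["-D", "cc=clang"],
}


def build_commands(versions, variants):
    """One plain baseline per version, plus one build per single-option variant."""
    commands = []
    for version in versions:
        commands.append(["perlbrew", "install", version])
        for label, switches in variants.items():
            commands.append(
                ["perlbrew", "install", version, "--as", f"{version}-{label}", *switches]
            )
    return commands


for cmd in build_commands(VERSIONS, VARIANTS):
    print(shlex.join(cmd))  # shell-quoted, ready to paste
```

Version-to-version comparisons then use the two plain baselines. Each flag's effect is measured against the baseline of its own version.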

I kindly request that those of you who may be inclined to tell me, in intemperate language, that this investigation is folly save your time and not try to convince a fool of his self-recognized folly.

I do greatly appreciate those who do offer up options. I will happily update this thread with my findings.

Thank you in advance

lbe

Re: Compile perl for performance
by Corion (Patriarch) on Aug 15, 2018 at 08:36 UTC

    I would go through the points listed in Nicholas Clark's "When Perl is Not Quite Fast Enough", which mostly suggests compiling a Perl without threads, plus some other compiler options, for a "free" speed boost, provided you don't need the features. It also suggests some nastier stuff like opcode golf, but I assume that your main goal is to improve the program speed without changing the program itself.

    I would also look at toggling the COW feature and potentially switching the compiler flags to optimize for the target CPU architecture specifically.

    Unfortunately, neither I nor Google seem to find the slides online. Maybe sending Nicholas an email prompts him to put the slides online somewhere.
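Before benchmarking, it can help to confirm what a given perl was actually built with. `perl -V:name` prints one Config variable as `name='value';`. Below is a minimal Python sketch that queries a few relevant variables. The perl path you pass in is up to you. Treating `-DPERL_NO_COW` in `ccflags` as the sign of a COW-disabled build is an assumption.

```python
import re
import subprocess

# Config variables relevant to the options discussed in this thread.
# Assumption: a COW-disabled build would show -DPERL_NO_COW inside ccflags.
CONFIG_VARS = ["optimize", "usemymalloc", "useithreads", "cc", "ccflags"]

_LINE = re.compile(r"^(\w+)='(.*)';$")


def parse_config_line(line):
    """Parse one `name='value';` line printed by `perl -V:name`."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected perl -V output: {line!r}")
    return match.group(1), match.group(2)


def perl_build_config(perl="perl", names=CONFIG_VARS):
    """Query each Config variable from the given perl binary; return a dict."""
    config = {}
    for name in names:
        out = subprocess.run(
            [perl, f"-V:{name}"], check=True, capture_output=True, text=True
        ).stdout
        key, value = parse_config_line(out)
        config[key] = value
    return config
```

For example, `perl_build_config("/home/you/perl5/perlbrew/perls/perl-5.28.0/bin/perl")` would report that build's `optimize` and `usemymalloc` settings. The path in that example is hypothetical.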

      Couldn't find the deck either. I did drop a note to Nicholas and let him know about this post and asked for a link.

      And yes, in this exercise, I am working with code that has already been optimized, and I am still searching for more speed.

      I believe the CoW feature has been enabled by default since 5.20. I have not yet tried optimizing for the native architecture, but will shortly.

      Thanks for your response!

      lbe

Re: Compile perl for performance
by Marshall (Canon) on Aug 14, 2018 at 21:54 UTC
    what options should I consider when trying to optimize against my specific code base?

    My inclination would be none of the things that you listed. Your benchmarks sound about right for an optimizing compiler. However, often in a large application a 10x+ performance increase (or even much more) can be had by adjusting either the algorithm or the implementation coding of your existing algorithm.

    Another factor to consider is that the higher you crank the compiler optimization level, the more chance there is of the compiler making a mistake. I've lost contact with him for the moment, but a friend of mine was analyzing optimization levels in all the major C compilers as part of a PhD-level paper. I learned that all of these compilers make mistakes if the code is complex enough, and I'm sure that Perl itself is complex enough to qualify. Since you are optimizing the compiled C code of Perl itself, a flaw introduced there could be hard to track back: some nasty thing could happen that is very hard to figure out.

    I personally would look for algorithmic and coding enhancements far before messing with compiling Perl itself at a higher optimization level.
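One inexpensive guard against the kind of miscompilation described above is a differential check. Run the same script on the same input under the baseline build and the optimized build, and compare the outputs. Below is a minimal Python sketch that compares SHA-256 digests of each command's stdout. It assumes the program can emit a deterministic dump of its results. Perl's hash order is randomized per process, so keys should be sorted before printing. Raw LMDB files are not guaranteed to be byte-identical, so a sorted dump is the safer thing to hash.

```python
import hashlib
import subprocess


def output_digest(cmd):
    """Run `cmd` and return the SHA-256 hex digest of its stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True)
    return hashlib.sha256(result.stdout).hexdigest()


def builds_agree(baseline_cmd, candidate_cmd):
    """True if both commands produce byte-identical stdout."""
    return output_digest(baseline_cmd) == output_digest(candidate_cmd)
```

Hashing instead of keeping both outputs keeps memory flat even when the dump is large. A mismatch only says that something differs, so the next step would be to diff the two dumps directly.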

      Marshall,

      Thanks for your post. I agree with everything that you have written. I neglected to say that the code that I am testing has been around for a while. My first version of it took 32 hours to run. Through refining algorithms, I was able to reduce that to about 12 hours. I then ran it through Devel::NYTProf and, over half a dozen or so iterations of profiling and algorithmic changes, reduced the run time to about 8 hours.

      At this point, I'm being greedy and seeing what else I can get. The vast majority of the time is spent in the LMDB driver writing to the database in this case. This accounts for about 80% of the run time. The next chunk is about 10% for Sereal to serialize the HashRef which is written to the db. The next chunk after that is about 5% to parse and analyze the input data into the HashRef in the previous chunk. The last 5% covers the reading of the input files and other miscellaneous.

      My belief, based upon observing perl magic at a distance, is that between 5.28, usemymalloc and -O3, there is a net improvement in I/O, XS integration and compiler optimization that gets me down to 6 hours.

      If I had applied only the newer perl version and compiler optimization, I would only be down to 24 hours from 32. The vast majority of getting from 32 to 6 hours, all but 2 hours of the reduction, is due to algorithmic improvement.
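These figures allow a rough Amdahl's-law sanity check of where the last 2 hours came from. Below is a minimal Python sketch. It assumes the 80/10/5/5 split was measured on the 8-hour run before the rebuild, which the post does not state. Even if the rebuild had made every non-LMDB stage free, the job would still take 6.4 hours. So at least 0.4 hours of the saving must have come from the LMDB write path itself, which fits the I/O and XS explanation.

```python
def amdahl_runtime(total, fraction, speedup):
    """Amdahl's law: runtime after `fraction` of `total` is made `speedup` times faster."""
    return total * ((1 - fraction) + fraction / speedup)


# Assumption: the 80/10/5/5 split describes the 8-hour run *before* the rebuild.
before_h, after_h = 8.0, 6.0
split = {"lmdb_write": 0.80, "sereal": 0.10, "parse": 0.05, "other": 0.05}

non_lmdb = sum(f for stage, f in split.items() if stage != "lmdb_write")  # about 0.20

# Best case if the rebuild sped up only the non-LMDB stages: make them free.
best_without_lmdb = amdahl_runtime(before_h, non_lmdb, float("inf"))  # about 6.4 h

# Any saving beyond the 1.6 h of non-LMDB time must come from the LMDB stage.
min_lmdb_saving_h = (before_h - after_h) - non_lmdb * before_h  # about 0.4 h
lmdb_h = split["lmdb_write"] * before_h  # 6.4 h spent writing to LMDB

print(f"best case with LMDB unchanged: {best_without_lmdb:.1f} h (observed {after_h} h)")
print(f"LMDB stage saved at least {min_lmdb_saving_h:.1f} h "
      f"({100 * min_lmdb_saving_h / lmdb_h:.2f}% of its {lmdb_h:.1f} h)")
```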

      I am somewhat concerned about the possibility of instability that you mentioned. In my experience, -O2 is a reliable optimization level for gcc in general. I have run into problems with -O3, where it helped on some code and actually made other code slower. One of the things that I love about App::perlbrew is that I can easily have multiple versions of perl installed. The version that I use every day is compiled with no additional flags. I usually also keep one version compiled with -O2 for those programs where testing shows that I get a needed boost.

      Thanks again for your advice!

      lbe

        Hi lbe (learnedbyerror)!

        Thanks for your informative response!
        I think that we are "on the same page" and you know what you are doing!

        You are quite correct to be suspicious of gcc -O3 level. I TA an Advanced Assembly class whenever it "makes", which is only about every 5 years. It is a difficult class and it takes years to get enough qualified students in order to justify running the class.

        Sometimes we play "beat the compiler". This is possible even at the highest optimization level. I agree that -O2 is "fairly safe". At -O3 the compiler gets increasingly bizarre in what it does; it writes ASM that no human would ever think of. It may even write code that winds up slower! The PhD guy I alluded to in an earlier post was one of our students.

        Here is one suggestion which may or may not help you:

        Your application is very DB intensive.
        The DB will have two important general limits:

        1. the number of operations per second
        2. a much smaller number, the number of transactions per second
        As it turns out, committing 1 million inserts in a single transaction doesn't take much longer than committing just one single insert.

        See if you can reduce the number of DB transactions, for example by grouping many inserts into each commit. This can have a huge impact upon performance! You may or may not be done with the first part of optimization (algorithm and coding enhancements).

        I suspect that there is still more than a 25% improvement that can be had without resorting to an optimized compile of Perl itself.
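The effect of batching commits is easy to demonstrate. Below is a minimal sketch that uses SQLite through Python's standard sqlite3 module as a stand-in. It is not LMDB, whose Perl bindings have their own transaction API. One strategy commits after every insert, and the other commits once per batch. The table, row contents and batch size are arbitrary. Timings depend heavily on the disk and on SQLite's sync settings.

```python
import os
import sqlite3
import tempfile
import time

INSERT = "INSERT INTO kv (k, v) VALUES (?, ?)"


def open_db(path):
    """Open (or create) a small key/value table; the schema is arbitrary."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.commit()
    return conn


def insert_commit_each(conn, rows):
    """Commit after every row: one transaction per insert. Returns the commit count."""
    for row in rows:
        conn.execute(INSERT, row)
        conn.commit()
    return len(rows)


def insert_batched(conn, rows, batch_size):
    """Commit once per batch of rows. Returns the commit count."""
    commits = 0
    for start in range(0, len(rows), batch_size):
        conn.executemany(INSERT, rows[start:start + batch_size])
        conn.commit()
        commits += 1
    return commits


def compare(n_rows=2_000, batch_size=500):
    """Time both strategies on fresh file-backed databases in a temp directory."""
    rows = [(f"key{i}", f"value{i}") for i in range(n_rows)]
    strategies = {
        "commit_each": lambda conn: insert_commit_each(conn, rows),
        "batched": lambda conn: insert_batched(conn, rows, batch_size),
    }
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, run in strategies.items():
            conn = open_db(os.path.join(tmp, f"{name}.db"))
            start = time.perf_counter()
            commits = run(conn)
            elapsed = time.perf_counter() - start
            stored = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
            conn.close()
            results[name] = {"commits": commits, "rows": stored, "seconds": elapsed}
    return results
```

Calling `compare()` and printing the `seconds` entries shows the difference on a given machine. Both strategies store the same rows. They differ only in how many commits, and therefore how many durable writes, they pay for.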

Node Type: perlquestion [id://1220325]
Approved by hippo
Front-paged by Corion