PerlMonks  

Re: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?

by BrowserUk (Patriarch)
on Dec 21, 2016 at 18:10 UTC [id://1178315]


in reply to Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?

It'd be a whole lot easier to diagnose if we knew what the application was doing. And easier still if we could see the code.

The first thing you need to do is profile both and work out where the time is being used. Once you know that, it'll be easier to reason about the cause.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
by Anonymous Monk on Dec 21, 2016 at 20:47 UTC
    It'd be a whole lot easier to diagnose if we knew what the application was doing. And easier still if we could see the code.

    The first thing you need to do is profile both and work out where the time is being used. Once you know that, it'll be easier to reason about the cause.

    Thanks for the reply. As the other Anonymous Monk said, the app is a mix of perl/xs/c and is difficult to profile on Windows (normally I would use valgrind on Linux). I did try Very Sleepy, but nothing stood out. Can you recommend a profiler for Windows?

    The app itself mainly deals with numeric data. Lots of double datatypes in C structs, with data manipulation in C, with perl providing an 'API' to make things easy for the end user, so something like this:

    my $sum = $apple + $orange;
    The $apple and $orange variables are objects (typically mapped to large double vectors); the vector calculation itself is carried out in C.

    Our performance benchmarks are representative of our real workloads, but they are very broad in nature and contain a lot of code. While all the tests perform worse, the ones that stand out most (i.e., 80%+ worse) do create more perl/xs objects than typical, so perhaps that is where I should start looking?

      I did try Very Sleepy, but nothing stood out. Can you recommend a profiler for windows?

      Hm. That's the one I use for profiling C code; and I've found it very effective. Effective to the point of detecting a difference between two identical opcodes where one causes a cache miss and the other doesn't.

      I'd love to take a look at the output from identical runs with the two builds.

      the ones that stand out most (ie, 80%+ worse) do create more perl/xs objects than typical, so perhaps that is where I should start looking?

      I'd start by rebuilding the 5.24 without PERL_COPY_ON_WRITE & PERL_HASH_FUNC_ONE_AT_A_TIME_HARD individually and together and see what effect they have.

      I believe (perhaps wrongly) that the first is a space-for-speed tradeoff, which might be a factor.

      The second is an (IMO) unnecessary fix for a non-problem that substitutes a different, more time consuming hashing function for the one used in 5.8.9 for "security reasons". Try replacing PERL_HASH_FUNC_ONE_AT_A_TIME_HARD with PERL_HASH_FUNC_ONE_AT_A_TIME_OLD and see if that makes any difference.

      Beyond those guesses, I'd need to see the profiler output.

        I believe (perhaps wrongly) that the first is a space for speed tradeoff
        Except for edge cases and possible bugs, COW is intended on average to use less memory and less CPU.

        The second is an (IMO) unnecessary fix for a non-problem
        A non-problem that allows you to trivially DoS any web server where input from the client (such as headers or parameters) is fed into a perl hash.

        Anyway, perl's hash handling has been getting faster, not slower in recent years. This trivial code (read 0.5M words from a dictionary file and store in a hash):

        open my $fh, "</usr/share/dict/words" or die;
        my %h;
        $h{$_}++ while <$fh>;

        consumes the following number of CPU Mcycles under various perls:

        5.8.9   1,245
        5.18.0  1,143
        5.20.0  1,113
        5.22.0  1,163
        5.24.0  1,089

        Dave.

        Beyond those guesses, I'd need to see the profiler output.

        Ok, I've found the issue. As is often the case with these things, it came from a very unexpected source:

        pthread_mutex_lock

        We use pthreads as our threading library and, for some reason, the version of pthreads that comes with Strawberry Perl is massively slower than the one we are currently using. With all the lock/unlock calls removed, the 64bit 5.24.0 build is faster than the 32bit 5.8.9 build.

        Now to figure out why this version of the library is so slow...

        Hm. That's the one I use for profiling C code; and I've found it very effective. Effective to the point of detecting a difference between two identical opcodes where one causes a cache miss and the other doesn't.

        Ok, you've inspired me to look at Very Sleepy again. Do you have any tips on using it? Since it is a sampling profiler, I assume that the test cases need to run for some time? Any specific compile options I should use?

        I isolated some of the code for the memory test case (the 80%+ slowdown), and it turns out that the 64bit 5.24 version is much faster than the 32bit 5.8.9 version on basic perl/xs/c object creation/destruction. I need to do more digging.

        I've been writing other test cases, and I'm suspecting something in the xs layer.

        Why?

      Are you able to try Devel::NYTProf? I've found this module to be a very powerful profiling tool.

      Alex / talexb / Toronto

      Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re^2: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
by Anonymous Monk on Dec 21, 2016 at 18:17 UTC
    It is a mix of perl/xs/c on windows.

      For example, if your app is doing lots of sorting of strings, the presence of these defines in the 5.24 build parameters:

      USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME

      Might be the source of the slowdown; but if your app is purely mathematical, probably not.



      That tells us what it is, but not what it is doing. But if you don't need help ...

