in reply to Re^2: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why? in thread Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
I did try Very Sleepy, but nothing stood out. Can you recommend a profiler for windows?
Hm. That's the one I use for profiling C code; and I've found it very effective. Effective to the point of detecting a difference between two identical opcodes where one causes a cache miss and the other doesn't.
I'd love to take a look at the output from identical runs with the two builds.
the ones that stand out most (ie, 80%+ worse) do create more perl/xs objects than typical, so perhaps that is where I should start looking?
I'd start by rebuilding the 5.24 without PERL_COPY_ON_WRITE & PERL_HASH_FUNC_ONE_AT_A_TIME_HARD individually and together and see what effect they have.
I believe (perhaps wrongly) that the first is a space for speed tradeoff which might be a factor.
The second is an (IMO) unnecessary fix for a non-problem that substitutes a different, more time consuming hashing function for the one used in 5.8.9 for "security reasons". Try replacing PERL_HASH_FUNC_ONE_AT_A_TIME_HARD with PERL_HASH_FUNC_ONE_AT_A_TIME_OLD and see if that makes any difference.
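Before rebuilding, it may be worth confirming which of these options each binary was actually compiled with. What follows is only a minimal sketch. It assumes a perl whose Config.pm provides bincompat_options and non_bincompat_options (the lists that perl -V prints under "Compile-time options"), and it assumes the option names appear there spelled as above. The helper names compile_time_options and option_report are made up for illustration.

```perl
use strict;
use warnings;
use Config;

# Options discussed in this thread.
my @WANTED = qw(
    PERL_COPY_ON_WRITE
    PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
    PERL_HASH_FUNC_ONE_AT_A_TIME_OLD
);

# Return the compile-time option names of the running perl, or an empty
# list if this perl's Config.pm does not provide the query functions.
sub compile_time_options {
    return () unless defined(&Config::bincompat_options)
                 and defined(&Config::non_bincompat_options);
    return (Config::bincompat_options(), Config::non_bincompat_options());
}

# Hypothetical helper: map each wanted name to 1 (present) or 0 (absent).
sub option_report {
    my ($wanted, $have) = @_;
    my %have   = map { ($_ => 1) } @$have;
    my %report = map { ($_ => ($have{$_} ? 1 : 0)) } @$wanted;
    return \%report;
}

my @have = compile_time_options();
if (@have) {
    my $report = option_report(\@WANTED, \@have);
    printf "%-36s %s\n", $_, ($report->{$_} ? 'yes' : 'no') for @WANTED;
}
else {
    print "This perl ($]) cannot list its compile-time options this way\n";
}
```

Run it under each build and compare the output; on a perl too old to have these functions, fall back to reading perl -V by eye.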
Beyond those guesses, I'd need to see the profiler output.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^4: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
by dave_the_m (Monsignor) on Dec 22, 2016 at 09:56 UTC
I believe (perhaps wrongly) that the first is a space for speed tradeoff
Except for edge cases and possible bugs, COW is intended on average to use less memory and less CPU.
The second is an (IMO) unnecessary fix for a non-problem
A non-problem that allows you to trivially DoS any web server where input from the client (such as headers or parameters) is fed into a perl hash.
Anyway, perl's hash handling has been getting faster, not slower in recent years.
This trivial code (read 0.5M words from a dictionary file and store in a hash):
open my $fh, "</usr/share/dict/words" or die;
my %h;
$h{$_}++ while <$fh>;
consumes the following number of CPU Mcycles under various perls:
5.8.9 1,245
5.18.0 1,143
5.20.0 1,113
5.22.0 1,163
5.24.0 1,089
Dave.
First: I did say "Beyond those guesses,". The information provided by the OP so far consists solely of the build parameters for the two builds. I compared those two sets and attempted to reason about possibilities.
A non-problem that allows you to trivially DoS any web server where input from the client
Hm. That problem was addressed way back in 2003/5.8.1 with something akin to this:
So what new problem was addressed by the 5.17 changes? (And has anyone ever seen a plausible demonstration of that "new problem"? Has there ever been a reported sighting of anyone exploiting that new problem in the field? If the change is so critical, why wasn't it back-ported to 5.10 and other earlier versions that are still being shipped with 95% of *nix distributions?)
Anyway, perl's hash handling has been getting faster, not slower in recent years.
Agreed. Not just hash handling, but just about every aspect of Perl (save maybe string handling) has gotten faster in recent builds. Congratulations.
However, over the years there have been some weird behaviours that only affected windows builds.
Once again I'll remind you that I was attempting to help the OP on the basis of the minimal information supplied; whilst asking him to provide more.
So what new problem was addressed by the 5.17 changes?
I can't remember the full details off the top of my head, but among other issues, there was a bug in the 5.8.1 implementation that, with a suitably crafted set of keys, could trigger the hash code into doubling the bucket size for every added key, making it trivial to exhaust a web server's memory. It was also shown that the ordering of keys extracted from a hash (like a web server returning unsorted headers) could be used to determine the server's hash seed.
And has anyone ever seen a plausible demonstration of that "new problem"?
On the security list I've seen simple code (that puts a particular sequence of keys into a hash) that can crash the perl process.
Has there ever been a reported sighting of anyone exploiting that new problem in the field?
That shouldn't be the criterion for fixing security issues.
If the change is so critical, why wasn't it back-ported to 5.10 and other earlier versions that are still being shipped with 95% of *nix distributions?
We backported the relevant changes to all maintained perl versions. It's up to vendors whether they patch old unsupported perl versions if they still ship them.
Dave.
Are you sure these speedups are due to Perl's hash improvements, and not improvements in Perl's IO handling? Because the latter would have been my first guess. A more interesting comparison might be to time the script under two modes, one with a simple counter increment and one with the hash addition. The difference between these two running times would be more illuminating, I think.
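For what it's worth, here is a rough sketch of that two-mode comparison. It measures wall-clock seconds with Time::HiRes rather than CPU cycles, so the numbers are not comparable with the Mcycle figures above. The sub names time_pass and compare_modes are invented for this example, and the file to read is taken from the command line.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Read every line of $path, calling $per_line on each one, and return
# the elapsed wall-clock seconds for the read loop.
sub time_pass {
    my ($path, $per_line) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $start = time;
    while (my $line = <$fh>) {
        $per_line->($line);
    }
    my $elapsed = time - $start;
    close $fh;
    return $elapsed;
}

# Hypothetical helper: time an I/O-plus-counter baseline, then the same
# loop with a hash store, and report the difference between the two.
sub compare_modes {
    my ($path) = @_;
    my $count = 0;
    my %h;
    time_pass($path, sub { });    # warm-up pass so both timed passes hit the OS cache
    my $baseline  = time_pass($path, sub { $count++ });
    my $with_hash = time_pass($path, sub { $h{ $_[0] }++ });
    return {
        lines     => $count,
        keys      => scalar(keys %h),
        baseline  => $baseline,
        with_hash => $with_hash,
        hash_only => $with_hash - $baseline,
    };
}

# Usage: perl compare.pl /usr/share/dict/words
if (@ARGV) {
    my $r = compare_modes($ARGV[0]);
    printf "%d lines, %d distinct keys\n", $r->{lines}, $r->{keys};
    printf "counter only : %.4f s\n", $r->{baseline};
    printf "hash store   : %.4f s\n", $r->{with_hash};
    printf "difference   : %.4f s\n", $r->{hash_only};
}
```

Both modes pay the same per-line sub call, so that overhead should cancel in the difference. A single run is noisy; repeating it a few times under each perl and comparing the differences would be more convincing.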
There's a tool in the perl src repository which uses cachegrind behind the scenes to accurately measure how many CPU instructions, data reads etc a small snippet of code uses. With the following initial setup (so the hash already exists and has some keys):
my %h = qw(a 1 b 2 c 3 d 4); my $key = "foo";
Running the following benchmark (using a non-constant key so the key's hash gets recalculated each time):
$h{$key} = 1; delete $h{$key}
Shows the following results on various perls:
Key:
Ir Instruction read
Dr Data read
Dw Data write
COND conditional branches
IND indirect branches
_m branch predict miss
_m1 level 1 cache miss
_mm last cache (e.g. L3) miss
- indeterminate percentage (e.g. 1/0)
The numbers represent raw counts per loop iteration.
        perl589o perl5101o perl5125o perl5144o perl5163o perl5184o perl5203o perl5222o perl5240o perl5258o
        -------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
Ir        1348.0    1340.4    1378.0    1383.0    1423.0    1453.0    1466.0    1368.0    1356.0    1300.0
Dr         414.0     403.0     411.0     404.0     408.0     403.0     411.0     379.0     373.0     362.0
Dw         226.0     214.0     222.0     227.0     228.0     231.0     231.0     208.0     206.0     196.0
COND       202.0     210.1     210.0     204.0     213.0     204.0     210.0     199.0     197.0     188.0
IND         16.0      16.0      17.0      18.0      18.0      18.0      17.0      14.0      12.0      14.0
COND_m       2.0       1.0       4.0       2.0       3.0       3.0       1.0       2.0       2.0       3.0
IND_m        9.0       9.0      11.0       9.0       9.0      11.0       9.0       5.0       5.0       5.0
Ir_m1        0.0       0.0       0.0       0.0       0.0       0.0       0.0      -0.1       0.0       0.0
Dr_m1        0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0
Dw_m1        0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0
Ir_mm        0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0
Dr_mm        0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0
Dw_mm        0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0
Which shows everything being much the same before 5.22 (and in particular no significant slowdown in 5.16), and things getting better since.
Dave.
Re^4: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
by Anonymous Monk on Dec 23, 2016 at 16:05 UTC
Beyond those guesses, I'd need to see the profiler output.
Ok, I've found the issue. As usual with these things, it came from a very unexpected source:
pthread_mutex_lock
We use pthreads as our threading library and for some reason, the version of pthreads that comes with strawberry is massively slower than what we are currently using. Remove all the lock/unlocks, and the 64bit 5.24.0 is faster than 32bit 5.8.9.
Now to figure out why this version of the library is so slow...
Re^4: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
by Anonymous Monk on Dec 22, 2016 at 10:11 UTC
Hm. That's the one I use for profiling C code; and I've found it very effective. Effective to the point of detecting a difference between two identical opcodes where one causes a cache miss and the other doesn't.
Ok, you've inspired me to look at sleepy again. Do you have any tips on using sleepy? Due to it sampling, I assume that the test cases need to run for some time? Any specific compile options I should use?
I isolated some of the code for the memory test case (the 80%+ slow down), and it turns out that the 64bit 5.24 version is much faster than the 32bit 5.8.9 version on basic perl/xs/c object creation/destruction. I need to do more digging.
I've been writing other test cases, and I'm suspecting something in the xs layer.
Do you have any tips on using sleepy?
Once you've narrowed the scope of the possibilities, bracket the suspect code with calls to getc() or similar so that you can attach the profiler just before the suspect code and stop instrumenting immediately after.
Not so useful if you've no idea where to look; but very useful once you do.
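A Perl-level sketch of that bracketing, in case the suspect code is easiest to reach from a test script (the same idea works with getc(stdin) on the C side). Everything named here is hypothetical: pause_for_profiler, run_bracketed, and suspect_code, which is only a stand-in that creates and destroys some blessed objects.

```perl
use strict;
use warnings;

# Print a prompt with the process id, then block until a line arrives
# on the given input handle.
sub pause_for_profiler {
    my ($message, $in) = @_;
    print STDERR "$message (pid $$), then press Enter: ";
    my $line = <$in>;
    return;
}

# Hypothetical stand-in for the code under suspicion: create and destroy
# $n small blessed objects, returning how many were made.
sub suspect_code {
    my $n = shift // 200_000;
    my $made = 0;
    for my $id (1 .. $n) {
        my $obj = bless { id => $id }, 'Hypothetical::Thing';
        $made++;
    }
    return $made;
}

# Pause, run the code, pause again, so the profiler only sees the part
# between the two prompts.
sub run_bracketed {
    my ($code, $in, @args) = @_;
    pause_for_profiler('Attach the profiler now', $in);
    my @result = $code->(@args);
    pause_for_profiler('Stop profiling now', $in);
    return wantarray ? @result : $result[0];
}

# Only pause when run from an interactive terminal.
if (-t STDIN) {
    my $made = run_bracketed(\&suspect_code, \*STDIN);
    print "created $made objects\n";
}
```

The pid in the prompt is there so the right process is easy to pick out of the profiler's process list.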
I assume that the test cases need to run for some time?
It kind of depends on the nature of the code; but the longer you run in the errant code the more likely things are to stand out.
Any specific compile options I should use?
I use MS compilers rather than gcc, so I'm not familiar with the latter's options. However, you should basically stick to the same options you use for your production code. Anything else is just apples and oranges.
"Anything else is just apples and oranges." And bananas.
Re^4: Our perl/xs/c app is 30% slower with 64bit 5.24.0, than with 32bit 5.8.9. Why?
by Anonymous Monk on Dec 21, 2016 at 22:11 UTC