Inline::C on Windows: how to improve performance of compiled code?

by vr (Curate)
on Jun 14, 2018 at 23:22 UTC

vr has asked for the wisdom of the Perl Monks concerning the following question:

Dual boot (i.e. same hardware): Ubuntu 17.10 and Win10, threaded Perl 5.26 on both. However, it seems some (default) flags (compiler options?) are set wrong for Inline::C on Windows. Here's the output of a no-op example on both systems (Linux first, then Windows):

$ perl -MTime::HiRes=time -MInline=C,'void foo(){}' -wE'$t=time;foo()for 1..1e8;say time-$t'
5.57836294174194
$ perl -MTime::HiRes=time -wE'$t=time;sub foo(){}foo()for 1..1e8;say time-$t'
5.003093957901

>perl -MTime::HiRes=time -MInline=C,"void foo(){}" -wE"$t=time;foo()for 1..1e8;say time-$t"
11.325471162796
>perl -MTime::HiRes=time -wE"$t=time;sub foo(){}foo()for 1..1e8;say time-$t"
5.04568600654602

In the particular case where I noticed this, Inline::C is used exactly for a huge number of calls to a very simple sub (though not a no-op, of course). I understand that sub calls in Perl are expensive, and I'll live with that, but I don't understand the result above.

To counter "it's because Linux is better": the script was moved from a very old 32-bit Windows machine, and I noticed (though I can't provide any solid numbers right now) that benchmarks for the pure-Perl parts showed perhaps a 6-fold speed improvement, while the main part, in C, showed no more than 2-fold. Therefore I suspect sub-optimal Inline::C defaults there, and I hope for hints of what could be changed.

Replies are listed 'Best First'.
Re: Inline::C on Windows: how to improve performance of compiled code?
by syphilis (Archbishop) on Jun 15, 2018 at 04:52 UTC
    i hope for hints of what could be changed

    The first hint that comes to mind is to perform the 1e8 function calls to foo() from inside C space, rather than from Perl space.
    On top of the significant reduction in overhead, one then might also get to take advantage of C optimizations that are lost when the C function is called from Perl.
    The following script aims at demonstrating the sort of savings you might get. I've changed foo() to be a little bit more than a no-op, in the hopes that it will remove the effect of clever C optimizations. (I don't know if I've been successful.):
    use Time::HiRes qw(time);
    use Inline C => <<'EOC';

    int foo(int x) {return x + 1;}

    int foo_bar(int x) {
      int i;
      for(i = 0; i < x; i++){
        foo(i);
      }
      return x + i;
    }

    EOC

    $iterations = 10000000; # 1e7

    $t = time;
    foo($_) for 1 .. $iterations;
    print "# ", time - $t, "\n";

    $t = time;
    foo_bar($iterations);
    print "# ", time - $t, "\n";

    # Outputs (on Windows):
    # 1.57560181617737
    # 0.0026400089263916
    On my Ubuntu (16.04) box, running perl-5.26, the same script outputs:
    # 1.80176305770874
    # 0.0337138175964355
    Cheers,
    Rob
      # 18.2613549232483
      # 3.09944152832031e-006

      # 7.17122101783752
      # 0.186490774154663

      Those are the results of running your example (with 1e8 iterations) on Windows and Linux, respectively. Looks to me, "C from C" on Windows got optimized away, but "C from Perl" shows the same picture as in the OP (Windows lagging behind Linux). Right now I'm interested in eliminating that lag. Optimizing to avoid the 1e8 calls is in my future plans ;).

        Looks to me, "C from C" on Windows got optimized away

        I don't think so. (I could be wrong, though.)
        A clearer illustration is (hopefully) this script:
        use Time::HiRes qw(time);

        use Inline C => Config =>
            #OPTIMIZE => '-O0',
            FORCE_BUILD => 1;

        use Inline C => <<'EOC';

        void foo() {}

        void foo_bar(int x) {
          int i;
          for(i = 0; i < x; i++){
            foo();
          }
        }

        EOC

        $iterations = 10000000;

        $t = time;
        foo() for 1 .. $iterations;
        print "# ", time - $t, "\n";

        $t = time;
        foo_bar($iterations);
        print "# ", time - $t, "\n";
        As it stands, with optimization enabled, it outputs (on Windows):
        # 1.02960205078125
        # 1.00135803222656e-005
        Now that second value does look like something was optimized away. I'm thinking the loop is simply doing nothing at each iteration.
        When we switch optimization off by including the "OPTIMIZE => '-O0'" line, the output changes to (on Windows):
        # 1.10760188102722
        # 0.0196361541748047
        The "C from C" code now takes roughly 2,000 times longer to execute - because, I think, this time foo() is actually being called at each iteration. But it's still more than 50 times quicker than calling "C from Perl".
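An alternative to switching optimization off entirely is to make the loop's work observable, so the compiler cannot discard it at the default optimization level. The following is only a sketch under assumptions: it presumes a working Inline::C installation, and the `sink` variable is a hypothetical addition that does not appear in the scripts above. Storing to a volatile variable forces one store per iteration, although the compiler may still inline foo() rather than emit a real call.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use Inline C => <<'EOC';
int foo(int x) { return x + 1; }

int foo_bar(int x) {
    /* 'sink' is a hypothetical name: a volatile store cannot be elided,
       so the loop survives optimization. */
    volatile int sink = 0;
    int i;
    for (i = 0; i < x; i++) {
        sink = foo(i);
    }
    return sink;
}
EOC

my $iterations = 10_000_000;

my $t    = time;
my $last = foo_bar($iterations);
printf "# %.6f s (last value %d)\n", time - $t, $last;
```

The returned value is the result of the final foo() call, which gives a cheap sanity check that the loop really ran to completion.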

        I've no useful ideas regarding things that can be done to enable Windows to access C subs as quickly as it can access Perl subs - and that's the main reason that I'm avoiding that aspect.

        Cheers,
        Rob
Re: Inline::C on Windows: how to improve performance of compiled code?
by BrowserUk (Patriarch) on Jun 14, 2018 at 23:55 UTC

    What output do you get from perl -V:ccflags?


      >perl -V:ccflags
      ccflags=' -s -O2 -DWIN32 -DWIN64 -DCONSERVATIVE -D__USE_MINGW_ANSI_STDIO -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -fwrapv -fno-strict-aliasing -mms-bitfields';

      $ perl -V:ccflags
      ccflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64';
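Note that ccflags is only one of several build variables. The optimization level is normally recorded separately under `optimize`, which may be why no -O flag shows up in the Linux ccflags. A minimal sketch for collecting the relevant settings on both systems, using only the core Config module; the helper name `build_summary` and the choice of keys are illustrative assumptions:

```perl
use strict;
use warnings;
use Config;

# Collect the compiler-related settings the running perl was built with.
# Inline::C picks up its defaults from these values.
sub build_summary {
    my %summary;
    for my $key (qw(cc ccflags optimize ldflags lddlflags)) {
        $summary{$key} = defined $Config{$key} ? $Config{$key} : '(undef)';
    }
    return \%summary;
}

my $summary = build_summary();
printf "%-10s %s\n", $_, $summary->{$_} for sort keys %$summary;
```

Running this under both operating systems and comparing the output line by line makes differences in the defaults easier to spot than reading two long ccflags strings.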
Re: Inline::C on Windows: how to improve performance of compiled code?
by ikegami (Patriarch) on Jun 15, 2018 at 15:24 UTC

    Are both builds threaded?

      Are both builds threaded?

      Yes, I wondered about that, too.

      When I ran vr's original code on Ubuntu, with both threaded and unthreaded builds of perl-5.26, the timings didn't alter significantly.
      But I didn't have an unthreaded Windows build to test with.

      I have now built one (perl-5.26) only to discover that one can't install a usable Inline::C because Win32-IPC (more specifically, Win32-Mutex) doesn't compile.
      One can successfully force install Inline::C, but it's unusable.
      Unfortunately, Win32::IPC builds using Module::Build - and I simply cannot stomach any troubleshooting that involves Module::Build.

      It would have been nice to verify whether an unthreaded build on Windows does access Inline::C functions faster.
      Maybe someone else ....

      Cheers,
      Rob
        One can successfully force install Inline::C, but it's unusable

        While that's so for the latest version of Inline::C, it's quite simple to install older versions of Inline::C on the unthreaded Windows perl (as they don't carry the Win32::IPC baggage that comes with recent versions).
        So, I installed Inline-0.55, though I perhaps didn't need to go that far back.
        Here are the results using vr's original one-liners:

        On threaded perl-5.26.0 with current Inline::C version 0.78:
        >perl -MTime::HiRes=time -MInline=C,"void foo(){}" -wE"$t=time;foo()for 1..1e8;say time-$t"
        11.0136189460754
        >perl -MTime::HiRes=time -wE"$t=time;sub foo(){}foo()for 1..1e8;say time-$t"
        5.39761018753052


        On threaded perl-5.26.0 with Inline::C version 0.55:
        >perl -MTime::HiRes=time -MInline=C,"void foo(){}" -wE"$t=time;foo()for 1..1e8;say time-$t"
        10.4052181243896
        >perl -MTime::HiRes=time -wE"$t=time;sub foo(){}foo()for 1..1e8;say time-$t"
        5.58481001853943


        On unthreaded perl-5.26.0 with Inline::C version 0.55:
        >perl -MTime::HiRes=time -MInline=C,"void foo(){}" -wE"$t=time;foo()for 1..1e8;say time-$t"
        4.92960906028748
        >perl -MTime::HiRes=time -wE"$t=time;sub foo(){}foo()for 1..1e8;say time-$t"
        7.65961289405823
        It therefore appears that reverting to an older version of Inline::C makes very little difference (about 10.4 s against 11.0 s), whereas using Inline::C on an unthreaded Windows perl-5.26.0 more than halves the time taken to call Inline::C subs from perl (about 4.9 s).
        Unfortunately, it also seems that calling perl subs on an unthreaded Windows perl-5.26.0 takes roughly 40% longer (about 7.7 s, as compared to 5.4-5.6 s on the threaded perl).

        Of course, things might be quite different on the soon-to-be-released perl-5.28.0.
        And things might also be quite different on 32-bit builds of perl.
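When comparing timings like these across machines, it helps to record which kind of build produced each number. A small sketch using the core Config module; the helper name `threading_model` and its labels are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;
use Config;

# Describe the threading model of the running perl.
# $Config{useithreads} and $Config{usemultiplicity} hold 'define' when
# enabled and are undefined or empty otherwise.
sub threading_model {
    my $ithreads     = $Config{useithreads}     ? 1 : 0;
    my $multiplicity = $Config{usemultiplicity} ? 1 : 0;
    return $ithreads     ? 'threaded (ithreads)'
         : $multiplicity ? 'unthreaded, multiplicity'
         :                 'unthreaded';
}

print "perl $Config{version} on $^O: ", threading_model(), "\n";
```

Printing this line next to each benchmark result avoids mixing up threaded and unthreaded runs later.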

        Cheers,
        Rob
Re: Inline::C on Windows: how to improve performance of compiled code?
by sundialsvc4 (Abbot) on Jun 15, 2018 at 13:04 UTC
    We really need more information about what this program is actually doing, and how the C code relates to the Perl code. Your no-op examples are precisely that, no-ops, and can only be exercising the thunking logic that passes control to the inline environment and back. Could it be that the two Perl interpreters are compiled differently? "C" optimization can't possibly be a factor when the "C" subroutines don't do anything: the difference must be in the two Perl interpreters themselves.
      the difference must be in the two Perl interpreters themselves

      Couldn't it alternatively be that the difference is in the way that the 2 different systems (i.e. Linux and Windows) access functions in external shared libraries?

      Cheers,
      Rob
        Interesting question...

        I came across a public article about linking and differences between Windows and Unix by Symantec:
        Symantec, Dynamic Linking in Linux and Windows, part one
        Symantec, Dynamic Linking in Linux and Windows, part two

        This is not a "light read" and I did not study it in depth. I'm not sure whether the article "jibes" with what I heard 20 years ago: that Linux can load a .so faster than Windows can load a .DLL, because Windows has to adjust jump addresses and potentially make another copy of the DLL, which takes time, but that once linked, calling into the Windows DLL is faster. I was working on systems around the time that NT was a separate thing from Windows.

        Anyway, the above articles appear to contain a lot of detail for those so inclined.

        I have never used Inline::C or linked to an external C program with Perl, so I don't know what difference that may make either. BrowserUk does a lot of Windows combined with C stuff and hopefully he can shower some wisdom down upon us.

Re: Inline::C on Windows: how to improve performance of compiled code?
by sundialsvc4 (Abbot) on Jun 17, 2018 at 23:18 UTC

    So, have we by now established that a one-liner which does absolutely nothing in the loop, as I suggested, does not run in a time comparable to the one that uses the no-op inline function, but runs considerably faster? That my hypothesis, that this does not actually have to do with the inline implementation but is endemic to the Perl interpreter version itself, has therefore now been tested and proven false? I wasn't clear whether this idea had been tried or simply ignored, and what the outcome was if it was tried.

      So, have we by now established that a one-liner which does absolutely nothing in the loop, as I suggested, does not run in a time comparable to the one that uses the no-op inline function, but runs considerably faster?

      I don't think we have, as it was never really the issue.
      The issue was more about some apparent additional overhead in calling XS subs on Windows, and what to do about it.

      I do, however, think it is very likely that code that does nothing 10 million times will execute significantly faster than code that calls a no-op 10 million times - and that your suggestion is most likely quite correct.
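That expectation is easy to test in pure Perl. A rough sketch, with the iteration count reduced to 1e6 so it finishes quickly; the helper `elapsed` and the sub `noop` are illustrative names, and the absolute timings will vary by machine and build:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# A Perl sub that does nothing, standing in for the no-op in the thread.
sub noop { }

# Run a code ref once and return the elapsed wall-clock seconds.
sub elapsed {
    my ($code) = @_;
    my $t0 = time;
    $code->();
    return time - $t0;
}

my $n = 1_000_000;

my $empty  = elapsed(sub { for (1 .. $n) { } });      # loop body does nothing
my $called = elapsed(sub { noop() for 1 .. $n });     # one sub call per iteration

printf "empty loop: %.4f s\n", $empty;
printf "no-op call: %.4f s\n", $called;
```

The difference between the two figures approximates the cost of the sub calls themselves, which is the quantity the original one-liners were measuring.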

      Cheers,
      Rob