PerlMonks  

Re: Minimizing PAR performance hit

by tsee (Curate)
on Jan 08, 2007 at 09:35 UTC (#593494)


in reply to Minimizing PAR performance hit

If I recall correctly --dependent is implied by -P. The reason is somewhat obvious (to me): If you need a Perl interpreter to execute the produced script (for those following along: -P produces a Perl script with all dependencies embedded), it's just pointless to ship a libperl along.

One thing I'd like to point out, which I am sure you already know but some other readers might not, is that the PAR "runtime" performance hit is related to start-up time only. That is, it takes longer to start, but the performance once it's running is identical. False assumptions about this have cropped up in the past.

If you produce a stand-alone .exe using PAR, that .exe has to go through a fancy bootstrapping procedure and ends up creating two processes instead of one. The reason is that it cannot rely on a single (perl) process to clean up the temporary data if that was requested by the packager. After all, the "inner" process is started from the temporary directory itself. I would say this constitutes a good part of the memory overhead.

Validating this, I tried loading the PAR module from a one-liner. This shows an increase of RAM use of about 1.5MB over a one-liner that consists of only "use strict; use warnings;" or about 2MB over one without any "use" statements. This difference between a script that uses PAR and one which doesn't might be further reduced if it uses more of the modules PAR loads internally anyway. Numbers at the end of this post.
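
For anyone who wants to repeat the measurement, here is a minimal sketch, not the exact procedure behind the numbers in this post. It assumes a POSIX-style ps that accepts -o rss= and -p, and perl_rss_kb is a made-up helper name. It starts perl with whatever switches you pass and lets the process report its own resident set size in KB.

```shell
# perl_rss_kb [PERL_SWITCHES...]
# Hypothetical helper: start perl with the given switches and print the
# resident set size (KB) of that perl process, as reported by ps.
# Assumes a ps that understands "-o rss=" and "-p PID".
perl_rss_kb() {
    perl "$@" -e 'system("ps", "-o", "rss=", "-p", $$)' | tr -d ' '
}
```

Comparing `perl_rss_kb -MPAR` with `perl_rss_kb -Mstrict -Mwarnings` and a plain `perl_rss_kb` gives the three kinds of reading discussed here. Absolute values will differ between systems and perl builds.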

The way to go to reduce the memory overhead is certainly to move away from stand-alone executables. Install a perl. Install PAR on the client side. PAR has been a pure-Perl module since version 0.970 and should install cleanly just about anywhere. Then ship a .par archive to the client. Any script that was packaged into that .par can be executed with a one-liner:

perl -e 'use PAR {file => "foo.par", run => "myscript.pl"};'

This should incrementally unpack the necessary files from the .par archive instead of dumping the whole archive's contents to disk. (Don't quote me on this, though.)

Creating a .par archive is reasonably simple: Use the -p (without the -B) option to pp. Since pp (or rather Module::ScanDeps) is a little over-aggressive at including possible dependencies, you might want to unzip the .par into a temporary directory, remove anything that you are sure won't be needed, and then re-zip it. PAR::Dist can help with that a little if this feels uncomfortable.
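
The unzip, prune, re-zip round trip could look roughly like this. It is only a sketch: repack_par is a hypothetical helper, it assumes the standard zip and unzip command-line tools are installed, and it does not touch the MANIFEST inside the archive.

```shell
# repack_par IN.par OUT.par PATH...
# Hypothetical helper: unpack IN.par into a scratch directory, delete the
# given archive-relative paths, and zip what is left into OUT.par.
# IN.par itself is left untouched.
repack_par() (
    in=$1
    out=$2
    shift 2
    # zip runs from inside the scratch directory, so make both paths absolute.
    case $in in /*) ;; *) in=$PWD/$in ;; esac
    case $out in /*) ;; *) out=$PWD/$out ;; esac

    work=$(mktemp -d) || exit 1
    status=0
    if unzip -q "$in" -d "$work" && cd "$work"; then
        for path in "$@"; do
            rm -rf -- "./$path"
        done
        rm -f "$out"    # otherwise zip would update an existing archive
        zip -q -r "$out" . || status=1
    else
        status=1
    fi
    cd /
    rm -rf "$work"
    exit "$status"
)
```

For example, `repack_par foo.par foo-slim.par lib/Some/Unneeded.pm` would write a copy of foo.par without that one file (both names are placeholders). Test the slimmed archive before shipping it, since anything removed by mistake only shows up at run time.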

Of course, unpacking stuff incrementally will be slower overall than dumping it all to disk at start-up. I don't know how much difference this makes, but at least you don't pay the penalty from the separate process, etc.

Finally, I would suggest looking at the PAR FAQ at par.perl.org. It has some good tips for reducing the executable size which might provide some benefit in this case, too.

Hope this helps,
Steffen

P.S.: The memory numbers as top showed:

3692 # -MPAR (0.970)
3392 # -MPAR (0.92)
2128 # -e "use strict; use warnings;"
1656 # -e1
There was quite a bit of code added to PAR between 0.92 and 0.970, but 300kb of extra memory kind of surprises me.
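
To make the arithmetic behind the "about 1.5MB", "about 2MB" and "300kb" figures explicit, here are the differences worked out from the readings above, assuming top reported them in KB (the variable names are only for illustration):

```shell
# Readings copied from the top output above, taken to be in KB.
par_0970=3692      # perl -MPAR (0.970)
par_092=3392       # perl -MPAR (0.92)
strict_warn=2128   # perl -e "use strict; use warnings;"
bare=1656          # perl -e1

over_strict=$((par_0970 - strict_warn))   # PAR 0.970 vs. strict + warnings
over_bare=$((par_0970 - bare))            # PAR 0.970 vs. no "use" statements
version_growth=$((par_0970 - par_092))    # PAR 0.970 vs. PAR 0.92

echo "PAR 0.970 over strict+warnings: ${over_strict} KB"
echo "PAR 0.970 over bare perl:       ${over_bare} KB"
echo "PAR 0.970 over PAR 0.92:        ${version_growth} KB"
```

That is 1564 KB, 2036 KB and 300 KB respectively.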


Re^2: Minimizing PAR performance hit
by gaal (Parson) on Jan 08, 2007 at 11:16 UTC
    Thanks for the input.

    One thing I have since noticed, sorting included files by size, is that Math::Pari comes with a huge 6MB shared object! This is bad enough, though not of course PAR's fault; but it is made worse by the fact that I have several instances of my program running and apparently an .so embedded this way cannot be shared. So the effective difference between a PARed and unPARed version of my program is dramatic indeed.

    In my case, I might be able to get Math::Pari deployed separately. I think adding "look for large shared objects" can go in the FAQ, which is indeed otherwise useful.
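
A "look for large shared objects" check could be sketched like this, run against the directory a .par or packed executable was unzipped into. list_big_files is a hypothetical helper name, and the 1024 KB default threshold is an arbitrary assumption.

```shell
# list_big_files DIR [MIN_KB]
# Hypothetical helper: print "size-in-KB path" for every regular file under
# DIR that is larger than MIN_KB (default 1024), biggest first.
list_big_files() (
    dir=$1
    min_kb=${2:-1024}
    find "$dir" -type f -size +"${min_kb}"k -exec du -k {} + | sort -rn
)
```

After something like `unzip -q foo.par -d /tmp/foo_par`, running `list_big_files /tmp/foo_par` should put a multi-megabyte .so such as the Math::Pari one at the top of the list (the paths here are placeholders).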

      Wow, 6MB is quite a shared object!

      By the way, what do you mean by "cannot be shared"? Can't be shared in the sense of shared memory? Does that work if the instances run in separate perls? If so, I wonder why this doesn't happen if you run the same par'd binary twice, because the cache area should be the same. Does it, perhaps, help to put Math::Pari into a separate .par and use that? Try fiddling with the $ENV{PAR_GLOBAL_TEMP} variable to force sharing of the cache directory. And please forgive me for hand-waving. :)

      On a related note: If you download the PAR-0.970.tar.gz distribution and look for the pare utility in the contrib/ subdirectory, you'll find a tool to remove all common modules from one executable and make it depend on the other. I'm not sure it works for the dependency on a .par. But perhaps pp -X foo.par works by skipping the stuff in foo.par during packaging!

      If you plan to deploy Math::Pari to the target system(s), you can use a .par as well - if you like. Here, I would advise a slightly different process:

      • use PAR::Dist::FromCPAN's cpan2par tool to create a Math-Pari-VERSION-PLATFORM-PERLVERSION.par binary from Math::Pari
      • Ship that to the clients
      • Have them install that binary with perl -MPAR::Dist -einstall_par (assuming only one .par in the current directory. See PAR::Dist.)
      You can even embed PAR::Dist into a simple "install_par.pl" script since it's pure-perl without non-core dependencies.

      About your suggestion for the FAQ: Would you mind editing the wiki yourself? Just modify the existing answer or, if you like, create a new Q/A pair. Thanks!

      Steffen

        Yes, IIRC none of the running instances of my program had very large Shared parts in their memory usage. I won't be at this particular $client's till Wednesday but I'll double check then. (Thanks for the wiki tip; I will certainly update the FAQ there if my hypothesis is confirmed.)

        The system is CentOS, but that doesn't have RPMs for any of the PAR utils (including PAR::Dist). Since PAR is meant to help with the bootstrapping problem of weakling perl installs, it'd be awesome if all major distros got one -- I'll see what I can do in a couple of days :-)

      Given that, it may also be worth posting a separate SOPW question about what you're doing that requires Math::Pari. There may be other ways to do it that would involve modules without nearly as much baggage.
      --
      @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
        I use Net::SSH::Perl, which relies on Crypt::DSA, which in turn uses Math::Pari. This last isn't strictly required... so long as you don't mind 30-second-long login times to SSH servers. :-(

        Regardless, the RAM overhead of PAR seems to be about 8-9MB...
