Minimizing PAR performance hit

by gaal (Parson)
on Jan 07, 2007 at 10:16 UTC (#593384=perlquestion)
gaal has asked for the wisdom of the Perl Monks concerning the following question:

I'm seeking practical ways of reducing the runtime hit (in CPU and especially RAM) of using PAR.

I have a program with dependencies on various modules such as Net::SSH::Perl, Net::SNMP, Net::SSLeay, and XML::Twig. When I run it without PAR in a certain reference mode, it takes about 10 seconds to complete (some of the time is spent waiting for the network end) and uses about 15MB of resident RAM at peak. Let's say I can live with that.

Under full PAR (that is, pp and a self-contained executable), the load time (once the cache is warm) is increased by about one second, and the memory consumption jumps to 22MB. The program on disk is about 7.7MB.

Since I can assume perl itself is installed on my deployment machine, I tried using pp -P, which produces a script (not bundling perl). Now my program on disk is about 5.6MB, the startup hit is only ~0.5 sec, and peak RAM is about 21MB, all in all not a great improvement.
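
For reference, the two packagings were done roughly like this (the script and output names are placeholders):

pp -o myprog myprog.pl                  # self-contained executable, bundles perl
pp -P -o myprog.packed.pl myprog.pl     # packed Perl script, assumes perl on the target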

Supposing I can assume that perl and PAR, and maybe even a few strategic additional Perl modules (but not all of them), are available on my deployment machine, can you suggest a better bundling method that decreases the hit? The half-second is okay, but the 6MB probably is not.

The system is CentOS 4.4, running Linux 2.6.9.

Update: --dependent didn't seem to help at all? Is it implied by -P?

Re: Minimizing PAR performance hit
by jettero (Monsignor) on Jan 07, 2007 at 12:52 UTC

    The fastest way to run perl programs is to not make them into an .exe first. You might also look at B::C. I believe the .c files and the .exes output from them will be a lot faster than PAR binaries, but they may not work as well.

    Basically, the problem is that perl just wasn't designed to be packed up like this. It can be made to work, but you have to suffer through unpacking an entire perl distribution to your tempfiles dir. And even when it's warm (as you say), you still have a 15meg binary to go through looking for the 1k of perl code to execute.

    I happen to use and enjoy PAR in production environments, but I have come to accept certain performance penalties. The only real advantage is not having to install perl or the related packages, which can be an enormous advantage in certain specialized settings.

    UPDATE: I overstruck B::C because when it does work it's a fluke. It's a disaster and diotalevi is right to object. I didn't realize at the time that it wasn't even being maintained anymore. Curiously, I just read somewhere it might get resurrected in 5.10?

    -Paul

      The B::C modules are dead.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Minimizing PAR performance hit
by tsee (Curate) on Jan 08, 2007 at 09:35 UTC

    If I recall correctly --dependent is implied by -P. The reason is somewhat obvious (to me): If you need a Perl interpreter to execute the produced script (for those following along: -P produces a Perl script with all dependencies embedded), it's just pointless to ship a libperl along.

    One thing I'd like to point out, which I am sure you already know but some other readers might not, is that the PAR "runtime" performance hit affects start-up time only. That is, it takes longer to start, but the performance once it's running is identical. False assumptions about this have cropped up in the past.

    If you produce a stand-alone .exe using PAR, that .exe has to go through a fancy bootstrapping procedure and ends up creating two processes instead of one. The reason is that it cannot rely on a single (perl) process to clean up the temporary data, if the packager requested that. After all, the "inner" process is started from the temporary directory itself. I would say this accounts for a good part of the memory overhead.

    To validate this, I tried loading the PAR module from a one-liner. This shows an increase in RAM use of about 1.5MB over a one-liner consisting of only "use strict; use warnings;", or about 2MB over one without any "use" statements. The difference between a script that uses PAR and one which doesn't might shrink further if the script uses more of the modules PAR loads internally anyway. The numbers are at the end of this post.
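
    For reference, the one-liners compared were roughly of this form:

    perl -MPAR -e1                           # load PAR, do nothing
    perl -e 'use strict; use warnings;'      # the usual pragmas as a baseline
    perl -e1                                 # bare perl baseline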

    The way to reduce the memory overhead is certainly to move away from stand-alone executables. Install a perl. Install PAR on the client side. It has been a pure-Perl module since version 0.970 and should install cleanly just about anywhere. Then ship a .par archive to the client. Any script that was packaged into that .par can be executed with a one-liner:

    perl -e 'use PAR {file => "foo.par", run => "myscript.pl"};'

    This should incrementally unpack the necessary files from the .par archive instead of dumping the whole archive's contents to disk. (Don't quote me on this, though.)

    Creating a .par archive is reasonably simple: Use the -p (without the -B) option to pp. Since pp (or rather Module::ScanDeps) is a little over-aggressive at including possible dependencies, you might want to unzip the .par into a temporary directory, remove anything that you are sure won't be needed, and then re-zip it. PAR::Dist can help with that a little if this feels uncomfortable.
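
    For example, the build-and-trim cycle could look something like this (the pruned directories are just placeholders for whatever the dependency scan pulled in unnecessarily):

    pp -p -o myscript.par myscript.pl        # build a plain .par: no bundled perl, no executable
    mkdir trim && cd trim && unzip ../myscript.par
    rm -rf lib/unicore lib/Tk                # prune what you know you won't need
    zip -r ../myscript-trimmed.par .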

    Of course, unpacking stuff incrementally will be slower overall than dumping everything to disk at start-up. I don't know how much of a difference this makes, but at least you don't pay the penalty of the separate process, etc.

    Finally, I would suggest looking at the PAR FAQ at par.perl.org. It has some good tips for reducing the executable size which might provide some benefit in this case, too.

    Hope this helps,
    Steffen

    P.S.: The memory numbers (in KB) as top showed them:

    3692  # -MPAR (0.970)
    3392  # -MPAR (0.92)
    2128  # -e "use strict; use warnings;"
    1656  # -e1
    There was quite a bit of code added to PAR between 0.92 and 0.970, but 300kb of extra memory kind of surprises me.

      Thanks for the input.

      One thing I have since noticed, after sorting the included files by size, is that Math::Pari comes with a huge 6MB shared object! That is bad enough, though of course not PAR's fault; but it is made worse by the fact that I have several instances of my program running, and apparently an .so embedded this way cannot be shared between them. So the effective difference between a PARed and unPARed version of my program is dramatic indeed.

      In my case, I might be able to get Math::Pari deployed separately. I think a "look for large shared objects" tip could go into the FAQ, which is indeed otherwise useful.
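
      For anyone wanting to do the same check, something along these lines lists the biggest files inside a packaged archive (the archive name is just a placeholder):

      unzip -l myprog.par | sort -n | tail -20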

        Wow, 6MB is quite a shared object!

        By the way, what do you mean by "cannot be shared"? Can't be shared in the sense of shared memory? Does that work if the instances run in separate perls? If so, I wonder why it doesn't happen when you run the same par'd binary twice, because the cache area should be the same. Does it, perhaps, help to put Math::Pari into a separate .par and use that? Try fiddling with the $ENV{PAR_GLOBAL_TEMP} variable to force sharing of the cache directory. And please forgive me for hand-waving. :)
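
        As a sketch, forcing a common cache directory would look something like this (the directory and program name are arbitrary):

        PAR_GLOBAL_TEMP=/var/tmp/myapp-par-cache ./myprog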

        On a related note: If you download the PAR-0.970.tar.gz distribution and look for the pare utility in the contrib/ subdirectory, you'll find a tool to remove all common modules from one executable and make it depend on the other. I'm not sure it works for the dependency on a .par. But perhaps pp -X foo.par works by skipping the stuff in foo.par during packaging!

        If you plan to deploy Math::Pari to the target system(s), you can use a .par as well - if you like. Here, I would advise a slightly different process:

        • use PAR::Dist::FromCPAN's cpan2par tool to create a Math-Pari-VERSION-PLATFORM-PERLVERSION.par binary from Math::Pari
        • Ship that to the clients
        • Have them install that binary with perl -MPAR::Dist -einstall_par (assuming only one .par in the current directory. See PAR::Dist.)
        You can even embed PAR::Dist into a simple "install_par.pl" script since it's pure-perl without non-core dependencies.
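
        For illustration, a minimal version of such an installer might look like this (it simply uses PAR::Dist rather than inlining it, and the argument handling is as bare as it gets):

        #!/usr/bin/perl
        # install_par.pl - install the .par archive named on the command line
        use strict;
        use warnings;
        use PAR::Dist;                  # pure-perl, no non-core dependencies
        die "usage: $0 file.par\n" unless @ARGV;
        install_par($ARGV[0]);          # installs into the local perl's site dirs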

        About your suggestion for the FAQ: Would you mind editing the wiki yourself? Just modify the existing answer or, if you like, create a new Q/A pair. Thanks!

        Steffen

        Given that, it may also be worth posting a separate SOPW question about what you're doing that requires Math::Pari. There may be other ways to do it that would involve modules without nearly as much baggage.
        --
        @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
