PerlMonks  

Re: The costs of packages

by perrin (Chancellor)
on Sep 16, 2003 at 05:10 UTC


in reply to The costs of packages

I don't know exactly what you're trying to do, but I'll bet you there's a way to do it with a hash and a single package that would be much more efficient.

Re: Re: The costs of packages
by BrowserUk (Patriarch) on Sep 16, 2003 at 06:03 UTC

    What I am trying to do (and succeeding in doing) is to encapsulate several hundred datatypes (think C-style structs and unions) in such a way that they can be imported into a program on demand, without the mass pre-declaration that would be required with

    use My::Types qw[typeA typeB typeZZY];

    or

    use My::Type::typeA;
    use My::Type::typeB;
    use My::Type::typeC;

    or

    use My::Types::Factory;
    my $typeA   = My::Types::Factory->new( 'TypeA' );
    my $typeB   = My::Types::Factory->new( 'TypeB' );
    my $typeZZY = My::Types::Factory->new( 'TypeZZY' );
    my $varTypeA   = $typeA->new();
    my $varTypeB   = $typeB->new();
    my $varTypeZZY = $typeZZY->new();

    I also don't wish to have every program carry the weight of several hundred unused types just because it uses one of them; hence the use of AUTOLOAD. My inspiration comes from the POSIX package.

    I need each type to be in a separate package space because each type will have the same set of methods.

    So, whilst I am quite happy with the technique I outlined from the usage and implementation points of view, I am a little concerned that the first variation may consume more "glob space" than is necessary. If the second variation would save any substantial amount of memory, and/or prevent or reduce any performance hit that may (or may not) result from the first, then I would accept its slightly more verbose syntax.

    However, if there is little or no difference in the overheads of the two variations, then I will go with the former's reduced syntax.

    Now, if you can show me an efficient way of doing this with "a hash and a single package", I'm all ears!


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      Actually, the factory approach does not require mass pre-declaration and doesn't need to load anything that you don't use, so it's the best approach if you want to stick with the multiple packages approach. The factory method here would figure out the package name dynamically and then do a require for it, instantiate it, and return it.
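      A minimal sketch of such a factory, assuming each type lives in a file named My/Types/<Type>.pm somewhere on @INC. The names My::Types::Factory and My::Types::TypeA are hypothetical. The inline TypeA class and its %INC entry exist only so the sketch runs as a single file; in real use, require would load the module from disk the first time that type is asked for.

```perl
use strict;
use warnings;

# Hypothetical factory: builds the class name from the type name,
# require()s it on first use, then instantiates and returns the object.
package My::Types::Factory;
use Carp qw(croak);

sub new {
    my ($factory, $type, @args) = @_;
    # Accept only plain identifiers, since the name becomes a file path.
    croak "Bad type name '$type'" unless $type =~ /\A\w+\z/;
    my $class = "My::Types::$type";
    (my $file = "$class.pm") =~ s{::}{/}g;    # e.g. My/Types/TypeA.pm
    require $file;                             # a no-op once %INC has it
    return $class->new(@args);
}

# Stand-in for a real My/Types/TypeA.pm so the sketch runs as one file:
# define the class inline and record it in %INC as already loaded.
package My::Types::TypeA;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
$INC{'My/Types/TypeA.pm'} = __FILE__;

package main;

my $varTypeA = My::Types::Factory->new('TypeA', size => 4);
print ref($varTypeA), "\n";    # prints "My::Types::TypeA"
```

      Nothing is loaded for a type until the first time it is requested, and an unknown type fails at that point with require's usual "Can't locate My/Types/... in @INC" error.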

      I still think you can do it without multiple packages though, if they all share the same methods. You only need multiple packages if every package has different methods.
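      One way the "single package and a hash" idea could look, as a sketch rather than a drop-in design. It uses a single hypothetical class, My::Struct, and a made-up %TYPES table that maps each type name to its field list. Every object carries its type name, and the shared methods consult the table, so no per-type packages are created at all.

```perl
use strict;
use warnings;

# Hypothetical single-package design: one class plus one table of type
# definitions. Each object records its type name, and the shared methods
# consult the table instead of relying on a package per type.
package My::Struct;
use Carp qw(croak);

# Field lists are invented for illustration.
my %TYPES = (
    POINT => [qw(x y)],
    RECT  => [qw(left top right bottom)],
);

sub new {
    my ($class, $type, %init) = @_;
    my $fields = $TYPES{$type} or croak "Unknown type '$type'";
    # Unspecified fields default to 0.
    my %values = map { ($_ => $init{$_} // 0) } @$fields;
    return bless { type => $type, values => \%values }, $class;
}

sub type { return $_[0]{type} }

# One accessor shared by every type, checked against that type's fields.
sub get {
    my ($self, $name) = @_;
    croak "$self->{type} has no field '$name'"
        unless exists $self->{values}{$name};
    return $self->{values}{$name};
}

package main;

my $pt = My::Struct->new('POINT', x => 3, y => 4);
print join(' ', $pt->type, $pt->get('x')), "\n";    # prints "POINT 3"
```

      The trade-off is that per-type behaviour has to be expressed as data in the table. If some types genuinely need different methods, that is where separate packages start to pay for themselves.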

        Could you put a little flesh on those bones for me please? A brief example of the factory package and how to use it would be good.



      "So, whilst I am quite happy with the technique I outlined from the usage and implementation points of view, I am a little concerned that the first variation may consume more 'glob space' than is necessary. If the second variation would save any substantial amount of memory, and/or prevent or reduce any performance hit that may (or may not) result from the first, then I would accept its slightly more verbose syntax."

      I don't think you need to worry about this, certainly not as long as you are in the hundreds and not the hundreds of thousands. Incidentally, worrying about this smells like premature optimization to me... :-) Are there actually symptoms of a problem, or are you just considering best practice?

      A last point regarding AUTOLOAD: you may find that its biggest drawback is speed. I have heard it said that AUTOLOADing a sub spoils the method cache (I don't know whether that applies to only one package, to all of them, or what), so if speed is an issue you might look into that.


      ---
      demerphq

      <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

        With the scheme I outlined, the packages would only come into existence when they are used. They would be generated (eval'd) by the one small AUTOLOAD routine that gets loaded by the original use statement, with the details of each individual type taken from a lookup table (hash) filled from the __DATA__ section.

        The information in the DATA section will itself be autogenerated from several other very large files (think OS C header files) and reduced to the minimum info I require.
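        A sketch of the table-driven generation described above, under stated assumptions. The line format "NAME field,field,..." is invented, the package name My::Types and the generate() helper are hypothetical, and a heredoc stands in for __DATA__ (a real module would read <DATA>). Each type's package is created by a string eval only when generate() is called for it.

```perl
use strict;
use warnings;

package My::Types;

# Stand-in for the __DATA__ section: one type per line, "NAME field,field".
# A heredoc is used only so the sketch can be followed by more code;
# a real module would read the same lines from <DATA>.
my $table = <<'END';
POINT x,y
RECT  left,top,right,bottom
END

# Lookup table filled from the data: type name => [field names].
our %TYPES = map { my ($name, $list) = split ' '; ($name => [split /,/, $list]) }
             grep { /\S/ } split /\n/, $table;

# Shared constructor inherited by every generated type package.
sub new {
    my ($class, %init) = @_;
    no strict 'refs';
    my %self = map { ($_ => $init{$_} // 0) } @{"${class}::FIELDS"};
    return bless \%self, $class;
}

# Generate the package for one known type on demand, via string eval.
# Returns the package name, or nothing for a type not in the table.
sub generate {
    my ($type) = @_;
    my $fields = $TYPES{$type} or return;
    my $pkg = "My::Types::$type";
    eval "package $pkg; our \@ISA = ('My::Types'); our \@FIELDS = qw(@$fields); 1"
        or die $@;
    return $pkg;
}

package main;

my $before = exists $My::Types::{'POINT::'} ? 1 : 0;   # 0: not generated yet
my $class  = My::Types::generate('POINT');
my $pt     = $class->new(x => 3);
print ref($pt), " x=$pt->{x} y=$pt->{y}\n";            # prints "My::Types::POINT x=3 y=0"
```

        The string eval is only acceptable here because the type names and field lists come from the module's own table, never from user input.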

        My only reasons for creating a minimal stub package for each type were:

        a) To route the initial call to new AsYetUndefinedType; back to the single superclass AUTOLOAD generator.

        b) To ensure that only known types would be auto-generated. Unknown types, typos, etc. wouldn't have a stub package in existence, so they would cause the usual error:

        Can't locate object method "new" via package "XTYPE" (perhaps you forgot to load "XTYPE"?)
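        The stub-package variation might look roughly like the following. Everything here is a hypothetical sketch: the base class My::Stubs, the %TYPES table and the field lists are invented. Each known type gets an empty package whose @ISA points at the base class, so the first TypeA->new falls through to the one AUTOLOAD. That AUTOLOAD installs a real constructor and re-dispatches to it with goto &$code. A name with no stub, such as XTYPE, fails with perl's normal missing-method error.

```perl
use strict;
use warnings;

package My::Stubs;
our $AUTOLOAD;

# Hypothetical table of known types (field lists invented for illustration).
our %TYPES = (TypeA => [qw(a b)], TypeB => [qw(c)]);

# (a) One empty stub package per known type, inheriting from My::Stubs,
#     so the first method call on it falls through to the AUTOLOAD below.
{
    no strict 'refs';
    @{"${_}::ISA"} = ('My::Stubs') for keys %TYPES;
}

# The single generator: installs a real constructor for the type on first
# use, then re-dispatches to it. Other methods die with the usual message.
sub AUTOLOAD {
    my $class = ref($_[0]) || $_[0];
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';    # never generate anything for DESTROY
    die qq{Can't locate object method "$method" via package "$class"\n}
        unless $method eq 'new' && $TYPES{$class};
    my @fields = @{ $TYPES{$class} };
    my $new = sub {
        my ($cls, %init) = @_;
        my %self = map { ($_ => $init{$_} // 0) } @fields;
        return bless \%self, $cls;
    };
    no strict 'refs';
    *{"${class}::new"} = $new;    # later calls go straight to this sub
    goto &$new;                   # re-dispatch with the original @_
}

package main;

my $x = TypeA->new(a => 1);
print ref($x), " a=$x->{a} b=$x->{b}\n";    # prints "TypeA a=1 b=0"

# (b) No stub exists for XTYPE, so perl reports the usual missing-method error.
my $err = eval { XTYPE->new; 1 } ? '' : $@;
print $err;
```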

        Abigail's neat use of the UNIVERSAL package (which I was only vaguely aware of) sidesteps the whole issue by removing the need for the hundreds of minimal (stub) packages, whilst retaining the simplicity of syntax and avoiding the need for predeclaration. The only caveat I can see is that it places the burden of verifying that the types being autovivified are known types on my code rather than on the compiler, i.e. run-time rather than compile-time verification, but that is a small price to pay for the benefits it produces.
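        Abigail's code isn't reproduced here. The following is only a guess at the general shape of a UNIVERSAL-based version, with invented names (My::Types, %TYPES, TypeA, TypeB, TypeZZY). UNIVERSAL is the last place every method lookup searches, so a constructor defined as UNIVERSAL::new is reached by TypeB->new even though no TypeB package exists. The known-type check then happens at run time, as described above.

```perl
use strict;
use warnings;

package My::Types;
use Carp qw(croak);

# Hypothetical table of known types (field lists invented for illustration).
our %TYPES = (TypeA => [qw(a b)], TypeB => [qw(c)]);

# A shared method that every generated type inherits.
sub fields { return @{ $TYPES{ ref($_[0]) } } }

# UNIVERSAL is searched last by every method lookup, so TypeB->new lands
# here when no TypeB package (and hence no TypeB::new) exists yet.
sub UNIVERSAL::new {
    my $class = shift;
    # The run-time check that replaces the stub packages: known types only.
    croak qq{Can't locate object method "new" via package "$class"}
        unless exists $TYPES{$class};
    my @fields = @{ $TYPES{$class} };
    no strict 'refs';
    @{"${class}::ISA"} = ('My::Types');       # create the package on demand
    *{"${class}::new"} = sub {                # its own constructor from now on
        my ($cls, %init) = @_;
        my %self = map { ($_ => $init{$_} // 0) } @fields;
        return bless \%self, $cls;
    };
    return $class->new(@_);
}

package main;

my $obj = TypeB->new(c => 7);
print ref($obj), ' fields: ', join(',', $obj->fields), "\n";   # prints "TypeB fields: c"

# TypeZZY is not in the table, so the check above rejects it at run time.
my $err = eval { TypeZZY->new; 1 } ? '' : $@;
```

        The cost of this shape is that UNIVERSAL is global. Any class in the program that has no new() of its own, directly or inherited, will also end up in this routine, so the error path for unknown names matters.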

        The original question was just trying to clarify what performance impact having hundreds of unused minimal packages floating around would have, and whether any such impact would be reduced by moving those minimal packages below a single, top level namespace, rather than each existing in the top-level namespace.

        Every time you call a subroutine, perl has to look up the name of the sub to find the address to transfer control to. Remembering that goto &subname; is not itself the quickest code on the block, I was concerned that adding large numbers of unused names to the top-level namespace might have a detrimental performance impact on the whole program, and wondered (aloud) whether this might be alleviated by keeping most of them in a different namespace.
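        To see how the two variations differ in layout, note that each package's symbol table (stash) is itself a hash. It is stored under a key ending in "::" in its parent's stash, and main's stash is %main::. The sketch below counts how many package entries each variation adds to %main::, using hypothetical package names (FlatType1 and so on, and StubSpace::). It shows only the layout; it does not measure lookup speed.

```perl
use strict;
use warnings;

# Number of packages nested directly inside a stash (keys ending in "::").
sub nested_packages {
    my ($stash) = @_;
    return scalar grep { /::\z/ } keys %$stash;
}

my ($flat_delta, $nested_delta, $inside);
{
    no strict 'refs';
    my $start = nested_packages(\%main::);

    # Variation 1: 100 hypothetical stub packages directly under main::.
    ${"FlatType${_}::VERSION"} = 1 for 1 .. 100;
    my $after_flat = nested_packages(\%main::);

    # Variation 2: 100 stubs under one hypothetical top-level StubSpace::.
    ${"StubSpace::NestedType${_}::VERSION"} = 1 for 1 .. 100;
    my $after_nested = nested_packages(\%main::);

    $flat_delta   = $after_flat - $start;
    $nested_delta = $after_nested - $after_flat;
    $inside       = nested_packages(\%{"StubSpace::"});
}

print "top-level packages added: flat $flat_delta, nested $nested_delta; ",
      "inside StubSpace:: $inside\n";
```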

        Each package also uses memory. It's easy to demonstrate that creating 10,000 hashes with 1 key/value pair each requires (on my machine with 5.8.1) around 5 MB of memory, whilst creating 1 hash with 10,000 key/value pairs requires only 1 MB of RAM. From my inspections with the debugger, it would appear that several hashes, each containing several nested hashes, are created for each package one loads. Each hash appears to contain very little, but as shown above, each hash carries an overhead.
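        A sketch for repeating the hash-overhead comparison on another perl. It builds both shapes, 10,000 one-pair hashes and a single 10,000-pair hash. If Devel::Size (a CPAN module, not part of core perl) happens to be installed, it reports their sizes with its total_size() function. The 5 MB and 1 MB figures above come from one machine running 5.8.1; the numbers printed will vary with the perl version and platform.

```perl
use strict;
use warnings;

my $n = 10_000;

# Shape 1: n separate hashes with one key/value pair each.
my @many = map { +{ "key$_" => $_ } } 1 .. $n;

# Shape 2: a single hash holding all n key/value pairs.
my %one = map { ("key$_" => $_) } 1 .. $n;

# Devel::Size is a CPAN module, not part of core perl; only measure if present.
if (eval { require Devel::Size; 1 }) {
    printf "%d one-pair hashes: %d bytes\n", $n, Devel::Size::total_size(\@many);
    printf "one %d-pair hash:    %d bytes\n", $n, Devel::Size::total_size(\%one);
}
else {
    print "Devel::Size not installed; install it from CPAN to compare sizes\n";
}
```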

        If I can reduce the overall memory consumption by nesting the generated packages one level down in the top-level hash, putting many key/value pairs within a single hash rather than creating lots of small hashes, I consider this to be worth doing. If that has the knock-on effect of reducing the namespace pollution at the top level, and with it the lookup time for every function call in the program, then I consider it even more worthwhile.

        However, from my inspections and experiments, it is not at all clear to me whether my concerns are real or not; hence my questions, and the hope that someone browsing by with a better understanding of these things might be able to clarify them for me.

        I never consider it premature to think about the overall impact of my designs upon the programs that will use them. I believe it to be good design to do so. Only by thinking about such things do you avoid the 'bloat' syndrome that we all love to hate in Word and similar programs.

        Abigail's neat solution with the UNIVERSAL package bypasses these concerns and results in a solution that from my perspective has almost no downsides. Having this brought to my attention completely vindicates both my having thought about the impact and asking the question.


