http://www.perlmonks.org?node_id=539018


in reply to Your favorite objects NOT of the hashref phylum

Personally, I think one of the reasons Perl 5 OO has such a bad name is that you have to make such "manual" decisions yourself. In most other OO languages, these kinds of decisions are made for you. This is exactly why I wrote Moose.

package Stopwatch;
use strict;
use warnings;
use Moose;
use DateTime;

has 'timer' => (
    is      => 'rw',
    isa     => 'DateTime',
    default => sub { DateTime->now },
);

sub diff {
    my $self = shift;
    $self->timer - DateTime->now;
}

# ...
my $sw = Stopwatch->new;
# ... wait a moment
print $sw->diff;

In this example, a default constructor is already available for you (from Moose::Object), and instance attributes are managed by the underlying metaclass (as they are in most other OO languages), which means that an accessor for the "timer" slot has been created (which is read/write, will only accept "DateTime" objects, and has a default value of "DateTime->now").

The ugly details of how all this happens should be of no concern to your everyday programmer. The detail of the instance storage (it actually is a blessed HASH) should be of no concern to your everyday programmer either. In short, it should all Just Work.

-stvn

Re^2: Your favorite objects NOT of the hashref phylum
by BrowserUk (Patriarch) on Mar 24, 2006 at 21:48 UTC
    The ugly details of how all this happens should be of no concern for your everyday programmer. The detail of the instance storage (it actually is a blessed HASH) should be of no concern to your everyday programmer either. In short, it should all Just Work.

    Such bold statements are fine if all your applications:

    1. Run for a couple of seconds, or minutes at most.
    2. Create a few dozen, or a few hundred, largish objects at most.
    3. Can have hardware thrown at them conveniently, easily, and economically to alleviate memory and performance bottlenecks.

    But some applications do not fit that mode of operation: they are long running; they use hundreds of thousands or millions of small objects; by their very nature they push the boundaries of both memory and performance of retail hardware simply to hold their data, before you add the overhead of the OO implementation; they require algorithms that make multiple passes over the entire range of that data; and they do not lend themselves to being spread across clustered or networked solutions.

    For these types of applications, the mechanisms of the OO implementation, and the memory and performance overheads they incur, are of considerable concern.

    By way of example: many of the problems of bio-genetics involve taking millions of fragments of DNA, totalling one or two GB of data, and attempting to match their ends and so piece together the original sequence. Just loading the raw data starts to push retail hardware to its limits.

    Exhaustively iterating all the subsequences of each fragment and comparing them against each other requires an in-memory solution, as splitting the processing across multiple machines is complex and hugely costly in terms of communications overhead. Such exhaustive explorations often run for days or even weeks. Ignoring the costs of objectization, or looking to RDBMS solutions to alleviate memory and/or communications concerns, can extend those time periods to months.

    The notion of using genetic programming techniques--applying the power of statistics, intelligently random mutation, and generational algorithms--as a replacement for the exhaustive, iterative searches is an attractive one. Genetic algorithms can produce impressive results in very short time periods for other NP-hard, iterative problems: the travelling salesman, the knapsack problem, etc.

    The idea of "throwing the subsequences into a cauldron" and letting them talk to each other to find affinities, in a random fashion, scoring the individual and overall matches achieved, then "stirring the pot" and letting it happen all over again, lends itself to making each subsequence an object. If each subsequence of a few tens of bytes of data is going to be represented by a blessed hash, with its minimum overhead of approx. 300 bytes, then you're only going to fit around 7 million in the average PC's memory. If you instead store the sequences in a single array, and use a blessed integer scalar as the object representing them, the per-object overhead of the OO representation falls to approx. 56 bytes, giving you room for something like 38 million.
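
    As a minimal sketch of that second approach (the package and method names are hypothetical, purely for illustration):

    package Sequence;
    use strict;
    use warnings;

    # All sequence data lives in one package-level array; each object
    # is nothing more than a blessed reference to its own index.
    my @data;

    sub new {
        my ($class, $seq) = @_;
        push @data, $seq;
        my $index = $#data;
        return bless \$index, $class;
    }

    # Direct lookup through the index; no per-object hash anywhere.
    sub seq { $data[ ${ $_[0] } ] }

    # ...
    my $s = Sequence->new('GATTACA');
    print $s->seq; # prints GATTACA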

    Will that saving be enough to allow the algorithm to run in memory? For some yes, for others no, but the beauty of Perl's "manual" OO, is that it gives you the choice to balance the needs of your application against the doctrines of OO purity.

    Most OO languages do not give you that choice, and so they "Just Work", until they don't. And then you're dead in the water, facing the purchase of expensive hardware that can handle more memory (and the memory to populate it), or making your program an order of magnitude more complex and several orders of magnitude slower by using a clustered or networked solution.

    Perl gives you the possibility to address problems at their source, by modifying the choices you make in your own source code. And you can do it today without having to wait 3 months for the Hardware Acquisitions committee to approve your Capital Expenditure Request, or the Finance dept. to get the budget; or go through the Corporate Software Approvals process to lay your hands on the clustering software you need :)

    Blessed array indexes as object handles, and direct access to instance data, may not be politically correct as far as OO doctrine is concerned, but a blessed scalar is a blessed scalar, regardless of whether it points to a hash that holds a key that indexes into a table that points to the data, or is just a direct reference to the data. And whilst getters and setters to access instance data may prove useful in isolating applications from implementation details--for library classes that will have a long life and are likely to be refactored--for many, perhaps most, applications that level of refactoring will never happen, and the benefits of that isolation will never be realised.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Well, all your points are well taken. Obviously you should always use the right tool for the job. But I am actually not sure that Moose is the wrong tool for you: because Moose uses metaclasses to build all instances, and those same metaclasses also build the accessors, there is great opportunity for optimization here.

      As I mentioned, Moose uses Class::MOP, which builds blessed HASH based instances, but this is just the default. It is possible to extend Class::MOP to build other kinds of instances as well. In the examples for Class::MOP, I show how it can be used to build inside-out classes. I also have an example of a Lazy class which will not initialize its fields until the absolute last possible moment. I have been considering an ARRAY based example as well, but haven't gotten around to it. There is also nothing to stop you from writing an Inline::C based version, which could possibly be made even more space/time efficient than an array version.
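
      For illustration only (this is plain Perl, not Class::MOP API, and the class is invented), an ARRAY based instance simply maps each attribute to a fixed slot:

      package Rect;
      use strict;
      use warnings;

      # Each attribute gets a fixed slot in the blessed array.
      use constant { WIDTH => 0, HEIGHT => 1 };

      sub new {
          my ($class, %args) = @_;
          return bless [ @args{qw(width height)} ], $class;
      }

      sub width  { $_[0][WIDTH]  }
      sub height { $_[0][HEIGHT] }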

      As for how all this will work with Moose, allow me to explain. The primary role of Moose is to collect meta-data about your class definition, and to create a number of metaobjects from it. (The extra overhead of these metaobjects will usually be fairly small, since they are a per-class cost, not a per-object/instance cost, and most systems will have a reasonably small number of classes compared to the number of objects/instances they generate. But I digress here ...)

      These Moose metaobjects are used to build things like accessors, and eventually to help build the actual instances too. And since Moose only makes you pay for the features you use, if, for instance, you don't choose to use a type constraint, you don't pay for its overhead.

      Now, since all the details of your class are stored as meta-data by Moose, and the Moose metaobjects are managing the details of your object instance structure and the means of accessing it (through accessors), it is possible to swap a different Class::MOP based engine into Moose and have it create ARRAY based instances, without having to change the surface syntax of your Moose based classes (assuming you don't break encapsulation, that is).
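
      A quick way to see that per-class (rather than per-instance) cost, using the Stopwatch class from my post above (the check itself is my own, not from the Moose docs):

      use Scalar::Util qw( refaddr );

      my $a = Stopwatch->new;
      my $b = Stopwatch->new;

      # Both instances share the single metaclass object for Stopwatch.
      print "shared metaclass\n"
          if refaddr( $a->meta ) == refaddr( $b->meta );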

      Now, the example I describe is not easily accomplished at this point, because I have not added the proper hooks into Moose for this kind of thing. But Class::MOP has been designed to do this kind of thing from the very start, so it's just a matter of getting the tuits to do it.

      Moose and Class::MOP are tools designed to work with Perl 5's OO system and not against it. This means that they should not get in your way if you need/want to do something different, because after all, TIMTOWTDI :)

      -stvn
      Thanks a lot BrowserUK, that's exactly the sort of answer I was looking for! Very well put.

      obey the law
      <flameRetardant>
      First let me say that I think that all the ways of making objects discussed here are good choices in many situations.
      </flameRetardant>

      I just wanted to make a comment that many systems do just deal with a relatively small number of objects at once - most web based systems for example.

      I once used an in-house object system similar to what Moose sounds like, but with even more features and overhead (it also included automatic RDBMS mapping etc). It worked fine in the web based system, but then we had to make some batch jobs using the same objects for migration. We had a few hours to build, modify and then tear down hundreds of thousands of objects.

      The system was too slow - it was going to take about 2 days! But wait - there's more! I spent a few days profiling the code and came up with a few smallish changes. I tweaked the most-used methods to use more efficient mechanisms. Some methods were being run literally millions of times - in those cases I threw niceness to the wind and removed lexically scoped variables, reached into objects bypassing accessors, etc. They were mostly small methods, and I made up for the ugliness with large wads of code comments and POD to ensure that the code remained maintainable.
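
      As an illustration of the kind of tweak involved (the class and method names here are made up, not from the actual system):

      package LineItem;
      use strict;
      use warnings;

      sub new {
          my ($class, %args) = @_;
          return bless { %args }, $class;
      }

      sub quantity   { $_[0]{quantity}   }
      sub unit_price { $_[0]{unit_price} }

      # Hot path, called millions of times per run. The tidy version was:
      #     my $self = shift;
      #     return $self->quantity * $self->unit_price;
      # Here we skip the lexical and the accessors and index the hash directly.
      sub total_cost { $_[0]{quantity} * $_[0]{unit_price} }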

      2 days of execution then became 2 hours. I also did some major tweaking on the RDBMS side, but at least half of the performance gain was due to the perl code changes.

      My point is that you should normally not throw out a code model that benefits your developers because of concerns with future scalability. Unless the model is stupid there is usually a way to make it fast after the fact. This is not always true in other languages, where you are constrained in your options, but in Perl there is always a way to optimise more. If you really need to, you can do wacky things like manipulate the runtime tables or rewrite your most often used methods in XS, but I've never had to do that (which is a pity, because it could be fun).

        First let me say that I think that all the ways of making objects discussed here are good choices in many situations.

        No need for the flame retardant, I completely agree with you. If you look again at the post, I was responding only to the "bold statements" I quoted.


        If I have an application that will benefit from OO, I use OO as I suggested in my example above.

        If I have an application that needs a blessed hash, I'll use a blessed hash. Or a blessed array, or a blessed scalar, or a blessed glob.

        If I write a module that I think might be usefully sub-classed, and especially if I think that it might be useful to others via CPAN, I'd probably opt for the former, simply because it's what most people are used to and would be least likely to cause surprises.

        But I do not feel obliged to make all modules OO just because OO is cool, and I certainly don't feel obliged to wrap an OO facade around those parts of my code that are fundamentally not OO, just to satisfy the dogma of OO purism.

        For example, the 'singleton pattern' is a farce. It is a dogmatic wrapper to conceal a global variable. It is used because in the dogmatic world of OO purity, globals are not OO, therefore globals are bad.

        IMO, to use a quaint old phrase my grandmother would resort to on the rare occasions that something really made her angry--that is just so much stuff and nonsense.

        OO is a tool--not a philosophy, way of life, or mandatory way of programming. And like any other tool, you should use it when it benefits your application, and not when it doesn't.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Now who's being dogmatic ;)

        The singleton pattern is often used in that way by those who have not been taught better (or who are simply lacking in mental horsepower).

        Singleton objects (or similar) can be useful however. Because they hide the fact that there is only one, it can be changed later to have more than one. Say for example your program logs to a logfile, so you use a singleton object factory to return a thin class with a file handle and a print_to_log method. Later you want to use a different logfile depending on the name of the method (or whatever) - the object factory can be changed to return you a different logger object based on your criteria. You're still caching file handles, just a number of them instead of only one. If you used a global variable you would have to change every point where you print to that filehandle to achieve the same effect.
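
        A rough sketch of that kind of factory (the details are invented for illustration; only the print_to_log idea comes from the paragraph above):

        package Logger;
        use strict;
        use warnings;

        my %loggers;    # cache of logger objects, keyed by log name

        # The factory: callers never know how many loggers live behind it.
        sub get {
            my ($class, $name) = @_;
            $name = 'default' unless defined $name;
            return $loggers{$name} ||= do {
                open my $fh, '>>', "$name.log"
                    or die "Can't open $name.log: $!";
                bless { fh => $fh }, $class;
            };
        }

        sub print_to_log {
            my ($self, @msg) = @_;
            print { $self->{fh} } @msg, "\n";
        }

        # Every caller does this; moving from one shared logfile to
        # per-subsystem logfiles changes get(), not the call sites.
        Logger->get('audit')->print_to_log('something happened');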

Re^2: Your favorite objects NOT of the hashref phylum
by xdg (Monsignor) on Mar 24, 2006 at 18:37 UTC

    With all due respect -- as I'm sure that Moose is technically excellent -- I don't really understand how it is any different, from a practical perspective, than most of the other full-featured class-system generators for Perl.

    You define a syntax for users to create constructors, accessors, etc. You've got some clean syntax and some stronger type-checking than some class builders, true, but the basic concept of providing an interface to generate a class hasn't changed a lot since, say, Class::MethodMaker (which goes back to 1996).

    With almost any full system, you get the accessors and constructors 'for free' and don't need to know the details. For example:

    package Stopwatch;
    use strict;
    use warnings;
    use DateTime;
    use Class::MethodMaker [
        scalar => [
            { -type         => 'DateTime',
              -default_ctor => sub { DateTime->now } },
            'timer',
        ],
        new => 'new',
    ];

    sub diff {
        my $self = shift;
        $self->timer - DateTime->now;
    }

    # ...
    my $sw = Stopwatch->new;
    # ... wait a moment
    print $sw->diff;

    Only when you move away from using accessors to get at encapsulated data does the underlying form really matter. For example, with inside-out objects:

    package Stopwatch;
    use strict;
    use warnings;
    use DateTime;
    use Class::InsideOut qw( public register id );

    public timer => my %timer, {
        set_hook => sub { $_->isa('DateTime') or die "must be a DateTime object" },
    };

    sub new {
        my $class = shift;
        my $self  = register( bless \(my $s), $class );
        $timer{ id $self } = DateTime->now;
        return $self;
    }

    sub diff {
        my $self = shift;
        $timer{ id $self } - DateTime->now;
    }

    # ...
    my $sw = Stopwatch->new;
    # ... wait a moment
    print $sw->diff;

    Could you help me understand what's really different about Moose beyond syntax?

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      Could you help me understand what's really different about Moose beyond syntax?

      Moose is built on top of Class::MOP (a metaclass system for Perl 5), which means that Moose has support for deep class introspection. So for instance, you can do this:

      my $type_constraint = Stopwatch->meta->get_attribute('timer')->type_constraint;
      print $type_constraint->name;    # prints DateTime
      And since the $type_constraint returned is itself a meta-object, you can get plenty of information from it as well, such as its parent types. (NOTE: this is not reflective of any class/@ISA hierarchy in Moose; it is just the type/subtype relationships defined in Moose::Util::TypeConstraints.)
      print $type_constraint->parent->name;                    # prints Object
      print $type_constraint->parent->parent->name;            # prints Ref
      print $type_constraint->parent->parent->parent->name;    # prints Any
      While type information like this is not directly useful for your everyday programmer, it is very useful for tools like ORMs. I am actually working closely with mst, who is the author of DBIx::Class, to explore the possibilities of using this deep introspection from Moose with DBIx::Class somehow.

      But Moose has many other features as well, not just automatic accessor generation. It can also create arbitrary type constraints, which are not directly linked to classes in the system. Here is an example from the Moose test suite:

      subtype Protocol => as Str => where { /^HTTP\/[0-9]\.[0-9]$/ };
      This is a subtype of the built-in Moose type 'Str' which will only accept strings that look like HTTP protocol declarations. And you can use these in your attribute declarations just the same as class names, so this will DWIM.

      has 'protocol' => (isa => 'Protocol');
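
      Putting those pieces together (the Request class here is my own invention, just to make the snippet self-contained):

      package Request;
      use Moose;
      use Moose::Util::TypeConstraints;

      subtype Protocol
          => as Str
          => where { /^HTTP\/[0-9]\.[0-9]$/ };

      has 'protocol' => (is => 'rw', isa => 'Protocol');

      package main;

      my $req = Request->new(protocol => 'HTTP/1.1');    # passes the constraint
      eval { $req->protocol('SPDY') };                   # fails it and dies
      print "rejected\n" if $@;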

      Because Str and Protocol are just type constraints, we avoid the unnecessary overhead of creating a class for this. Moose has type-coercion features as well, although those are still fairly experimental; if you're interested, I suggest looking here to see a good example of their usage.

      Moose also offers CLOS-style before/after/around method modifiers, which allow for AOP-like method wrapping. An example of that can be found here in a recent use.perl posting of mine. And version 0.03 of Moose (due out later this week) will have support for BETA-style "inner" calls (think of these as the inverse of "super" calls; /msg me if you want more details).
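
      For a flavour of the modifiers (this subclass is my own toy example, wrapping the Stopwatch class from above):

      package Stopwatch::Logged;
      use Moose;

      extends 'Stopwatch';

      # Wraps the inherited diff() without touching its source.
      around 'diff' => sub {
          my ($orig, $self, @args) = @_;
          warn "diff() called\n";
          return $self->$orig(@args);
      };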

      And to top this all off, Moose will be (although it is not currently; it's only 0.02, after all) completely pluggable, meaning you will be able to (for instance) subclass Moose::Meta::Attribute to either change existing behaviors, or add new behaviors to change how Moose handles attributes. This is all due to the fact that Moose is built on Class::MOP, which is built on the theoretical foundations of other (somewhat more academic) object systems like CLOS and Smalltalk (which themselves were built by people far smarter than I will ever be).

      In short, Moose is just the syntactic sugar on top of a fully dynamic and reflective object system for Perl 5.

      -stvn
Re^2: Your favorite objects NOT of the hashref phylum
by blogical (Pilgrim) on Mar 24, 2006 at 16:56 UTC
    A moose once bit my sister...