Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
The ugly details of how all this happens should be of no concern for your everyday programmer. The detail of the instance storage (it actually is a blessed HASH) should be of no concern to your everyday programmer either. In short, it should all Just Work.

Such bold statements are fine if all your applications

  1. Run for a couple of seconds or minutes at most.
  2. Create a few dozen or few hundred largish objects at most.
  3. If it is convenient, easy and economic to throw hardware at them to alleviate memory and performance bottlenecks.

But if your application does not fit into this mode of operation; is long running, uses hundreds of thousands or millions of small objects; by it's very nature, pushes the boundaries of both memory and performance of retail hardware to simply hold it's data before you add the overhead of the OO implementation; requires algorithms that need to constant access the entire range of that data with multiple passes; and does not lend itself to being spread across clustered or networked solutions.

For these types of applications, the mechanisms of OO implementation and the the memory and performance overheads they incur are of considerable concern.

By way of example. Many of the problems of bio-genetics involve taking millions of fragments of DNA, totalling 1 or two GB of data, and attempting to match their ends and so piece together the original sequence. Just loading the raw data starts to push the boundaries of retail hardware to it's limits.

Exhaustively iterating all the subsequences of each and comparing them against each other requires an in-memory solution, as splitting the processing across multiple hardware is complex and hugely costly in terms of the communications overhead. Such exhaustive explorations often run for days or even weeks. Ignoring the costs of objectization, or looking at RDBMS solutions to alleviate memory and/or communications concerns can extend those time periods to months.

The notion of using genetic programming techniques, applying the power of statistics and intelligently random mutation and generational algorithms, as a replacement for the exhaustive, iterative searches is an attractive one. Genetic algorithms can produce impressive results in very short time periods for other NP hard, iterative problems--travelling salesman; knapsack problem etc.

The idea of "throwing the subsequences into a cauldron" and letting them talk to each other to find affinities, in a random fashion, scoring the individual and overall matches achieved and then "stirring the pot" and letting it happen over lends itself to making each subsequence an object. If each subsequence of a few 10s of bytes of data is going to be represented by a blessed hash, with it minimum overhead of approx. 300 bytes, then you're only going to fit around 7 million in the average PCs memory. If you instead store the sequences in a single array, and use a blessed, integer scalar as the object representing them, the per object overhead of the OO representation falls to approx. 56 bytes giving you room for something like 38 million.

Will that saving be enough to allow the algorithm to run in memory? For some yes, for others no, but the beauty of Perl's "manual" OO, is that it gives you the choice to balance the needs of your application against the doctrines of OO purity.

Most OO languages do not give you that choice and so they "Just work", until they don't. And then you're dead in the water facing the purchase of expensive hardware that can handle more memory (and the memory to populate it), or making your program an order of magnitude more complex and several orders of magnitude slower by using a clustered or networked solution.

Perl gives you the possibility to address problems at their source, by modifying the choices you make in your own source code. And you can do it today without having to wait 3 months for the Hardware Acquisitions committee to approve your Capital Expenditure Request, or the Finance dept. to get the budget; or go through the Corporate Software Approvals process to lay your hands on the clustering software you need :)

Blessed array indexes as object handles, and direct access to instance data may not be pc as far as OO doctrine is concerned, but a blessed scalar is a blessed scalar regardless of whether it points to a hash that holds a key that indexes into a table that points to the data; or is just a direct reference to the data. And whilst getters and setters to access instance data may prove useful in isolating applications from implementation details, for library classes that will have a long life and are likely to be refactored. For many, perhaps most applications, the level of refactoring that would benefit from that will never happen, and the benefits of that isolation will never be realised.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^2: Your favorite objects NOT of the hashref phylum by BrowserUk
in thread Your favorite objects NOT of the hashref phylum by blogical

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (3)
As of 2024-04-24 19:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found