PerlMonks  

A First CPAN Odyssey

by skyknight (Hermit)
on Jun 22, 2004 at 15:16 UTC (#368760=perlmeditation)

At my job I'm currently working on a module for abstracting objects and relationships in a SQL database through (what I deem to be) a very clean and generalized object oriented interface. It is (I hope) DBD agnostic, using the DBI and allowing you to plug any DBD into it fairly seamlessly by sub-classing a configuration parser file. At present, I'm pretty happy with the organization of the class hierarchy and interactions, have written fairly extensive implementation and API documentation, and have automated tests that employ the Test::Unit framework. I have gotten the go ahead from my boss to release the module to CPAN, and will probably do so in a few weeks. However, I am presently concerned about "when in Rome" issues, and want very much for my module to conform to the various CPAN mores.

The first thing I'm wondering about is name spacing issues. Presently, the top level package name is just SQL, so I have sub-packages such as SQL::Object, SQL::Link, SQL::Statement, SQL::Object::ResultSet, SQL::Link::ResultSet, etc. I realize that this probably isn't a very good name, but I'm not sure what would be. The SQL name is nice and short, but not really all that descriptive of what the module is doing. What would be a sensible name? What is the best mechanism via which to receive feedback on naming issues?

With regards to testing, I'm fairly confident in my ability to write good tests as far as verifying correctness is concerned, but I also want to adhere to the various conventions for CPAN modules. Right now I am using the Test::Unit framework, and for each class Foo::Bar, I have a corresponding Foo::BarTest which is a subclass of Test::Unit::TestCase. Furthermore, in each directory of the library hierarchy, I have a package of the form Foo::TestSuite which subclasses Test::Unit::TestSuite, and contains all of the tests in that directory, as well as any TestSuite subclasses in subdirectories. This gives me a nice tree structure of tests that allows me both to focus on particular pieces of the code tree, while also allowing me to run all of the tests from a single command line invocation of the top level TestSuite. I realize, however, that while this is great for me, it might not meet the expectations of others... How much leeway do I have in this regard? Is it bad that I'm using Test::Unit which does not come standard with Perl? Should I be using something more mainstream and better maintained like Test::More? If I do that, am I going to have to scrap my current testing framework layout?

On a related thread, I'm curious as to how I should go about testing interactions with a database on a user's system. It would be very rude for the testing procedures of my module to carelessly stomp on the database residing on someone's machine. What precautions and paradigms should I employ so as both to thoroughly test my module on the user's machine and avoid trashing his environment? I've read horror stories of hapless users having rogue modules smash their data, and I'd rather not be a purveyor of such a module.

With regards to writing documentation, what are some tips for adhering to the Principle of Least Astonishment? I'm trying my best to avoid making the kind of assumptions that developers tend to make about code that they have written that results in unreadable documentation. I'll just have to hope that the world agrees. There are, however, more general issues... Is there any kind of de facto standard for laying out documentation? I've been writing POD in each library file, at the end, not mixing POD and code because I think that makes for hideously ugly source files. I have been using the following four sections: Name, Description, Synopsis, Methods. I guess I should probably also add "Bugs" and "See Also" sections. What kind of things should I be keeping in mind?

I'm also wondering about error handling, specifically how to trap and report errors to the user. I personally find it loathsome to have a script die deep within the bowels of DBI, leaving me with no indication as to what the root cause was. As such, I'm trying to trap third party errors by wrapping code in eval blocks, and handling errors when $@ is populated, re-throwing the exception with some contextual information of my own added to it. I'm not sure, though, how to fully go about this... Should I try to re-trap my own exceptions at successive levels, or would it be better just to call Carp::confess at the lowest level where I expect potential problems? Also, while I realize that for some errors it makes more sense to cite the error from the perspective of the caller, e.g. in the case of invalid subroutine parameters, I'm left with the quandary of Carp::croak versus Carp::confess. It seems to me that confess is always more useful, with the only drawback being that it dumps a ton of information to the screen. It seems to me that the utility of croak quickly becomes diminished when invocation of the croaking method becomes deeply nested in library routines. What are good criteria for choosing one over the other, or should I always favor the stack trace of confess?

I'm sure there are many other important considerations when publishing a CPAN module, so please feel free to chime in about the things that I may have forgotten.

Re: A First CPAN Odyssey
by stvn (Monsignor) on Jun 22, 2004 at 17:28 UTC
    Presently, the top level package name is just SQL, so I have sub-packages such as SQL::Object, SQL::Link, SQL::Statement, SQL::Object::ResultSet, SQL::Link::ResultSet, etc.

    This will be a problem, as SQL::Statement is already taken, and possibly some of the others too. A quick search on http://search.cpan.org will tell you. A possible solution is to put your entire framework under a single top-level namespace; this can usually be accomplished with a careful search and replace. Personally I prefer the top-level namespace method, as it tends to make it much easier to use a framework within other projects since I don't need to worry about namespace conflicts.

    What is the best mechanism via which to receive feedback on naming issues?

    The module mailing list (actually there are two, one for module authors, and the other for announcements); you can find them at http://lists.perl.org/. And of course, there is always here too: a node prefixed with "RFC" is all it takes, and I am sure you will get plenty of feedback.

    With regards to testing, I'm fairly confident in my ability to write good tests as far as verifying correctness is concerned, but I also want to adhere to the various conventions for CPAN modules.

    As long as it plays well with Test::Harness, you should be ok. I would assume Test::Unit does that. You can also talk to the people on the perl-qa list; you can find out more about that at http://qa.perl.org/. Personally I have never used Test::Unit, so that's about all I can offer on that subject.
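For comparison with the Test::Unit layout described in the question, here is a minimal sketch of a Test::More file that Test::Harness can run directly. The file name, the My::Point class, and its methods are all made up for illustration; in a real distribution the class would live under lib/ and the test under t/.

```perl
#!/usr/bin/perl
# t/01-point.t -- hypothetical test file; Test::Harness runs each t/*.t
use strict;
use warnings;
use Test::More tests => 3;

# Stand-in for a real class, defined inline so the sketch runs on its own.
{
    package My::Point;
    sub new  { my ($class, %args) = @_; return bless {%args}, $class }
    sub x    { return $_[0]{x} }
    sub norm { my $self = shift; return sqrt($self->{x}**2 + $self->{y}**2) }
}

my $p = My::Point->new(x => 3, y => 4);
isa_ok($p, 'My::Point');
is($p->x, 3, 'accessor returns the constructor argument');
cmp_ok($p->norm, '==', 5, 'norm of (3, 4) is 5');
```

One file per class under t/ gives roughly the same "focus on one piece or run everything" property as a tree of TestSuite classes, since the harness can be pointed at a single file or at the whole directory.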

    On a related thread, I'm curious as to how I should go about testing interactions with a database on a user's system.

    Try using mock objects, in particular you can use DBD::Mock.
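A rough sketch of what testing against DBD::Mock might look like, assuming DBD::Mock is installed from CPAN. The users table, the names_from_id helper, and the canned rows are invented for the example; no real database is contacted.

```perl
use strict;
use warnings;
use DBI;    # DBD::Mock must be installed for the 'Mock' driver to load

my $dbh = DBI->connect('DBI:Mock:', '', '', { RaiseError => 1 });

# Queue the rows that the next SELECT should appear to return.
# The first array ref holds the column names.
$dbh->{mock_add_resultset} = [
    [ 'id', 'name'  ],
    [ 1,    'alice' ],
    [ 2,    'bob'   ],
];

# Hypothetical code under test: it only ever sees a DBI handle.
sub names_from_id {
    my ($handle, $min_id) = @_;
    my $sth = $handle->prepare('SELECT id, name FROM users WHERE id >= ?');
    $sth->execute($min_id);
    my @found;
    while (my $row = $sth->fetchrow_arrayref) {
        push @found, $row->[1];
    }
    return @found;
}

my @names = names_from_id($dbh, 1);

# The mock handle records every statement and its bind values,
# so a test can check what SQL the module generated.
my $history = $dbh->{mock_all_history};
print "rows: @names\n";
print "sql:  ", $history->[0]->statement, "\n";
```

The trade-off is that a mock verifies the SQL a module emits, not whether a real engine accepts it, so it complements rather than replaces a run against a throwaway database.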

    With regards to writing documentation, ....

    POD is the standard, and your categories sound about right. There is no "must-have" set out there; it's really up to you. I would recommend the BUGS and SEE ALSO sections, though, as well as an AUTHOR and a COPYRIGHT AND LICENSE section. I personally also like to have a CODE COVERAGE section to include my Devel::Cover report in.
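As an illustration of the sections mentioned above, here is a skeleton with the POD kept after the code, as the question prefers. My::Module, its constructor, and the placeholder text are assumptions; the section names are the conventional ones, not a required set.

```perl
package My::Module;    # hypothetical name
use strict;
use warnings;

our $VERSION = '0.01';

sub new { my ($class, %args) = @_; return bless {%args}, $class }

1;

=head1 NAME

My::Module - one-line description of what the module does

=head1 SYNOPSIS

    use My::Module;
    my $obj = My::Module->new(name => 'example');

=head1 DESCRIPTION

A few paragraphs on what problem the module solves and how it differs
from similar modules.

=head1 METHODS

=head2 new(%args)

Returns a new object holding C<%args>.

=head1 BUGS

Known limitations and where to report problems.

=head1 SEE ALSO

Related modules.

=head1 AUTHOR

=head1 COPYRIGHT AND LICENSE

=cut
```

Running perldoc on the file renders only the POD, while perl ignores everything from the first =head1 through =cut.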

    I'm also wondering about error handling, specifically how to trap and report errors to the user

    I prefer exceptions (with plain old die), but TIMTOWTDI. In larger frameworks I have sometimes created exception objects with stack tracing abilities and such, but I have not settled on the best way to do that (Carp, Exception::Class, etc.), so I am actually interested in what the other monks would say about that.
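One way the eval-and-rethrow approach from the question could look, as a sketch only. The My::Loader package, run_query, and load_object are hypothetical stand-ins (run_query plays the part of a DBI call that dies); the Carp and eval usage is standard Perl.

```perl
use strict;
use warnings;

{
    package My::Loader;    # hypothetical library package
    use Carp qw(croak);

    # Stand-in for a third-party call that dies deep inside a library.
    sub run_query {
        my ($sql) = @_;
        die "syntax error near 'FORM'\n" if $sql =~ /\bFORM\b/;
        return 1;
    }

    sub load_object {
        my ($table, $id) = @_;

        # Bad arguments are the caller's fault: croak reports the
        # caller's file and line rather than this one.
        croak 'load_object: id is required' unless defined $id;

        my $sql = "SELECT * FORM $table WHERE id = $id";    # deliberate typo
        my $ok  = eval { run_query($sql); 1 };
        if (!$ok) {
            my $err = $@ || 'unknown error';
            chomp $err;
            # Re-throw with context so the root cause is not lost.
            die "load_object($table, $id) failed: $err\n  while running: $sql\n";
        }
        return { id => $id };
    }
}

my ($query_err, $arg_err);
eval { My::Loader::load_object('users', 42); 1 } or $query_err = $@;
eval { My::Loader::load_object('users');     1 } or $arg_err   = $@;

print "query error: $query_err";
print "argument error: $arg_err";
```

On the croak versus confess question, one option is to use croak throughout and let users ask for traces when they need them: setting $Carp::Verbose to a true value (or running with -MCarp=verbose) makes croak produce a full stack trace like confess.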

    I'm sure there are many other important considerations when publishing a CPAN module, so please feel free to chime in about the things that I may have forgotten.

    Well, I would actually say that you have thought of/covered a lot of them. But to avoid putting my foot in my mouth, I will let the others comment on that :)

    -stvn
      Thanks, this is all gold.
Re: A First CPAN Odyssey
by jZed (Prior) on Jun 22, 2004 at 19:59 UTC

    SK> a module for abstracting objects and relationships in a SQL database

    JZ> Sorry to be pedantic, but you are asking about namespace so I'll get picky - there is no such thing as a "SQL database" - there are database management systems which use the SQL language to interact with data.

    SK> The first thing I'm wondering about is name spacing issues.

    JZ> You should cc these questions to dbi-dev@perl.org, that is the definitive place for DBI related namespace issues.

    SK> Presently, the top level package name is just SQL, so I have sub-packages such as SQL::Object, SQL::Link, SQL::Statement, SQL::Object::ResultSet, SQL::Link::ResultSet, etc.

    JZ> Several of those are already taken

    SK> I realize that this probably isn't a very good name, but I'm not sure what would be. The SQL name is nice and short, but not really all that descriptive of what the module is doing.

    JZ> As the author/maintainer of several CPAN modules in the SQL namespace (though this of course gives me no more rights to comment than anyone else), I would strongly prefer that the namespace be for modules whose primary purpose is dealing with the SQL language. Modules whose primary purpose is dealing with Databases or with Data or with DBI belong elsewhere.

    SK> What would be a sensible name?

    JZ> To answer that, we need to know what your module does. A description like the one you provide, "a module for abstracting objects and relationships in a SQL database", is so general that it's hard to comment. I suggest you come up with three different descriptions of your module - a 40 character one for use in the module list, a two sentence one for general descriptions like this posting, and a one page one for a README. Look through other CPAN modules and make sure that each of your descriptions not only describes what your module does, but does that in a way that distinguishes it from other modules. Post them back here, and I'll be glad to comment further. Good luck!

      At present, this is the top level POD I have that describes the module as a whole in an attempt to give a broad view of its purpose and organization. Then there is POD within each class that describes its usage which is not posted here.

      As someone else suggested, it would probably be best for me to simply move the top level name to some new unique name, so as to avoid collision with pre-existing things such as SQL::Statement.

      Oh, and please don't hesitate to be pedantic and highly critical. Submitting my first CPAN module stems not just from the desire to contribute something useful, but also from selfish motivations to learn the process of doing so. You are explicitly authorized to pull no punches. :-)

        You should definitely read simonm's great node - DBI Wrapper Feature Comparison - and figure out how your module relates to the discussion there.

        It seems to me that your module uses SQL to accomplish things but *users* of your module will not be dealing with SQL at all. So therefore you should name the module something more related to what the users will be doing - using objects, persistence, relationships, etc. Take for example your SQL::Column - how is that at all related to SQL? You are providing information about the structure of the table not about the structure of a SQL statement.

        I think your module belongs in the DBIx:: namespace since it is primarily an alternate programming interface to DBI. I'm afraid I don't get any sense from your POD of how your module differs from the dozens of other similar modules (some of which don't do any better of a job at distinguishing themselves in their PODs). Why would someone want to use your module rather than one of the others? I am not at all saying this to discourage you from CPANing your module, rather suggesting you get as good a sense as you can of how a potential user would react to your description of the module - the more you can define what your module does that is unique and helpful and usable, the better the chance that a potential user will look at it. The more you have a defined sense of how your module is unique, the easier it will be to pick a name for it.

Re: A First CPAN Odyssey
by clscott (Friar) on Jun 22, 2004 at 20:16 UTC
    Your suite of modules sounds a lot like Class::DBI. Maybe you should have a look at it as well as a few other object-relational mappers like Alzabo.
    --
    Clayton

      Obviously this is the kind of thing that would be very useful, and as such I harbored no illusions that I'd be the first to market. I am aware of the existence of the Class::DBI module, and have spent some time looking at it. It is my hope to create a solid competing implementation that more closely approximates my vision of how such a module should work. While doubtless many will disagree that it is "better", I would hope that it will culminate in something to which people give serious consideration as an alternative to Class::DBI. Time will tell whether I will be shown up as a useless crank...

      To be honest, I have always been less than impressed with the current state of RDBMS-OO mapping modules out there. There are some that are nicer than others, but none of them have felt comfortable to me. I welcome another player in the market, maybe skyknight's will be the one that strikes my fancy.

      Of course, many will say that there is no point in re-inventing the wheel, but in the case of RDBMS-OO mappers, I am not sure that the current state of wheels (in any language) is really as good as it can be.

      Here are some interesting links on the subject though:

      -stvn

        Indeed... In fact, the inspiration for the module that I have written comes from a home grown module to which I was exposed at a former job. It had some very good concepts (that I've cherry picked as best I can), but also many serious flaws that were very problematic time and time again. On my own, I have written three different iterations, all inspired by that original module for various projects or consulting jobs. This will be the fourth iteration of my attempt to forge something that lives up to the potential of the original.

        I've really struggled with this module to avoid kludgey compromises, and I hope that that will shine through in the (purported) elegance of the API. Execution efficiency has also been a big concern, as the database at my job which is serving as a pilot project is of the order of tens of millions of records.

        One of the key issues with which I wrestled was how to instantiate a large collection of objects, associated with their children in one-to-many has-a relationships, while neither executing superfluous queries nor pulling in extra information in the join operations. The solution to this was to issue one query for the collection of objects being loaded, and one query for each of the has-a relationships, ordering the object query by its surrogate primary key, and each of the has-a link table queries by the associated foreign primary key. The best image I can conjure for this process is the zipping of an n-threaded zipper where different segments of the various zippers have to be shifted to match up. :-) Presumably other OO-mappers have successfully wrangled with this problem before me, but I was pretty happy with myself for working out all the details. The original module of which I spoke totally punted on the issue, instead assuming that the user would never simultaneously load multiple objects that had one-to-many relationships.
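The "n-threaded zipper" described above can be sketched in plain Perl over already-fetched rows. This is only an illustration of the idea, not the module's actual code: the relationship names, the parent_id and v keys, and the sample data are all invented, and each array stands for the result of one query sorted as the post describes.

```perl
use strict;
use warnings;

# Parents as returned by one query ordered by primary key, and one
# child list per has-a relationship ordered by the foreign key.
my @parents = (
    { id => 1, name => 'a' },
    { id => 2, name => 'b' },
    { id => 3, name => 'c' },
);
my %children = (
    phones => [
        { parent_id => 1, v => '555-1' },
        { parent_id => 1, v => '555-2' },
        { parent_id => 3, v => '555-3' },
    ],
    emails => [
        { parent_id => 2, v => 'b@example.com' },
    ],
);

sub zip_children {
    my ($parent_rows, $child_lists) = @_;
    # One cursor per child list; each only ever moves forward, so every
    # row is visited once no matter how many parents there are.
    my %cursor = map { $_ => 0 } keys %$child_lists;

    for my $parent (@$parent_rows) {
        for my $rel (sort keys %$child_lists) {
            my $rows = $child_lists->{$rel};
            $parent->{$rel} = [];

            # Skip any rows whose parent sorts before the current one.
            while ($cursor{$rel} < @$rows
                && $rows->[ $cursor{$rel} ]{parent_id} < $parent->{id}) {
                $cursor{$rel}++;
            }
            # Attach the run of rows that belong to the current parent.
            while ($cursor{$rel} < @$rows
                && $rows->[ $cursor{$rel} ]{parent_id} == $parent->{id}) {
                push @{ $parent->{$rel} }, $rows->[ $cursor{$rel} ]{v};
                $cursor{$rel}++;
            }
        }
    }
    return $parent_rows;
}

zip_children(\@parents, \%children);
print "$_->{name}: phones=[@{ $_->{phones} }] emails=[@{ $_->{emails} }]\n"
    for @parents;
```

With one query per relationship, the total number of queries depends on the number of relationships rather than the number of parent objects, which is the point of the technique.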

        Oh, and both of your links seem like excellent resources to promote the uniqueness of my own module, something that, as jZed pointed out in another comment, is of utmost importance if I am going to stimulate any interest in my module.

        Thanks again.

Re: A First CPAN Odyssey
by BUU (Prior) on Jun 22, 2004 at 22:33 UTC
    On a related thread, I'm curious as to how I should go about testing interactions with a database on a user's system. It would be very rude for the testing procedures of my module to carelessly stomp on the database residing on someone's machine. What precautions and paradigms should I employ so as both to thoroughly test my module on the user's machine and avoid trashing his environment? I've read horror stories of hapless users having rogue modules smash their data, and I'd rather not be a purveyor of such a module.
    Require DBD::SQLite as a dependency and run your own database, or perhaps DBD::AnyData.
      Ah, that is an excellent suggestion. Thanks for the idea.
      Or use DBD::DBM - it comes with DBI (1.42 and above) so (eventually) you can count on users already having it.

      Re: Testing

      You should look at how Class::DBI does its testing using SQLite.
      --
      Clayton
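A minimal sketch of the throwaway-database idea from this subthread, assuming DBD::SQLite is installed and declared as a test dependency. The person table and its rows are invented for the example.

```perl
use strict;
use warnings;
use DBI;    # requires DBD::SQLite from CPAN

# ":memory:" keeps the whole database in RAM, so nothing already on the
# user's machine is touched and cleanup happens when the handle closes.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0, AutoCommit => 1 });

$dbh->do('CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');

my $insert = $dbh->prepare('INSERT INTO person (name) VALUES (?)');
$insert->execute($_) for qw(alice bob);

my $names = $dbh->selectcol_arrayref('SELECT name FROM person ORDER BY id');
print join(', ', @$names), "\n";

$dbh->disconnect;
```

Because the setup is a handful of lines with no external state, the same fixture could plausibly back a module's synopsis as well as its tests.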
SQLego?
by chanio (Priest) on Jun 23, 2004 at 06:07 UTC
    You should help people decide in favor of your module by placing it in one of the existing namespace branches.

    At least, until your differences exceed what is required to share the same branch.

    I believe that when there are so many alternatives, there must be something yet to be discovered that would end the search for alternatives, so that effort could then go into improving what exists. Nevertheless, I am impressed by the level of all of these modules.

    I would suggest listing dependencies at the start of every sub and every module, and in the POD (to read before installing). Also document each sub's input and output reference types.

    The better documented it is, the less disappointed any user should be!

    Bonne chance!

    .{\('v')/}
    _`(___)' __________________________
Re: A First CPAN Odyssey
by toma (Vicar) on Jun 28, 2004 at 04:44 UTC
    Please make a really good synopsis. The synopsis should be a small, self-contained, working program that shows off the functionality of the module.

    When I evaluate modules, the synopsis is a productivity tool. I cut and paste the synopsis to create a small test program. Modules with a bug in the synopsis, or with a synopsis that doesn't show the strength of the module, have trouble getting my attention.

    I first watched the pod-synopsis cut-and-paste technique used by a tremendously strong perl programmer during a perl programming competition. I have used it ever since.

    Your challenge will be to come up with a synopsis that functions without having a known database to work with. So if you follow BUU's advice and test the install with a small test database, could you use this same database in your synopsis?

    It should work perfectly the first time! - toma

Node Type: perlmeditation [id://368760]
Approved by castaway
Front-paged by castaway