http://www.perlmonks.org?node_id=526737

srdst13 has asked for the wisdom of the Perl Monks concerning the following question:

Let me start with CPAN--I use it daily (sometimes hourly) to find what I need and what I have. That is, I rely on it for perldocs, even for modules that I know I have installed because it is so easy to use. In my own coding, I have begun to think about code reuse and even documentation (a first for me) in my personal development process. I work alone on many small projects rather than a few larger ones. I now know enough to be productive when producing a useful set of modules for a particular project. I'm happy with all of this....

Recently, though, I found myself rewriting a module that I know I wrote earlier but couldn't easily locate--very frustrating. My question is simple--what do the monks do to keep organized in the face of many small projects, when the lifespan of a typical project is on the order of hours to days? Is it possible to set up a simple "database" of personal modules so that one could search for them easily?

Thanks,
Sean
  • Comment on Organizing personal perl library (AKA, personal CPAN)

Replies are listed 'Best First'.
Re: Organizing personal perl library (AKA, personal CPAN)
by xdg (Monsignor) on Jan 31, 2006 at 14:08 UTC

    It sounds like you may have useful modules buried amidst your various projects. While a source code repository is a good thing (for many reasons beyond organization!), you may just need to be a little more structured in how you break out the reusable bits.

    My suggestion is to package reusable bits as modules as if you were going to release them to CPAN, even if you don't release them to CPAN. Put them under a "Local::" or "Username::" namespace and install them locally. Keep a directory of all those reusable bits separate from your projects that use them. E.g.

    /home/yourname
        /modules
            /Local-Module-a
            /Local-Module-b
            ...
        /projects
            /project-1
            /project-2
            ...

    (That works well in a repository like subversion too -- within each module and project you can have subdirectories for branches, tags, etc.)

    The trick is to do it as you write them. Rather than hack up something for your current project, when you recognize that you're writing something more general, stop and pull it out into a separate module.

    Creating a basic installable module distribution is pretty easy. See How to make a CPAN Module Distribution for examples. Tools like ExtUtils::ModuleMaker and Module::Starter make it really easy to create a new install-ready distribution.

    ExtUtils::ModuleMaker has a nice menu system that pretty much walks you through all the steps. (Note: If you use it, go into "Directives" and set the "Compact" option to 1 -- you'll thank yourself later.) Then just add your code to the boilerplate .pm file in the directory it creates, and run perl Makefile.PL, make, make test and make install.

    After you've done that once or twice, it will cost you two minutes or less from the moment you recognize that you've got a reusable bit of code. (More, of course, if you document, write proper tests and so on.)
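For a sense of how little is involved, a minimal Makefile.PL might look like the sketch below. The distribution name Local::Greeting and the lib/Local/Greeting.pm path are made up for illustration, and the file named in VERSION_FROM is assumed to exist and to set $VERSION.

```perl
# Makefile.PL -- minimal sketch for a hypothetical Local::Greeting distribution
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'Local::Greeting',         # hypothetical module name
    VERSION_FROM => 'lib/Local/Greeting.pm',   # file assumed to set $VERSION
    PREREQ_PM    => {},                        # list any dependencies here
);
```

With this file at the top of the distribution directory, the perl Makefile.PL, make, make test and make install sequence described above applies as written.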

    Of course, you'll still need to figure out how to package them with any projects that you are handing off. There's been some discussion of that here, but many projects seem to either put module dependencies on CPAN, or package them with an application under a lib directory with the rest of the project. (If you want to use CPAN, but the modules aren't really useful for others, release them under a namespace with your username as the first part to signal that they aren't really for public consumption.)

    Even if you don't want to go to the trouble of writing them up as modules, keeping reusable .pm files in nicely organized directories and adding them to your PERL5LIB might be an easy way to help you find stuff later.
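As a rough illustration of that last option, the sketch below writes a tiny module under a library root and then loads it by name. The module Local::Greeting is invented for the example, and a temporary directory stands in for what would normally be a fixed directory listed in PERL5LIB.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Stand-in for a personal library root. In practice this would be a fixed
# directory (an assumption: something like ~/modules/lib) named in PERL5LIB.
my $lib_root = tempdir(CLEANUP => 1);

# Write a tiny module, Local::Greeting (a made-up name), under that root.
my $dir = File::Spec->catdir($lib_root, 'Local');
make_path($dir);
my $file = File::Spec->catfile($dir, 'Greeting.pm');
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} <<'END_MODULE';
package Local::Greeting;
use strict;
use warnings;

sub hello { my ($name) = @_; return "Hello, $name" }

1;
END_MODULE
close $fh or die "Cannot close $file: $!";

# Same effect as listing the directory in PERL5LIB or writing "use lib".
unshift @INC, $lib_root;
require Local::Greeting;

print Local::Greeting::hello('monks'), "\n";
```

Because every personal module sits under one root and one top-level namespace, listing that directory is already a crude catalogue of what you have.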

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: Organizing personal perl library (AKA, personal CPAN)
by tirwhan (Abbot) on Jan 31, 2006 at 13:08 UTC

    I'd suggest keeping all your code in an SCM repository, see this list for a selection of apps you can use. Then you can just do a simple grep through your workspace to find code that you've written before. And if it's a web interface you're after, you can use something like this (this is a web interface just for the subversion SCM system, but similar things exist for many other SCMs as well).

    Of course, if you want your modules to be organised similarly to CPAN, you could always submit them to CPAN :-).


    There are ten types of people: those that understand binary and those that don't.
Re: Organizing personal perl library (AKA, personal CPAN)
by zentara (Archbishop) on Jan 31, 2006 at 14:23 UTC
    Over the years, I have accumulated quite a large selection of snippets (nearly 700 megs), which I categorize according to these directories.

    Then in each of those directories, I have a subdir called 1DOCS (so it's at the top), and a subdir for each project in that category. I also place there the many varied snippets that relate to that category. I try to name the snippets to be descriptive of their content, like "dir-list-recursive" or "threaded-shared-hash", etc.

    Now I can go to the category directory and scan the names for the snippet I'm looking for, OR when I'm just looking for usage of a function, I scan each file for the word with Gtk2 Visual Grep or in the past... ztksearch. Those apps let me recursively scan the directories for a name fragment, or a word fragment in the files.

    Using this system, I can find whatever I need within a minute.


    I'm not really a human, but I play one on earth. flash japh
Re: Organizing personal perl library (AKA, personal CPAN)
by pileofrogs (Priest) on Jan 31, 2006 at 19:23 UTC

    I had a very similar epiphany not long ago. I'm still working towards a satisfactory system, so what I have to say may not be the best advice. However, since I'm still in the trenches on this one, a few of the lessons learned are very clear to me. Here's my so-called wisdom.

    • Set up a revision control system (I use CVS) that you can access remotely.
    • NEVER do any coding that doesn't go into your revision control system.
    • Keep all those weird modules you write in the same top level name space. For example, I might have Pileofrogs::Util, Pileofrogs::Frob, Pileofrogs::Foo. Then you never have trouble finding and browsing your work. If you create something worthy of CPAN, break it out into a more appropriate namespace.
    • If you find yourself writing something twice, put it in one of those modules.
    • You don't have to do all the bells and whistles of a CPAN module, but you should at the very least script any installers. Even if you only have one machine, write installation scripts. They not only make installation easy, they also serve as very precise documentation of how you installed your stuff.
    • Make all your machines automatically get and install the latest version of your work. This keeps all your machines in sync.
    • Try not to write two versions of a script to handle slightly different situations (like two different OSs); write one script that can handle both.
    • Write scripts that help you keep your scripts in compliance with whatever system you devise, e.g. a script that can write most of your installer for you.

    If anyone thinks what I'm doing is dumb, please cry out and save me from myself and the OP from my advice.

    --Pileofrogs

      I suggest staying away from the "Username::*" naming convention. If you decide later you like it well enough to publish it on CPAN then you'll have to rename it.

      Version control systems are useful, and definitely a good idea even on solo projects (I'm a fan of RCS these days, largely because I live inside of emacs... checking something into RCS is just 3 keystrokes: "C-x v v"), but I'm at a loss as to how version control would make it easier to find a module.

      I think the original poster really needs better ways of searching his disk... I keep meaning to play with swish-e, myself, but in the meantime I fall back on find/greps of various sorts.
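The find/grep approach can also be done in Perl itself. The sketch below walks one or more directories with File::Find and records which .pm files declare which packages. The helper name index_packages and the script name in the usage comment are invented for this example, and the regex only catches plain "package Foo::Bar;" lines, so treat it as a heuristic rather than a parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Build a map of package name => list of .pm files that declare it.
# index_packages is a hypothetical helper written for this sketch.
sub index_packages {
    my @roots = @_;
    my %index;
    find({
        no_chdir => 1,                      # keep $_ as the full path
        wanted   => sub {
            return unless -f $_ && /\.pm\z/;
            my $path = $_;
            open my $fh, '<', $path or return;   # skip unreadable files
            while (my $line = <$fh>) {
                push @{ $index{$1} }, $path
                    if $line =~ /^\s*package\s+([\w:]+)\s*;/;
            }
            close $fh;
        },
    }, @roots);
    return \%index;
}

# Usage: perl find-modules.pl PATTERN [DIR ...]
my ($pattern, @dirs) = @ARGV;
if (defined $pattern) {
    @dirs = ('.') unless @dirs;
    my $index = index_packages(@dirs);
    for my $pkg (sort grep { /$pattern/i } keys %$index) {
        print "$pkg\t$_\n" for @{ $index->{$pkg} };
    }
}
```

Run against a projects directory, this answers "where did I put that module?" without needing any naming discipline beyond the package statement itself.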

      Of course, a better organizational strategy wouldn't hurt, but there really isn't any one right way of doing these things...
Re: Organizing personal perl library (AKA, personal CPAN)
by rational_icthus (Sexton) on Jan 31, 2006 at 20:40 UTC
    Here's what I do:

    I make sure to name all of my subroutines in each file sequentially like "sub s1", "sub s2", "sub s3"... This helps minimize the amount of space the file takes up on disk. Then, so I never lose track in a file, I split all subs into separate files (deleting all comments to save space) and name them sequentially as well, "f1s1.pm", "f1s2.pm", ..., "f1238s13.pm". I store all files right on C: (fewer directories means fewer bytes taken up on my machine) and then I use Google Desktop to search for files when I need them. With 12,462 files, Google Desktop still manages to find what I need in no time at all. I just sift through the ten or twenty results it kicks back until I find what I need and I'm set to go. Most of my files look like this:
    use f1.s1;
    use f1.s2;
    use f1.s3;
    ...
    # about twenty or thirty lines omitted
    ...
    use f132.s12;
    if(f1::s3($a,$b)){f12:s3(f3:s2($b)+$a)}
    ...
    What'cha think?

      My guess is a lot of people will not like it, because the names are not descriptive at all. Your idea, however, points to a long-standing problem caused by the 'files and folders' paradigm of information management:

      • some people use filenames and directory names as a "UniqueID" (eg 20040123_001.pl)
      • some people use file and dir names as a "Description" (eg my_convert_to_html.pl)
      • some people use file and dir names as a "combination field" (eg conv_html_20040123_ver1_utf8.pl)
      • some people use file and dir names with special "symbols" to denote extra meaning, or influence the sort order when listing contents (eg 1docs, _about, !old)

      The problem with *all* of these approaches is that they are of diminished use unless you apply them 100% consistently, and things tend to 'break' when you find yourself needing to change a name for whatever reason.

      One approach is to use something like folksonomy tagging on your own local files, or use any of the various desktop search solutions. You can also be diligent about adding metadata to your files (NOTE: some prefer a more neutral syntax, since it can be a lot less annoying if you can use the exact same documentation style in all of your code, regardless of programming language).
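If that metadata is POD, one possible convention (an assumption here, not something the post specifies) is the usual NAME section with a one-line "Module - description" entry. The sketch below pulls that line out of a file so it can be listed or grepped. The helper name pod_abstract and the script name in the usage comment are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first non-blank line of a file's "=head1 NAME" POD section,
# or undef if the file has no such section. pod_abstract is a made-up
# helper name for this sketch.
sub pod_abstract {
    my ($path) = @_;
    open my $fh, '<', $path or return undef;
    my $in_name = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^=head1\s+NAME\b/) { $in_name = 1; next }
        next unless $in_name;
        last if $line =~ /^=/;       # another POD command ends the section
        next unless $line =~ /\S/;   # skip blank lines
        chomp $line;
        return $line;
    }
    return undef;
}

# Usage: perl pod-abstract.pl FILE [FILE ...]
for my $file (@ARGV) {
    my $abstract = pod_abstract($file);
    printf "%s: %s\n", $file,
        defined $abstract ? $abstract : '(no NAME section)';
}
```

The same idea carries over to a language-neutral comment header; only the two regexes that recognize the start and end of the section would change.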

      All of the approaches mentioned here and elsewhere in this thread, however, are pale vestiges of what could be possible if filesystems were more 'tag' oriented or 'object' oriented than the old-but-popular 'folders and files' paradigm.

      =oQDlNWYsBHI5JXZ2VGIulGIlJXYgQkUPxEIlhGdgY2bgMXZ5VGIlhGV
      I don't think that's a very good idea.
      • First, "use f1.s1" is invalid syntax, proving that you are at least distantly descended from a breed of bears that was unable to hibernate properly.
      • Second, "f12:s3" isn't valid syntax either, raising serious doubts as to the identity of your father, and suggesting that you probably shouldn't be too confident of your mother either.
      • Third, you are going to a lot of work to save space, then completely covering up the fact that you have to put "1;" at the end of every file, frivolously throwing thousands of innocent bytes into the trash. If you had any brains at all left over from your unfortunate heritage, you would load them all from a single file.
      • The previous item is completely swamped by another source of waste evidently deriving from your tendency to lick the wrong side of stamps: the very first sentence of your post points out that you're repeating "sub " over and over again. Together with having all of these subroutines in separate files, that means that you will be placing the same uselessly repeated four bytes at the start of 12,462 disk sectors. After lining up that many ones and zeroes, the magnetic field on your hard drive would set up excessively strong polarization in those areas, and would corrupt all surrounding data and probably destabilize the platters' rotation, causing them to hurl themselves at your neck. Although I doubt your exact species, I still don't think you could survive such a blow, and hence I highly suspect your assertion that you are actually using the techniques you describe (see the first two items in this list for corroboration.) Do you honestly expect us to believe that you wouldn't just install a subroutine into @INC that split apart the subroutine bodies appended immediately after each other and put them in a dummy filehandle wrapped with the appropriate subroutine declaration? I mean, you pretty much gave it away by naming your subroutines "s1", "s2", etc -- obviously, your @INC routine generates the subroutine names by constantly increasing a unique id.
      • Finally, you are revealed as the fraudulent peacock massage therapist we all know you truly are by your ridiculous filenames. If you had really gone to all that work to save space, obviously you would have put the subroutine definitions directly into the filenames themselves -- after all, we know you have an @INC routine to really load the subroutines, so why not use it for converting the restricted character set allowable in filenames into the full Perl alphabet? As a side benefit, searching is far easier (the original point of this thread), because the character set through which you search is much condensed. Ever try to Google for '/(?>.*?)/' ?
      In short, I feel your scheme is quite clever, but I am uncertain as to why you are concealing it from us behind such flimsy pretenses.
      this is a joke, right?