
Module Bloat and the Best Solution

by KurtSchwind (Hermit)
on Nov 08, 2007 at 20:23 UTC ( #649798=perlmeditation )

Forgive my 'newness', but I've noticed a tendency here to prefer a solution that uses a module, even a non-core one, over a solution that goes without. I noticed this thread, for example. Both examples are 2 lines. Yet the solution that uses a module had (at the time I last checked) more votes than the core solution that didn't use a module at all. Is it really that much easier to read the 2nd solution?

This got me to thinking. So why are we pushing the use of modules so much? And when is it better to go with core instead of downloading a module? When does requiring a module for your solution begin to make sense over a solution that doesn't require a module? Under what criteria should the use of a module be suggested? (emphasis added at update).

In general, I try to stay clear of solutions that require a module that the user isn't likely to have by default, even at the expense of some possibly more elegant solution. What do you all think?

UPDATE:
I think too many people here are interpreting what I've said as "It's better to not use modules." That isn't what I'm talking about. I'm talking more about "by what criteria do you decide to go searching for a module?" There is a big difference. Let me add a few things as well:
1: In corp America it can be non-trivial to have modules added to servers for production deployment.
2: If there IS a bug in a module, vs a bug in code I've written, it's almost always easier for me to debug my own code. I think that's a general truism. It's always easier to debug one's own code than someone else's.
3: I heart CPAN, so there is no need to sell me on it. I gave up writing my own XML reader 6 years ago. I wouldn't want to go back to doing things like that again.

I guess my brain won't accept a blind statement that every time I'm coding in Perl I should be scanning CPAN to see if something has already been done. There has to be some general guideline to follow to best know when to hit the CPAN archive.

--
I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.

Re: Module Bloat and the Best Solution
by FunkyMonk (Canon) on Nov 08, 2007 at 20:46 UTC
Re: Module Bloat and the Best Solution
by moritz (Cardinal) on Nov 08, 2007 at 21:19 UTC
    Is it really that much easier to read the 2nd solution?

    It is indeed. You won't notice that if you read just those two lines, but if you try to grasp the meaning of a complex piece of code, every little bit of extra complexity is bad.

Re: Module Bloat and the Best Solution
by dragonchild (Archbishop) on Nov 08, 2007 at 21:27 UTC
    The key is the number of moving parts that you are responsible for. If the moving part is from CPAN, then you just care that it works. More importantly, you're not thinking on the order of the problem, but the solution.

    Another way to think about it is this - Perl itself is a CPAN module. Otherwise, you'd be using C. C doesn't have memory management, but Perl does. C doesn't have scalars, but Perl does. C doesn't have a lot of what Perl does, which is why I use Perl - it makes my life easier. So, that's why I use CPAN modules. As many as I can possibly lay my little grubby hands on. I have my own "core" list that I install whenever I can so that all the tools I want to have are there.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      Just out of interest, and if it's not breaking any trade secrets, what modules do you have on your "core" list?

      Cheers,

      JohnGG

        These are the modules that I reach for first when trying to solve a problem:
        • Scalar::Util
        • Scalar::MoreUtils
        • List::Util
        • List::MoreUtils (This is just a phenomenal module)
        • File::Slurp
        • DBIx::Class
        • Catalyst and, possibly, CGI::Application
        • DBM::Deep
        • Test::Deep, Test::Warn, Test::Exception, DBD::Mock (numerous others in this category)
        • Set::*
        • Algorithm::*
        • Tree
        • Template Toolkit, Excel::Template, PDF::Template, CAM::PDF, Test::PDF
        • Moose

        And this is just what I can think of in 10 minutes. Some of them are written/maintained by me and most have been contributed to by me or people I know. In addition to this list is a knowledge of how to navigate CPAN and whom to ask when I want to solve something.


Re: Module Bloat and the Best Solution
by parv (Priest) on Nov 08, 2007 at 21:58 UTC

    In the referenced thread, both solutions for finding the unique elements (with and without a module) are equally fine with me.

    As for not using modules just because they are not installed by default, well, <see what others have written so far & will write>.

    If you search around here, this has been discussed many times.

Re: Module Bloat and the Best Solution
by brian_d_foy (Abbot) on Nov 08, 2007 at 23:08 UTC

    Although people will tell you to always use CPAN, don't turn off your thinking cap. As a good programmer, you want to do as little new work as possible and to minimize as much risk as possible. Since CPAN is a horde of people you don't control, you can save yourself a lot of work, but you also take on a lot of risk (which is not an absolute definition).

    Apply those thoughts to using CPAN too. The answers aren't going to be the same for everyone because social constraints and local policy often get in the way. The best thing to do is to have sound technical advice to inform those decisions. As always, whenever someone tells you the One True Answer before they know your situation, watch out!

    Here's what I recommend to most people:

    • If you're doing something big and there is a CPAN module that does it and that everyone else uses, use it. This is stuff like DBI, DateTime, and so on. In this case, if you don't know what the answer is, just ask. If everyone says the same thing, then use that. If everyone says something different, then you have more work to do.
    • If you're doing something repeatedly in a lot of code, a module is a good way to go. That doesn't mean that you need a CPAN module, but you can always implement your module using a CPAN module. This is especially handy if you decide later to change which CPAN module you want (say, use RoseDB instead of DBIx::Class).
    • Consider the cost of using the CPAN modules. Most are no-brainers: the amount of work to reimplement the good ones is intractable, and the risk of recreating it with many more bugs is very high. In those cases, go with CPAN. In other cases, such as Class::Singleton, the amount of work to reimplement is trivial, and the risk is virtually zero. You don't need to use that module just because it's on CPAN.
    • If you decide to use a CPAN module, what are the chances of something in CPAN breaking your application? Not only do you now depend on that module, but also on all the modules it depends on. David Cantrell's CPAN Dependencies chart can help you see what you are getting into. This can be a big problem, and I talk about it in my Making My Own CPAN talk. I've worked on plenty of apps that were working just fine until something in the dependency chain broke (or changed APIs). It's a risk you need to evaluate and mitigate. This doesn't mean that you avoid CPAN, but that you come up with a strategy to freeze module versions, use PAR, or whatever.
    • Some organizations are limited to core modules (for whatever reason). Does requiring an external dependency create more work than it saves? The answer isn't the same for every CPAN module. My Business::ISBN module might be worth it, but my Object::Iterate or File::Find::Closures modules aren't.
    • I wouldn't worry so much about authors abandoning modules. The community finds good homes for the useful ones (and those are the only ones you're using, right? :)
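
    One way to read the second and fifth points together is a thin in-house wrapper that prefers a library implementation when one is available and falls back to plain Perl otherwise. The following is only a minimal sketch under stated assumptions: the package name Local::ListTools is hypothetical, and List::Util (which gained a uniq function in version 1.45, well after this thread) stands in for whichever module is being wrapped.

```perl
use strict;
use warnings;

# Hypothetical in-house wrapper; the package name is illustrative only.
package Local::ListTools;

# Prefer the library implementation when a new enough List::Util is
# present; otherwise fall back to a plain-Perl version so that hosts
# with an older core still work.
my $uniq_impl;
if ( eval { require List::Util; List::Util->VERSION('1.45'); 1 } ) {
    $uniq_impl = \&List::Util::uniq;
}
else {
    $uniq_impl = sub { my %seen; grep { !$seen{$_}++ } @_ };
}

# Callers only ever see this one function, so the implementation
# behind it can be swapped without touching the calling code.
sub uniq { return $uniq_impl->(@_) }

package main;

my @ids = Local::ListTools::uniq(qw(a b a c b));
print "@ids\n";    # a b c
```

    The rest of the code base calls Local::ListTools::uniq and never names the underlying module, so changing or dropping the dependency later is a one-file edit.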
    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
      Let me just reiterate the coolness of Cantrell's CPAN Dependencies. This one illustrates the problem especially well. While the percentage isn't entirely fair, there is much truth in it.

      Part of what makes Perl cool and successful is its portability, which it achieves by requiring little of the host system. Another part is its ability to work with whatever external packages are available. Thanks to CPAN it's not quite the same situation for Perl modules, but I think these criteria still apply.

      I like all your points but-

      Since CPAN is a horde of people you don't control, you can save yourself a lot of work, but you also take on a lot of risk (which is not an absolute definition).

      I say the risk is approximately zero unless your general planning/design skills are totally haphazard.

      You downloaded the code, it's done. The only "risk" is that the author will take it in an unwanted direction or something. You don't have to upgrade and you can fork the module or abandon it for your own new implementation with the same API so your other code won't have to change. Either way, you've lost nothing because you would have written/owned it otherwise to start with.

Re: Module Bloat and the Best Solution
by grinder (Bishop) on Nov 08, 2007 at 23:38 UTC
    Both examples are 2 lines. Yet solution that uses a module had (at the time I last checked) more votes than the core solution that didn't use a module at all. Is it really that much easier to read the 2nd solution?

    Read some of educated foo's crusades against one-liner CPAN modules here and here for a counter-argument to using small CPAN modules. Just for the record, I don't agree with the premise, but I respect the argument.

    Some of these small modules have been zenned to a functional minimum and yet may deal with subtle edge cases invisibly, or in some way Do The Right Thing.

    And once the code has been hidden behind the interface of a module, it doesn't really matter what it looks like. If there's a bug, it can be corrected and you don't have to do anything. You can't do that when the snippet is scattered inline repeatedly across a large swath of code. If it's too slow, it can be XSified, and still you don't have to change anything on your side.

    I believe that the more you use modules, the more you can chunk things and operate at a higher level. I've used LWP::UserAgent and HTTP::Request for years, and have looked at the code for probably all of two minutes. And of the two minutes I spent, the main thing I took away was "Gee, I'm glad I don't have to worry about that."

    • another intruder with the mooring in the heart of the Perl

      If there's a bug, it can be corrected and you don't have to do anything.

      Bugs won't fix themselves. If you find a bug in, say, XML::Twig, you're extremely lucky, because mirod really cares about his modules. How many CPAN authors are as responsive as he is?

      If you find a bug in a module whose author either doesn't care or has no time, then the bug is yours.

      You can't do that when the snippet is scattered inline repeatedly across a large swath of code.

      You wouldn't do such scattering anyways, would you? You'd factor these out into a subroutine, and fix any bugs there.

      --shmem

      _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                    /\_¯/(q    /
      ----------------------------  \__(m.====·.(_("always off the crowd"))."·
      ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
        Couldn't agree more with your point about bugs in CPAN modules becoming your problem. Excepting core modules, this happens a lot in my experience. I now have a growing personal collection of hacked and patched modules with fixes to significant bugs which have been reported but not yet fixed on CPAN, and that includes such well-respected examples as CGI::Session. CPAN's a great resource, but you have to factor in the extra time it takes to evaluate, test, and possibly fix what you find there.
Re: Module Bloat and the Best Solution
by zby (Vicar) on Nov 09, 2007 at 11:54 UTC
    For me the solution with the module is more readable. It is more in line with our thinking - it says:
    • take the 'uniq' operator from the library
    • apply it to the list
    The other solution says:
    • create a hash by assigning 1 to every element of the list
    • list the keys of the hash
    It takes a much longer chain of thought to see that this is indeed the solution to the problem at hand. You need to think about how hashes are created, and what listing their keys does. With the library solution you only need to think about whether the 'uniq' sub really does filter unique values out of the list, and the name of the function is mnemonic enough that it's kind of automatic.
      To further emphasize this point, there are a number of edge-cases that most programmers will never encounter, even in such a "simple" problem as unique-ing a list. For example, undef and "" will both be treated the same way, which may not be appropriate. Two-faced scalars may not be handled correctly. Objects and references will certainly not be handled correctly. A library, on the other hand, can solve this problem without you even needing to know that the edge-cases existed. That is the big win.

      (Note: This isn't to say that the library always does it right. The version of uniq() in List::MoreUtils doesn't handle two-faced scalars correctly, but it does handle objects and references correctly, as expected.)
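
      Two of the edge cases named above (undef versus "", and references) can be seen in a short experiment. This is a sketch, not a statement about the List::MoreUtils version discussed in this thread: it assumes a List::Util of at least version 1.45, whose uniq is documented to keep undef distinct from the empty string, and uses that as the library side of the comparison.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);    # assumption: List::Util new enough to export uniq

my $ref  = [ 1, 2 ];
my @list = ( undef, '', $ref, $ref );

# Hash-key idiom: every value is stringified on its way into the hash.
my %tmp;
{
    no warnings 'uninitialized';    # undef used as a hash key would warn
    %tmp = map { $_ => 1 } @list;
}
my @via_hash = keys %tmp;

# undef and '' collapse into the single key '', and the array reference
# comes back as a plain string of the form "ARRAY(0x...)".
my $hash_count = @via_hash;                 # 2
my $hash_refs  = grep { ref } @via_hash;    # 0

# Library version: undef stays distinct from '', and the reference
# is returned as a reference.
my @via_uniq   = uniq @list;
my $uniq_count = @via_uniq;                 # 3
my $uniq_refs  = grep { ref } @via_uniq;    # 1

printf "hash idiom: %d items, %d refs\n", $hash_count, $hash_refs;
printf "uniq:       %d items, %d refs\n", $uniq_count, $uniq_refs;
```

      Whether either behaviour is "correct" depends on the data; the point is only that the two approaches stop being interchangeable once the list holds something other than plain defined strings.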


        >>A library, on the other hand, can solve this problem without you even needing to know that the edge-cases existed<<

        Unfortunately you can never take it on trust that any given library *will*, so it's still really down to you to try and think of all the edge cases, test them, and then possibly fix them. Sounds like your own example bears this out.

Re: Module Bloat and the Best Solution
by Porculus (Hermit) on Nov 10, 2007 at 11:04 UTC

    Something nobody else seems to have picked up on yet is your assertion that both examples are 2 lines. It's true, but only until you want to uniquify more than one list!

    Compare

    my %tmp = map { $_ => 1 } @foo;
    my @uniq_foo = sort keys %tmp;
    %tmp = map { $_ => 1 } @bar;
    my @uniq_bar = sort keys %tmp;
    %tmp = map { $_ => 1 } @baz;
    my @uniq_baz = sort keys %tmp;

    with

    use List::MoreUtils qw(uniq);
    my @uniq_foo = uniq @foo;
    my @uniq_bar = uniq @bar;
    my @uniq_baz = uniq @baz;

    Four lines instead of six, and it's getting a lot harder to argue that the two approaches are equally readable.

    Yes, you could also define your own uniq function, but why bother when it's been done for you already? If you're worried about people who don't have List::MoreUtils, then just copy the existing uniq function out of that into your own code (license permitting). You'll still be reusing ready-made, ready-tested CPAN code, and that's still usually better than reinventing the wheel.

      my %tmp = map { $_ => 1} @foo; my @uniq_foo = sort keys %tmp;

      I personally believe that you can avoid the intermediate step:

      my @uniq_foo = sort keys %{ {map { $_ => 1} @foo} };

      Yes, it's a dirty trick! :)

      And as far as your example with three arrays and the respective uniq'd incarnations is concerned, whichever way you choose, I would go with a HoA instead:

      my %uniq = map {
          my %tmp = map { $_ => 1 } @{ $arrays{$_} };
          $_ => [ sort keys %tmp ];
      } keys %arrays;

      or

      my %uniq = map {
          $_ => [ sort keys %{ { map { $_ => 1 } @{ $arrays{$_} } } } ];
      } keys %arrays;

      or

      my %uniq = map {
          my %saw;
          $_ => [ grep !$saw{$_}++, @{ $arrays{$_} } ];
      } keys %arrays;

      or

      my %uniq = map { $_ => [ uniq @{ $arrays{$_} } ] } keys %arrays;

      Not to pick nits, but if you really want to do a line-count comparison, shouldn't you include the lines in the module? After all, you are including them.

      Also, you can just write your own sub and then every subsequent need would only be 1 line as well.

      Why write your own? Because it's a trivial solution and you don't have to download yet another module for it. Then again, that's why I started this thread. To discuss when it makes sense to download another module.

      --
      I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.
Re: Module Bloat and the Best Solution
by tuxz0r (Pilgrim) on Nov 12, 2007 at 14:35 UTC
    I feel compelled to respond, considering Kurt's using one of my code snippets as an example.

    I definitely agree that using modules for things like XML parsing, database access (DBI) and other commonly used, non-trivial functionality makes sense. These are usually complex tasks that would take some time to "reinvent" and the modules are well tested and maintained (usually). Yes, don't reinvent the wheel in those cases.

    However, the example Kurt gives is that of finding the unique values in an array. This seems like a coding task that can be handled by the fundamentals of the language; and that is attested to if you look at the 2 lines in the 'uniq' subroutine in the actual List::MoreUtils module. Finding the unique elements in an array, from my experience with Perl, would be something I would never have thought about using a module to handle. Part of why I use Perl is that there are simple, easy ways to do things like finding the unique elements in an array.

    And, no matter which way I code it, I can make it handle as many edge cases or local constraints as my program may have. I'm sure there are at least 1000 ways to skin a cat, and likewise there are many ways to find unique elements in an array with Perl. If I need to do it more than once in the program I can just as easily put it into a subroutine of my own, without the need to download List::MoreUtils at all.

    But, for a sufficiently complex task, and one that is common to the particular domain I'm writing for (database access, xml parsing, web sites, etc.) I will almost definitely look to see if there is a module that exists that will make my life easier, streamline my code, shorten my development time, etc. But from my standpoint, finding the unique elements in an array is not one of those.

    ---
    echo S 1 [ Y V U | perl -ane 'print reverse map { $_ = chr(ord($_)-1) } @F;'
    Warning: Any code posted by tuxz0r is untested, unless otherwise stated, and is used at your own risk.

      Why are you increasing your maintenance burden? Why are you choosing to disregard battle-tested code? The uniq() in List::MoreUtils was worked over by many people over several years and was written in a way so as to both do the right thing and do it quickly.

      Do you know why it was written the way it was instead of the naive sub uniq { my %x;@x{$_} = undef for @_; keys %x }? There are at least two major problems with that code and possibly as many as four or more. And, if you don't know why, you have no business writing your own version cause you're going to screw it up.

      Even though I know why it was written the way it was, I still use it because when another problem is found, I get the bugfix for free! I know how to write a hashtable, but I don't choose to because it's boring (to me) and I'll screw it up. Same thing with uniq() or any of the other 2 dozen functions that module provides.



        Are you saying ALL CPAN modules are battle-tested?

        Also, I'd say that getting the "bugfix for free" is a dangerous conceit. I know that when my users find a bug in production code, I can't wait for J. Random Module writer to provide a fix. I need to fix it myself. I have to own the bug immediately.

        The danger of statements like "And, if you don't know why, you have no business writing your own version cause you're going to screw it up." is that you are implying that if you can't write flawless code, you might as well not start to learn. I definitely can't back you on that sentiment. Perl is a great language for experimenting and learning on. I wouldn't want to be as discouraging as you are towards users.

        --
        I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.

        Do you know why it was written the way it was

        I don't. What I don't understand is why it uses numerical comparison and map instead of the plain and simple

        sub unique { my %h; grep !$h{$_}++, @_; }
        which, at least on Perl 5.8.8, is faster. (Notably faster if there are many identical elements.)

        Any enlightenment would be appreciated.

        lodin
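
        A speed claim like this one is easy to check locally with the core Benchmark module. The sketch below compares the grep version from the post with a map-based variant written to match the description "numerical comparison and map"; that variant is a reconstruction for illustration and is not claimed to be the module's exact source. The test data is likewise made up.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The grep-based version, as given in the post above.
sub unique_grep { my %h; grep !$h{$_}++, @_ }

# A map-based variant using a numerical comparison (a reconstruction
# from the description, not necessarily what the module ships).
sub unique_map { my %h; map { $h{$_}++ == 0 ? $_ : () } @_ }

# Illustrative data: 1000 values drawn from only 10 distinct ones,
# i.e. the "many identical elements" case.
my @data = map { $_ % 10 } 1 .. 1000;

# Run each candidate for at least one CPU second, then print a table
# of rates and relative differences.
cmpthese( -1, {
    grep_seen => sub { my @u = unique_grep(@data) },
    map_cmp   => sub { my @u = unique_map(@data) },
} );
```

        The numbers will vary with the Perl version and with how many duplicates the data contains, so it is worth rerunning with a list that looks like the real workload.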

Re: Module Bloat and the Best Solution
by bart (Canon) on Nov 12, 2007 at 15:18 UTC
    I'm not even going to look at your example. But I'll give you my general rule of thumb instead:
    Use a module if it's very likely that the code you'd write instead of using the module, will contain a bug.
    I care about writeability, not readability of the code. After all, we're above copy/paste programming, aren't we? Aren't we?

    That's what I thought.

        Copy/paste programming, as I understand it, is looking up code where you did something similar, copy and paste it into your script, and then edit what needs to be different.

        If you need to do this, this shows to me that there's something wrong with the language you're using, or at least, with the way you're using it. A sign of weakness — especially if that's the best way to do it.

Re: Module Bloat and the Best Solution
by sundialsvc4 (Monsignor) on Nov 20, 2007 at 18:17 UTC

    There's a maxim ... and it's a good, true maxim ...

    Code is much harder to read than it is to write.

    So you're entirely correct when you say, “it's always easier for me to read my code ...” but what would everybody-else say? Of course. So, the true bottom line needs to be ... if you're doing something that's already been done, you're truly wasting your own time and for no defensible reason.

    CPAN has a certain amount of “broken, smelly crap” of course, but quite often that's just your first impression, and that impression usually fades quickly. The truth is, most of the stuff that you set out to write, as though you were the first human on the planet to have done so, isn't “original” at all. There is no shame in changing your approach to a project into:   “my task is to discover and then select an appropriate solution for this problem.”

    What is particularly interesting and useful, if you will allow yourself to be receptive to it, is the very different and sometimes-unique points of view that a particular CPAN-author will bring to what he or she has contributed. There's sometimes a very-surprising depth of experience just sitting right there, waiting for you only to pick it up and take advantage of it. The more you do that, the more you prefer doing that.
