PerlMonks  

On the rejected additions to List::Util

by metaperl (Curate)
on Apr 28, 2009 at 18:54 UTC ( [id://760703]=perlmeditation )

In the suggested additions section of List::Util Graham says:
The following are additions that have been requested, but I have been reluctant to add due to them being very simple to implement in perl
In my opinion it does not matter if something is easy. The purpose of a module is to support the DRY principle. If many people are likely to need something, and would otherwise each end up writing the same thing, then it belongs in a widely available module.

Opinions?

Why am I bringing this up?

  1. because I want to discuss the idea of what belongs in an API
  2. because I need the average of a list of numbers and it is not in this API or List::MoreUtils, and PDL is a bit too heavyweight for me (not to mention it didn't build when I last tried).

Replies are listed 'Best First'.
Re: On the rejected additions to List::Util
by ikegami (Patriarch) on Apr 28, 2009 at 18:58 UTC

    because I need the average of a list of numbers and it is not in this API

    It kinda is. sum(@list)/@list.

      That is not very DWIM. I take engineering software seriously, and I aim for readability and conciseness. DWIM supports that greatly.
        How do you average numbers when you don't have a computer? I can't think of a more readable and DWIM-my way of writing something than by expressing the very definition of how it'd be done by hand.

        Kids learn the average is the sum divided by the count at a young age, so it's definitely readable.

        As for conciseness, sum(@list)/@list is really not that far off from avg(@list).

        Nothing is stopping you from putting that expression in a sub if you think it's not readable or concise enough.
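
For readers who do want a named function, the suggestion above can be sketched as follows. The name mean and the choice to return nothing for an empty list are assumptions made for illustration; neither is part of List::Util.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Arithmetic mean: the sum divided by the count.
# Returns an empty list / undef for no input (an illustrative choice
# that avoids dividing by zero).
sub mean {
    return unless @_;
    return sum(@_) / @_;    # @_ in numeric context is the element count
}

print mean(1, 2, 3, 4), "\n";    # prints 2.5
```

Wrapping the expression costs three lines, which is roughly the trade-off this sub-thread is arguing about.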

Re: On the rejected additions to List::Util
by mr_mischief (Monsignor) on Apr 28, 2009 at 20:47 UTC
    As ikegami points out, the arithmetic mean isn't that difficult with the tools provided by List::Util. Even the geometric mean isn't that much code once the multiplication is reduced.

    use List::Util qw(reduce);  # reduce is not exported by default
    sub geo_mean { exp( (log (reduce { $a * $b } @_)) / @_); }
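
The one-liner above multiplies every element together before taking the logarithm, so the product can overflow or underflow on long lists. A common alternative, not the method used in the post, is to average the logarithms instead. This is a minimal sketch; the name geo_mean_logs is hypothetical and the code assumes every input is strictly positive.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Geometric mean computed as exp( mean of log(x) ).
# Assumes all inputs are > 0; log() dies on zero or negative values.
sub geo_mean_logs {
    return unless @_;
    return exp( sum( map { log($_) } @_ ) / @_ );
}

printf "%.4f\n", geo_mean_logs(2, 8);    # prints 4.0000
```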

    I think Statistics::Basic as suggested by Roy Johnson is a good start for someone looking for statistical methods. Statistics::Descriptive may be even better since it includes things like percentiles, the geometric mean, trimmed means, and frequency distributions. Then there are Math::VecStat and a few others.

    As the question was asked relative to API design, let me point out Acme::Tools as a counterexample. This module offers both the arithmetic mean and the geometric mean. Along with those are some date functions, table formatting code, credit card processing subs, URL handling, and compression/decompression along with yet more stuff. This is the sort of thing module authors usually try to avoid, as you don't always need to compress your data with bzip2 when you're trying to find out the date for Easter.

    You see, the list-specific tools of List::Util are limited to what the author felt was very handy within the domain of lists without treading into other recognized domains. Since statistics is another recognized domain (there are university classes named "Statistics", even), modules that handle other statistical methods are a better place for things like averages.

    Having lots of methods available to you on CPAN is wonderful. In some cases it might make sense to add a method here or there to an API for a module. However, one must draw the line somewhere. You really don't want CPAN to be one giant super module that exports everyone's pet method into your name space. I think the current lines between the List modules and the Statistics modules make plenty of sense. There are times you're working with lists and don't need statistics. There are other times you need statistics but not reduction, set logic, and zipping.

    In this particular case, searching for "average" on http://search.cpan.org admittedly isn't very helpful. Searching for "mean" is slightly more helpful. Searching for something with fewer unrelated meanings like "geometric mean", "statistics", "monte carlo", or "standard deviation" is much more helpful. Just because you can't find the method you want where you look first doesn't mean nothing on CPAN has it, though.

    Update: I replaced 'classes named "Statistics",' with 'university classes named "Statistics",' to make clearer what definition of "classes" I was using.

      you don't always need to compress your data with bzip2 when you're trying to find out the date for Easter

      Such can come in handy when playing Fizzbin, however. ;-)

      HTH,

      planetscape
Re: On the rejected additions to List::Util
by moritz (Cardinal) on Apr 29, 2009 at 07:36 UTC
    Just let me say that the any() implementation in List::MoreUtils is more general than the one described in the list of rejected functions.

    Specifically, in List::MoreUtils it takes a block, evaluates that for every list item, and returns true if the block returned true at least once. The one described in List::Util just gives true when any of the list items is true, so

    List::MoreUtils::any is roughly like List::Util::proposed_any map BLOCK LIST

    except that it's optimized.

    That's not as trivial (ok, it's only slightly more complicated) as Graham puts it, which is why I never understood his point.

      List::MoreUtils::any() also stops looking as soon as the block returns true for one item. Check:

      perl -MList::MoreUtils=any -E 'any { say $_; $_ < 5 ? 0 : 1 } 1 .. 10'
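
The difference between the two forms of any() discussed in this sub-thread, and the short-circuiting, can be sketched in plain Perl. Both sub names here (any_true, any_block) are hypothetical stand-ins written for illustration; they are not the actual List::Util or List::MoreUtils implementations.

```perl
use strict;
use warnings;

# Simple form: true if any element is itself true.
sub any_true {
    for my $item (@_) { return 1 if $item; }
    return 0;
}

# Block form: runs the block with $_ set to each element and
# returns as soon as the block yields a true value.
sub any_block(&@) {
    my $code = shift;
    for (@_) {
        return 1 if $code->();
    }
    return 0;
}

my @seen;
my $found = any_block { push @seen, $_; $_ >= 5 } 1 .. 10;
print "found=$found, tested: @seen\n";    # found=1, tested: 1 2 3 4 5

# Same answer via the simple form, but map evaluates all ten items first.
print any_true( map { $_ >= 5 } 1 .. 10 ), "\n";    # prints 1
```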
Re: On the rejected additions to List::Util
by Roy Johnson (Monsignor) on Apr 28, 2009 at 19:38 UTC
    Statistics::Basic might be what you're looking for.

    But I agree with your sentiments that it's better to have an abstraction — even a simple one — written once than re-invented.


    Caution: Contents may have been coded under pressure.
      The mere fact that some people want an abstraction isn't a reason to put it in List::Util. Feel free to create a module with the abstraction you want and distribute it.
        What are legitimate reasons to put it in List::Util?

        Having List::Util, List::MoreUtils, List::EvenMoreUtils, List::SomeMoreUtilsByMe, List::AnotherCoupleUtilsTheyForgot, etc. seems like it might be a Bad Thing. Better to have basic list utilities in one List::Util module, I think. And the reason they're not included? They're "too basic".

        It's a perfectly good reason to include them in List::Util. It's not an absolute mandate that they be included, but it is a reason.


Re: On the rejected additions to List::Util
by borisz (Canon) on Apr 28, 2009 at 19:54 UTC
    - The module author determines what belongs in the API.
    - As you pointed out all rejected functions are already in List::MoreUtils. I'm strictly against duplicating every function in all modules.
    - I like clean simple APIs much more than a collection of one-liners.
    Boris
      The module author determines what belongs in the API.

      Normally, yes. Does that still apply to core modules like List::Util? I'd always assumed that authors gave up a certain measure of control in exchange for having their module distributed as a fundamental part of the Perl standard library.

      As you pointed out all rejected functions are already in List::MoreUtils. I'm strictly against duplicating every function in all modules.

      The question is more why List::MoreUtils has to exist at all. Why not put the functions in List::Util and do away with List::MoreUtils completely? Isn't it simpler and cleaner to have one utility module instead of several?

      The other issue is that List::MoreUtils is not a core module. That means there is no guarantee it will always be available. This is a big problem for many people. We don't all have root on the machines our code runs on. We might not even have access to those machines, in which case we are unlikely to have the clout to insist that third-party modules are installed on them. If functionality is not in the core libraries, we might be forced to reinvent these wheels again and again and again. How is that good?

      I like clean simple APIs much more than a collection of one-liners.

      Do you think it's cleaner to have all the one-liners copied and pasted into the start of every script that uses them? Do you think it's simpler to have to remember which of several list utility libraries each basic function is in?

        The arithmetic mean, geometric mean, mode, median, etc. of a group of numbers are not properties of a generic list of scalars. There is no meaningful arithmetic mean of qw( Porculus mr_mischief metaperl ikegami ), for example. Those are statistical properties, and I believe they rightly belong in the Statistics name space rather than the List name space.
        You do not need root access to use modules or install them. It is just a little more painful to install.
        The question is more why List::MoreUtils has to exist at all. Why not put the functions in List::Util and do away with List::MoreUtils completely? Isn't it simpler and cleaner to have one utility module instead of several?
        One reason is that I have used List::MoreUtils for years and do not plan to change all my scripts. But, to exaggerate a bit, are you asking to fold CPAN into the core? A lot of my scripts use XML::LibXML, DBI, and HTML::Template. I would like to get them into the core too.
        Do you think it's cleaner to have all the one-liners copied and pasted into the start of every script that uses them? Do you think it's simpler to have to remember which of several list utility libraries each basic function is in?
        Often yes, since otherwise I have to read up on and learn every silly simple function. Sure, it is not a problem if I add some simple handy functions to my script as well. I even like the split between the core functions and the handy glue. Or you could use List::AllUtils, but that's again not in the core.
        Boris
Re: On the rejected additions to List::Util
by chromatic (Archbishop) on Apr 29, 2009 at 17:39 UTC
    The purpose of a module is to support the DRY principle.

    Perhaps that's the purpose of modules you write, but are you comfortable expressing that as a universal imperative?

      s/The/One/

      /J

Re: On the rejected additions to List::Util
by otto (Beadle) on May 01, 2009 at 20:28 UTC

    All of the discussions above have their merits. However, you can argue this until hell freezes over (that is, if you believe in hell).

    At issue is language (think API) design.

    Some Asian languages have a different "character" to represent things - hence LOTS of characters.

    Other languages use a small set of characters and build up words to represent things.

    And some languages take words and slam them together to get new words to mean other things.

    I like to think of things in terms of "common idioms" - not something I'm going to write up a whole function to do.

    Generic programming is the further abstraction of things - pushing functionality and the data to be worked upon - out into parameter-land.

    Where does one stop? It is up to the author and environmental drivers. There is no one best answer. Is that not the Perl way? :)
