belg4mit has asked for the wisdom of the Perl Monks concerning the following question:
For the math monks, a formula and description of the algorithm are available over there. Alas, I'm not able to fully grok the description and have been unable to tackle implementing it in Perl*. I've come across a couple of Fortran and VB implementations, but neither language is very Perl-like, so neither is well suited for translation. Would anyone be interested in taking up the challenge of writing an N-D or 1-D implementation in Perl with a simple interface? I.e., accept a reference to (or list of) the values to classify and the number of desired classes**, and spit back the classified values or class divisions.
happy hacking!
P.S. For an implementation reference see Milligan's. I cannot attest to the quality of the Fortran but the README can provide some interesting insights as well.
P.P.S. I inquired about this in the CB and discussed it with theorbtwo and atcroft. When I mentioned it in passing today, Limbic~Region urged me to post it as a potentially interesting diversion for some.
* There is in fact a wrapper for a C implementation; however, it lacks documentation, seems to require lots of unusual extras, and is oriented towards clustering 2-D data.
** The number of classes can influence the interpretation of the resulting analysis. However, at least in 1-D, there are relatively few meaningful values, so it is easy enough to test them by hand for bias. Typical values are 3-6, with many implementations defaulting to 5. There are a few reasons for this:
- for 2 classes it'd be easier to use the mean
- larger numbers of classes are difficult to handle visually. If you insist on 8+ classes you are probably better off with an even gradient of divisions.
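For comparison, the two fallbacks above can be sketched in a few lines of Perl. This is only an illustration of the simple cases, not the algorithm being asked about. The sub names split_at_mean and equal_interval_breaks are made up for this example, and reading "an even gradient of divisions" as equal-width intervals between the minimum and maximum is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Two classes: split the values at their arithmetic mean.
# Returns two array refs: values below the mean, values at or above it.
# (Hypothetical helper; assumes a non-empty list of numbers.)
sub split_at_mean {
    my @values = @_;
    my $mean = sum(@values) / scalar(@values);
    my @low  = grep { $_ <  $mean } @values;
    my @high = grep { $_ >= $mean } @values;
    return (\@low, \@high);
}

# Many classes: evenly spaced class boundaries between min and max.
# Takes an array ref and a class count, returns the ($classes - 1)
# interior break values. (Hypothetical helper; assumes equal-width
# intervals are what is wanted.)
sub equal_interval_breaks {
    my ($values, $classes) = @_;
    my ($lo, $hi) = (min(@$values), max(@$values));
    my $width = ($hi - $lo) / $classes;
    return map { $lo + $_ * $width } 1 .. $classes - 1;
}

my @data = (1, 2, 4, 5, 7, 9, 10, 20);
my ($low, $high) = split_at_mean(@data);
print "below mean: @$low\n";
print "at or above mean: @$high\n";
print "breaks for 8 classes: @{[ equal_interval_breaks(\@data, 8) ]}\n";
```

Both subs follow the interface described in the question (values plus a class count in, classified values or class divisions out), so they could serve as a baseline to compare a real implementation against.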
--
In Bob We Trust, All Others Bring Data.