Re: module for cluster analysis

by sgt (Deacon)
on Dec 25, 2006 at 13:49 UTC

in reply to module for cluster analysis

Besides searching CPAN (as other threads have pointed out) with keywords like 'statistics', 'deviation', etc., another idea springs to mind, since you mention that your arrays are huge.

If you build your array piecewise (one element at a time or in chunks), you could calculate the statistical quantities you are interested in at the same time. Each time you update the array, you calculate the new quantities from the previous ones. For example:

    assert( $N > 0 && $N_chunk > 0 );  # etc
    $av_new = 1/($N+$N_chunk) * ($av_old * $N + $av_chunk * $N_chunk);
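A minimal runnable sketch of that update, assuming each chunk is a non-empty array of numbers. The helper names `chunk_mean` and `merge_mean` are made up for illustration, and a plain `die` stands in for the assert:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean of one non-empty chunk, passed as an array reference.
sub chunk_mean {
    my ($chunk) = @_;
    die "empty chunk" unless @$chunk;
    return sum(@$chunk) / @$chunk;
}

# Combine the mean of the $n values seen so far with the mean of a new
# chunk of $n_chunk values. Returns the new count and the new mean.
sub merge_mean {
    my ($n, $av_old, $n_chunk, $av_chunk) = @_;
    die "counts must be positive" unless $n > 0 && $n_chunk > 0;
    my $n_new = $n + $n_chunk;
    return ($n_new, ($av_old * $n + $av_chunk * $n_chunk) / $n_new);
}

my @first  = (1, 2, 3);
my @second = (10, 20);
my ($n, $av) = merge_mean(scalar @first,  chunk_mean(\@first),
                          scalar @second, chunk_mean(\@second));
printf "n=%d mean=%.2f\n", $n, $av;   # prints "n=5 mean=7.20"
```

Only the count and the running mean have to be kept between updates, so the full array never needs to be rescanned.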

A simple "statistical array" class could be set up to package the above. Adding two such arrays would also "add up" their statistical properties. Instead of splicing each subarray into the main one, you could also treat your global array as a list of references to the chunks; that way you would not spend much time making copies.
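Here is one way such a class might look. This is only a sketch: the package name `StatArray` and its methods are hypothetical, and the update for the sum of squared deviations is the standard pairwise formula that I am swapping in to show a second statistic, since the mean is the only one spelled out above. Chunks are stored as references, so nothing is copied.

```perl
package StatArray;   # hypothetical name, for illustration only
use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { n => 0, mean => 0, m2 => 0, chunks => [] }, $class;
}

# Fold a summary (count, mean, sum of squared deviations) into $self.
sub _combine {
    my ($self, $n_b, $mean_b, $m2_b) = @_;
    return if $n_b == 0;
    my $n_a   = $self->{n};
    my $n     = $n_a + $n_b;
    my $delta = $mean_b - $self->{mean};
    $self->{mean} += $delta * $n_b / $n;
    $self->{m2}   += $m2_b + $delta**2 * $n_a * $n_b / $n;
    $self->{n}     = $n;
}

# Keep a reference to the chunk (no copy) and update the statistics.
sub add_chunk {
    my ($self, $chunk) = @_;
    my $n_b = @$chunk or return $self;
    my $mean_b = 0;
    $mean_b += $_ / $n_b for @$chunk;
    my $m2_b = 0;
    $m2_b += ($_ - $mean_b)**2 for @$chunk;
    push @{ $self->{chunks} }, $chunk;
    $self->_combine($n_b, $mean_b, $m2_b);
    return $self;
}

# "Add" another StatArray: share its chunk references, merge its statistics.
sub merge {
    my ($self, $other) = @_;
    push @{ $self->{chunks} }, @{ $other->{chunks} };
    $self->_combine(@{$other}{qw(n mean m2)});
    return $self;
}

sub count    { $_[0]{n} }
sub mean     { $_[0]{mean} }
# Sample variance (divides by n - 1); 0 when fewer than two values.
sub variance { my $s = shift; $s->{n} > 1 ? $s->{m2} / ($s->{n} - 1) : 0 }

package main;

my $stats = StatArray->new;
$stats->add_chunk([1, 2, 3])->add_chunk([10, 20]);
printf "n=%d mean=%.2f var=%.2f\n",
    $stats->count, $stats->mean, $stats->variance;
# prints "n=5 mean=7.20 var=63.70"
```

Because the chunks are held by reference, changing a chunk after it has been added would leave the stored statistics stale; the sketch assumes chunks are not modified afterwards.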

By the way, the formulas for higher moments follow simple recurrences; Google and Wikipedia will turn them up. hth --stephan

