
Re: module for cluster analysis

by sgt (Deacon)
on Dec 25, 2006 at 13:49 UTC (#591579)

in reply to module for cluster analysis

Besides checking CPAN search (as other replies have pointed out) with keywords like 'statistics', 'deviation', etc., another idea springs to mind, since you mention that your arrays are huge.

If you build your array piecewise (one element at a time or in chunks), you could calculate the statistical quantities you are interested in at the same time. Each time you update the array, you calculate the new quantities from the previous ones. For example, for the average:

    die "counts must be positive" unless $N > 0 && $N_chunk > 0;    # etc.
    $av_new = 1/($N+$N_chunk) * ($av_old * $N + $av_chunk * $N_chunk);
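To make the update concrete, here is a minimal sketch that folds chunks into a running count and average using the formula above. The helper name `update_mean` and the sample chunks are made up for illustration; starting from an empty array ($n = 0) also works here, because the divisor is the combined count.

```perl
use strict;
use warnings;

# Hypothetical helper: fold one chunk (an array reference) into a running
# count and average, using the weighted-average update from the post.
sub update_mean {
    my ($n, $av_old, $chunk) = @_;
    my $n_chunk = @$chunk;
    return ($n, $av_old) unless $n_chunk;    # an empty chunk changes nothing

    my $sum = 0;
    $sum += $_ for @$chunk;
    my $av_chunk = $sum / $n_chunk;

    my $av_new = ($av_old * $n + $av_chunk * $n_chunk) / ($n + $n_chunk);
    return ($n + $n_chunk, $av_new);
}

# Illustrative data only: three chunks arriving one after another.
my ($n, $av) = (0, 0);
for my $chunk ([2, 4], [6], [8, 10, 12]) {
    ($n, $av) = update_mean($n, $av, $chunk);
}
printf "n=%d mean=%.2f\n", $n, $av;
```

Only the count and the current average are carried between updates, so each chunk is read exactly once.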

A simple "statistical array" class could be set up to package the above. Adding two such arrays together would also "add up" their statistical properties. Instead of splicing each subarray into the main one, you could treat your global array as a list of references to the chunks; that way you would not spend much time copying data.
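One possible shape for such a class, as a rough sketch rather than a finished module: the package name `StatArray` and its methods are invented here. It stores references to the chunks instead of copying them, and keeps a count, a mean, and the sum of squared deviations (M2). The M2 update is the standard pairwise formula for combining two data sets, which is my choice of recurrence, not something spelled out in the post.

```perl
use strict;
use warnings;

package StatArray;    # hypothetical name, not a CPAN module

sub new {
    my ($class) = @_;
    return bless { chunks => [], n => 0, mean => 0, m2 => 0 }, $class;
}

# Fold the (count, mean, M2) of another data set into our own statistics.
sub _combine {
    my ($self, $n_b, $mean_b, $m2_b) = @_;
    return unless $n_b;
    my $n     = $self->{n} + $n_b;
    my $delta = $mean_b - $self->{mean};
    $self->{m2}   += $m2_b + $delta ** 2 * $self->{n} * $n_b / $n;
    $self->{mean} += $delta * $n_b / $n;
    $self->{n}     = $n;
    return;
}

# Append a chunk (array reference); only the reference is stored.
sub add_chunk {
    my ($self, $chunk) = @_;
    my $n = @$chunk;
    return $self unless $n;
    my $sum = 0;
    $sum += $_ for @$chunk;
    my $mean = $sum / $n;
    my $m2 = 0;
    $m2 += ($_ - $mean) ** 2 for @$chunk;
    $self->_combine($n, $mean, $m2);
    push @{ $self->{chunks} }, $chunk;
    return $self;
}

# "Add" another StatArray: statistics combine, chunk references are shared.
sub merge {
    my ($self, $other) = @_;
    $self->_combine($other->{n}, $other->{mean}, $other->{m2});
    push @{ $self->{chunks} }, @{ $other->{chunks} };
    return $self;
}

sub count    { $_[0]{n} }
sub mean     { $_[0]{mean} }
sub variance { $_[0]{n} ? $_[0]{m2} / $_[0]{n} : undef }    # population variance

package main;

# Illustrative data only.
my $left  = StatArray->new->add_chunk([1, 2, 3, 4]);
my $right = StatArray->new->add_chunk([5, 6])->add_chunk([7, 8]);
$left->merge($right);
printf "n=%d mean=%.2f variance=%.4f\n",
    $left->count, $left->mean, $left->variance;
```

Because `merge` only pushes references, the merged object shares its chunks with the original; if the chunks may be modified later, the statistics would need to be recomputed.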

By the way, the formulas for higher moments follow simple recurrences too; a search on Google or Wikipedia will turn them up.

hth --stephan
