Re: module for cluster analysis

by sgt (Chaplain)
on Dec 25, 2006 at 13:49 UTC

in reply to module for cluster analysis

Besides searching CPAN (as others in this thread have pointed out) with keywords like 'statistics' or 'deviation', another idea springs to mind, since you mention that your arrays are huge.

If you build your array piecewise (one element at a time or in chunks), you could compute the statistical quantities you are interested in at the same time. Each time you update the array, you calculate the new quantities from the previous ones. For example, for the mean:

    die "empty input" unless $N > 0 && $N_chunk > 0;   # etc.
    my $av_new = ($av_old * $N + $av_chunk * $N_chunk) / ($N + $N_chunk);
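Here is a small runnable sketch of that update, assuming each chunk arrives as an array reference. The helper names `mean_of` and `merge_mean` are made up for illustration. The loop folds three chunks into a running (count, mean) pair, and the result should agree with the mean of the concatenated data.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean of a non-empty array reference (hypothetical helper).
sub mean_of {
    my ($aref) = @_;
    die "empty chunk\n" unless @$aref;
    return sum(@$aref) / @$aref;
}

# Combine (count, mean) of the data seen so far with (count, mean)
# of a new chunk, using the weighted-mean update (hypothetical helper).
sub merge_mean {
    my ($N, $av_old, $N_chunk, $av_chunk) = @_;
    die "nothing to merge\n" unless $N + $N_chunk > 0;
    return ($N + $N_chunk,
            ($av_old * $N + $av_chunk * $N_chunk) / ($N + $N_chunk));
}

my @chunks = ([1, 2, 3], [4, 5], [6, 7, 8, 9]);
my ($N, $av) = (0, 0);
for my $chunk (@chunks) {
    ($N, $av) = merge_mean($N, $av, scalar(@$chunk), mean_of($chunk));
}
printf "N=%d mean=%g\n", $N, $av;   # N=9 mean=5
```

Only the running count and mean are kept between chunks, so memory use does not grow with the number of chunks.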

A simple "statistical array" class could be set up to package the above: adding two such arrays would also "add up" their statistical properties. Instead of splicing a subarray onto a main one, you could also treat your global array as a list of references to the chunks; this way you would not spend much time copying data.
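A minimal sketch of such a class, assuming chunks are handed over as array references that the caller does not modify afterwards. `StatArray` and its methods are hypothetical names, not an existing CPAN module. It keeps a running count and sum (the mean is sum/count, which is equivalent to the weighted-mean update above) and stores only references to the chunks, so neither adding a chunk nor merging two objects copies any elements.

```perl
use strict;
use warnings;

package StatArray;    # hypothetical class name, not a CPAN module

sub new {
    my ($class) = @_;
    # 'chunks' holds array references, so adding a chunk never copies its elements
    return bless { chunks => [], n => 0, sum => 0 }, $class;
}

sub add_chunk {
    my ($self, $aref) = @_;
    push @{ $self->{chunks} }, $aref;
    $self->{n}   += @$aref;          # array in scalar context: element count
    $self->{sum} += $_ for @$aref;
    return $self;                    # allow chaining
}

# "Adding up" two arrays: their statistics combine without touching the data.
sub merge {
    my ($self, $other) = @_;
    my $new = StatArray->new;
    push @{ $new->{chunks} }, @{ $self->{chunks} }, @{ $other->{chunks} };
    $new->{n}   = $self->{n}   + $other->{n};
    $new->{sum} = $self->{sum} + $other->{sum};
    return $new;
}

sub count { $_[0]{n} }
sub mean  { my $s = shift; die "empty\n" unless $s->{n}; $s->{sum} / $s->{n} }

# Visit every element without flattening the chunks into one big array.
sub each_element {
    my ($self, $cb) = @_;
    for my $aref (@{ $self->{chunks} }) { $cb->($_) for @$aref }
}

package main;

my $x   = StatArray->new->add_chunk([1, 2, 3])->add_chunk([4, 5]);
my $y   = StatArray->new->add_chunk([6, 7, 8, 9]);
my $all = $x->merge($y);
printf "count=%d mean=%g\n", $all->count, $all->mean;   # count=9 mean=5
```

Keeping the sum rather than the mean avoids a division on every update; for very large or badly scaled data, a Welford-style running mean may be more numerically robust.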

By the way, the formulas for higher moments (variance and beyond) also follow simple recurrences; search Google and Wikipedia for them. hth --stephan
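As one example of such a recurrence, here is a sketch for the variance, assuming you only need the count, the mean and M2 (the sum of squared deviations from the mean) per chunk. `summarize` applies Welford's one-pass update within a chunk, and `combine` uses the pairwise merge formula listed on Wikipedia's "Algorithms for calculating variance" page; both helper names are hypothetical.

```perl
use strict;
use warnings;

# Summary of one chunk: [count, mean, M2], where M2 = sum of (x - mean)**2.
sub summarize {
    my ($aref) = @_;
    my ($n, $mean, $m2) = (0, 0, 0);
    for my $x (@$aref) {             # Welford's update, one element at a time
        $n++;
        my $delta = $x - $mean;
        $mean += $delta / $n;
        $m2   += $delta * ($x - $mean);
    }
    return [$n, $mean, $m2];
}

# Merge two summaries without revisiting the underlying data.
sub combine {
    my ($p, $q) = @_;
    my ($na, $ma, $m2a) = @$p;
    my ($nb, $mb, $m2b) = @$q;
    my $n = $na + $nb;
    return [0, 0, 0] unless $n;
    my $delta = $mb - $ma;
    my $mean  = $ma + $delta * $nb / $n;
    my $m2    = $m2a + $m2b + $delta**2 * $na * $nb / $n;
    return [$n, $mean, $m2];
}

my $total = [0, 0, 0];
$total = combine($total, summarize($_)) for ([1, 2, 3], [4, 5], [6, 7, 8, 9]);
my ($n, $mean, $m2) = @$total;
printf "n=%d mean=%g sample variance=%g\n", $n, $mean, $m2 / ($n - 1);
# n=9 mean=5 sample variance=7.5
```

Divide M2 by n instead of n - 1 for the population variance. Skewness and kurtosis can be merged the same way, with longer update formulas.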
