PerlMonks  

Re: module for cluster analysis

by sgt (Chaplain)
on Dec 25, 2006 at 13:49 UTC (#591579)


in reply to module for cluster analysis

Besides searching CPAN (http://search.cpan.org) with keywords like 'statistics' or 'deviation', as other threads have pointed out, another idea springs to mind since you mention that your arrays are huge.

If you build your array piecewise (one element at a time or in chunks), you could calculate the statistical quantities you are interested in at the same time. Each time you update the array, you compute the new quantities from the previous ones. For example, for the average:

    die "counts must be positive" unless $N > 0 && $N_chunk > 0;  # etc.
    $av_new = 1/($N+$N_chunk) * ($av_old * $N + $av_chunk * $N_chunk);
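The update above can be wrapped in a small function. This is only a minimal sketch: the name `update_mean` and the convention of passing each chunk as an array reference are my own assumptions, not something from the original question.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: fold one chunk into a running (count, mean) pair.
sub update_mean {
    my ($n_old, $av_old, $chunk) = @_;          # $chunk is an array reference
    my $n_chunk = scalar @$chunk;
    return ($n_old, $av_old) if $n_chunk == 0;  # nothing to add
    my $av_chunk = sum(@$chunk) / $n_chunk;
    my $av_new   = ($av_old * $n_old + $av_chunk * $n_chunk)
                 / ($n_old + $n_chunk);
    return ($n_old + $n_chunk, $av_new);
}

# Build the statistics chunk by chunk, never looking at old data again.
my ($n, $av) = (0, 0);
($n, $av) = update_mean($n, $av, $_) for [1, 2, 3], [4, 5], [6];
print "n=$n mean=$av\n";    # prints "n=6 mean=3.5"
```

Starting from a count of zero works here because the old average is then multiplied by zero and drops out of the formula.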

A simple "statistical array" class could be set up to package the previous idea. Adding two arrays together would also "add up" their statistical properties. Instead of splicing a subarray into the main one, you could also treat your global array as a list of references; this way you would not spend too much time making copies.
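A rough sketch of what such a class might look like follows. The package name `StatArray` and its methods (`add_chunk`, `merge`, `count`, `mean`) are hypothetical, chosen only for illustration; it tracks just the count and the average, and keeps the data as a list of chunk references so that nothing is copied.

```perl
use strict;
use warnings;

package StatArray;    # hypothetical class name
use List::Util qw(sum);

sub new {
    my ($class) = @_;
    return bless { n => 0, mean => 0, chunks => [] }, $class;
}

# Append a chunk by reference; the elements themselves are not copied.
sub add_chunk {
    my ($self, $chunk) = @_;
    my $k = scalar @$chunk;
    return $self unless $k;
    $self->{mean} = ($self->{mean} * $self->{n} + sum(@$chunk))
                  / ($self->{n} + $k);
    $self->{n} += $k;
    push @{ $self->{chunks} }, $chunk;
    return $self;
}

# "Adding" two arrays combines their statistics and their chunk lists.
sub merge {
    my ($self, $other) = @_;
    return $self unless $other->{n};
    $self->{mean} = ($self->{mean} * $self->{n} + $other->{mean} * $other->{n})
                  / ($self->{n} + $other->{n});
    $self->{n} += $other->{n};
    push @{ $self->{chunks} }, @{ $other->{chunks} };
    return $self;
}

sub count { return $_[0]{n} }
sub mean  { return $_[0]{mean} }

package main;

my $x = StatArray->new->add_chunk([1, 2, 3]);
my $y = StatArray->new->add_chunk([4, 5])->add_chunk([6]);
$x->merge($y);
printf "n=%d mean=%g chunks=%d\n",
    $x->count, $x->mean, scalar @{ $x->{chunks} };
```

Note that after a merge both objects share the same chunk references, so modifying a chunk in place would silently invalidate the cached statistics; whether that trade-off is acceptable depends on how the arrays are used.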

By the way, the formulas for higher moments follow simple recurrences; Google and Wikipedia will turn them up. hth --stephan
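As a footnote to the remark on higher moments, here is a sketch for the second moment. It uses the well-known pairwise update for the sum of squared deviations (usually attributed to Chan, Golub and LeVeque), which is one possible recurrence and not necessarily the one meant above. The function names `summarize` and `combine` are made up for this example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Summarise one chunk as [count, mean, M2], where M2 is the sum of
# squared deviations from the chunk mean. Assumes a non-empty chunk.
sub summarize {
    my ($chunk) = @_;
    my $n    = scalar @$chunk;
    my $mean = sum(@$chunk) / $n;
    my $m2   = sum(map { ($_ - $mean) ** 2 } @$chunk);
    return [$n, $mean, $m2];
}

# Combine two summaries without revisiting the underlying data.
sub combine {
    my ($s, $t) = @_;
    my ($na, $ma, $m2a) = @$s;
    my ($nb, $mb, $m2b) = @$t;
    my $n     = $na + $nb;
    my $delta = $mb - $ma;
    return [
        $n,
        $ma + $delta * $nb / $n,
        $m2a + $m2b + $delta ** 2 * $na * $nb / $n,
    ];
}

my $total = combine(summarize([1, 2, 3]), summarize([4, 5, 6]));
my ($n, $mean, $m2) = @$total;
printf "n=%d mean=%g variance=%g\n", $n, $mean, $m2 / $n;  # population variance
```

Dividing M2 by `$n - 1` instead of `$n` would give the sample variance.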

