PerlMonks  

Memory efficient statistical distribution class

by Dallaylaen (Scribe)
on Jun 07, 2013 at 11:47 UTC (#1037660=perlquestion)
Dallaylaen has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to analyse some data (say, web-service response times) and get various statistics, mainly percentiles/quantiles and the presence of outliers.

I know about Statistics::Descriptive; however, I don't want to store all the data in memory. On the other hand, results off by a few percent would be fine. I only care about large differences.

So I came up with the following idea: create an array of logarithmic buckets and count the data points landing in each bucket. With the data spread across 6 orders of magnitude and a guaranteed relative precision of 1%, that still leaves only 6 * log(10) / log(1.01) ≈ 1400 buckets, which is perfectly fine (about 36 KB of memory, given the size of a Perl scalar).
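A minimal sketch of the bucketing idea (in Python for brevity; the logic carries over directly to Perl). Bucket i covers [base * ratio**i, base * ratio**(i+1)), so each bucket is 1% wide relative to its lower edge when ratio = 1.01. The class name LogHistogram and the base/ratio parameters are hypothetical, and a dict is used instead of a preallocated array so that values outside the expected range simply land in extra (possibly negative) bucket indices.

```python
import math

# Buckets needed for 6 orders of magnitude at 1% relative width (~1388.4, so 1389).
BUCKETS_NEEDED = math.ceil(6 * math.log(10) / math.log(1.01))


class LogHistogram:
    """Counts positive values in logarithmic buckets (hypothetical helper)."""

    def __init__(self, base=1.0, ratio=1.01):
        self.base = base              # lower edge of bucket 0
        self.ratio = ratio            # 1.01 -> each bucket is 1% wide
        self.log_ratio = math.log(ratio)
        self.counts = {}              # sparse: bucket index -> count
        self.total = 0

    def bucket(self, x):
        # Index of the bucket containing x; x must be > 0.
        return math.floor(math.log(x / self.base) / self.log_ratio)

    def add(self, x):
        i = self.bucket(x)
        self.counts[i] = self.counts.get(i, 0) + 1
        self.total += 1

    def lower_bound(self, i):
        # Smallest value that falls into bucket i.
        return self.base * self.ratio ** i


h = LogHistogram()
for value in (1.0, 1.005, 2.0):
    h.add(value)
```

Memory grows with the number of distinct occupied buckets, not with the number of data points.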

Computing percentiles is simple: just add up the bucket counters until $sum exceeds $percentage * $total_count.
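The percentile step can be sketched as follows, again in Python and assuming the same bucket layout (bucket i covers [base * ratio**i, base * ratio**(i+1))). The function name and its parameters are hypothetical. It uses "reaches or exceeds" (>=) rather than a strict "exceeds", so that p = 1.0 returns the last occupied bucket instead of running off the end, and it reports the geometric midpoint of the chosen bucket, which keeps the error within about half a bucket (~0.5%).

```python
def percentile(counts, total, p, base=1.0, ratio=1.01):
    """Approximate p-th quantile (0 < p <= 1) from log-bucket counts.

    counts maps bucket index -> number of values in that bucket.
    """
    target = p * total
    running = 0
    for i in sorted(counts):
        running += counts[i]
        if running >= target:
            # Geometric midpoint of bucket i.
            return base * ratio ** (i + 0.5)
    raise ValueError("empty histogram")


counts = {0: 50, 69: 50}   # 50 values near 1.0, 50 values near 2.0
median = percentile(counts, 100, 0.5)
p90 = percentile(counts, 100, 0.9)
```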

However, before I start writing actual code, I would like to ask which memory-efficient statistical modules and algorithms already exist (for Perl, or maybe other languages).

There seems to be a similar Stack Overflow question, and a similar method is proposed in one of the answers. I haven't found a ready-made Perl implementation, though.

Re: Memory efficient statistical distribution class
by sundialsvc4 (Monsignor) on Jun 07, 2013 at 16:06 UTC

    Your first CPAN contribution, maybe?   :-)

      I was planning on that, in case there's no similar solution already in the wild. In the end I couldn't resist and started a GitHub project.

      For now, it does whatever Statistics::Descriptive::Sparse can, plus percentiles. However, there's still plenty of room for improvement.
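      The basic running statistics such a class reports (count, mean, variance, standard deviation, min, max) need no stored data at all. Below is a minimal sketch of one way to keep them in constant memory, using Welford's online algorithm, in Python for brevity. The RunningStats name is hypothetical, and this is not necessarily how Statistics::Descriptive::Sparse computes them internally. The variance here is the sample variance (n - 1 denominator), which is an assumption about the convention wanted.

```python
import math


class RunningStats:
    """Constant-memory count/mean/variance/min/max via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # sum of squared deviations from the running mean
        self.min = math.inf
        self.max = -math.inf

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)   # uses the updated mean
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    def variance(self):
        # Sample variance; returns None when fewer than two values were seen.
        return self.m2 / (self.n - 1) if self.n > 1 else None

    def stddev(self):
        var = self.variance()
        return None if var is None else math.sqrt(var)


s = RunningStats()
for value in (2, 4, 4, 4, 5, 5, 7, 9):
    s.add(value)
```

      Welford's update avoids the precision loss of the naive "sum of squares minus square of sum" formula, which matters for long streams of similar values such as response times.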
