Memory efficient statistical distribution class

by Dallaylaen (Friar)
on Jun 07, 2013 at 11:47 UTC (#1037660, perlquestion)
Dallaylaen has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to analyse some data (say, web-service response times) and get various statistical info, mainly percentiles/quantiles and the presence of outliers.

I know about Statistics::Descriptive; however, I don't want to store all the data in memory. On the other hand, having my results off by a few percent would be fine, since I only care about huge differences.

So I came up with the following idea: create an array of logarithmic buckets and count the data points landing in each bucket. With the data spread across 6 orders of magnitude and a guaranteed precision of 1%, that still leaves me with 6 * log 10 / log 1.01 ≈ 1400 buckets, which is perfectly fine (about 36 KB of memory, given the current Perl scalar size).
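To make the arithmetic concrete, here is a minimal sketch of the bucket mapping. The names `$BASE`, `bucket_count`, `bucket_index` and `bucket_lower_bound` are made up for illustration and do not come from any existing module; the sketch also assumes all data points are strictly positive.

```perl
use strict;
use warnings;
use POSIX qw(floor);

my $BASE = 1.01;    # assumed bucket growth factor, i.e. 1% relative precision

# Buckets needed to cover the given number of decimal orders of magnitude.
sub bucket_count {
    my ($orders) = @_;
    return $orders * log(10) / log($BASE);
}

# Index of the logarithmic bucket a positive value falls into.
# Values below 1 get negative indices, so the counters would live
# in a hash (or in an array with an offset).
sub bucket_index {
    my ($x) = @_;
    die "value must be positive\n" unless $x > 0;
    return floor( log($x) / log($BASE) );
}

# Lower edge of a bucket; the upper edge is $BASE times larger.
sub bucket_lower_bound {
    my ($index) = @_;
    return $BASE**$index;
}

# Roughly 1388, which is the "about 1400" figure quoted above.
printf "buckets for 6 orders of magnitude: %.0f\n", bucket_count(6);

my $i = bucket_index(250);
printf "250 lands in bucket %d, range [%.2f, %.2f)\n",
    $i, bucket_lower_bound($i), bucket_lower_bound( $i + 1 );
```

Each bucket is 1% wider than the previous one, so reporting any point inside a bucket in place of the true value is off by at most about 1%.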

Computing percentiles is simple - just add up bucket counters until $sum exceeds $percentage * $total_count.
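The percentile lookup could be sketched roughly as follows. This is only an illustration under my own assumptions, not the code of any existing module: `add_point`, `percentile`, `%count` and `$total` are hypothetical names, data points must be positive, the loop stops as soon as the running sum reaches the target (rather than strictly exceeding it), and the geometric midpoint of the bucket is reported as the answer.

```perl
use strict;
use warnings;
use POSIX qw(floor);

my $BASE  = 1.01;    # assumed bucket growth factor (about 1% precision)
my %count;           # bucket index => number of data points seen
my $total = 0;       # total number of data points seen

# Record one positive data point; only its bucket counter is kept.
sub add_point {
    my ($x) = @_;
    $count{ floor( log($x) / log($BASE) ) }++;
    $total++;
    return;
}

# Approximate percentile: walk the buckets in ascending order, summing
# counters until the running sum reaches the requested share of all points.
sub percentile {
    my ($percent) = @_;
    return unless $total;    # no data yet

    my $target = $percent / 100 * $total;
    my $sum    = 0;
    my $last;
    for my $index ( sort { $a <=> $b } keys %count ) {
        $sum += $count{$index};
        $last = $index;
        last if $sum >= $target;
    }

    # Report the geometric midpoint of the bucket where the walk stopped.
    return $BASE**( $last + 0.5 );
}

add_point($_) for 1 .. 1000;
printf "median ~ %.1f, 99th percentile ~ %.1f\n",
    percentile(50), percentile(99);
```

Sorting the keys on every call is fine for a few thousand buckets; an array with a fixed index offset would avoid the sort if lookups become frequent.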

However, before I start writing actual code, I would like to ask which memory-efficient statistical modules and algorithms already exist (for Perl, or maybe other languages).

Looks like there's a similar Stack Overflow question, and a similar method is proposed in one of the answers. I haven't found a ready-made Perl implementation, though.

Re: Memory efficient statistical distribution class
by sundialsvc4 (Abbot) on Jun 07, 2013 at 16:06 UTC

    Your first CPAN contribution, maybe?   :-)

      I was planning on that, in case there's no similar solution already in the wild. In the end I couldn't resist and started a GitHub project.

      For now, it does whatever Statistics::Descriptive::Sparse can, plus percentiles. However, there's much room for improvement.

Node Type: perlquestion [id://1037660]
Front-paged by Corion