http://www.perlmonks.org?node_id=146851


in reply to Re: Faster Statistics for Discrete Data
in thread Faster Statistics for Discrete Data

Thanks for the suggestions. Unfortunately, even something like Tie::IxHash would defeat the purpose of this module. If you have to preserve order, you might as well use an array, since you would need to record the position of every data point anyway. I suppose you could use something like run-length encoding if the data had long runs of the same value, but the hash overhead would probably eat up the savings for all but a narrow class of data sets.

Fortunately, there are very few statistical measures (at least that I'm aware of) that depend on the order of the data; the least_squares_fit method of Statistics::Descriptive is the only one I can think of off the top of my head. Some measures do require the data to be in sorted order, and my method handles those quite well, because I only have to sort the distinct hash keys, not every individual value.
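As an illustration of why sorting only the keys is enough, here is a minimal sketch (in Python rather than Perl, for brevity) of computing a median from a value-to-count hash. The function name median_from_counts is hypothetical and is not part of Statistics::Descriptive or the module under discussion; it assumes the data are stored as {value: frequency} with numeric values. Only the k distinct values are sorted, so the sort costs O(k log k) instead of O(n log n) for n data points.

```python
from collections import Counter


def median_from_counts(counts):
    """Median of data stored as a {value: frequency} mapping (hypothetical helper).

    Only the distinct keys are sorted; frequencies are accumulated to
    locate the middle position(s) without expanding the data.
    """
    n = sum(counts.values())
    if n == 0:
        raise ValueError("median of empty data")
    # 0-based positions of the middle element(s) in the fully sorted data;
    # they coincide when n is odd.
    lo_idx = (n - 1) // 2
    hi_idx = n // 2
    lo_val = hi_val = None
    seen = 0  # number of data points with value <= current key
    for value in sorted(counts):
        seen += counts[value]
        if lo_val is None and seen > lo_idx:
            lo_val = value
        if seen > hi_idx:
            hi_val = value
            break
    return (lo_val + hi_val) / 2


# Usage: build the frequency table once, then query it.
data = [1, 2, 2, 3, 3, 3, 7]
print(median_from_counts(Counter(data)))  # prints 3.0
```

The same running-count walk over sorted keys extends to other order statistics such as percentiles or the minimum and maximum, and none of them need the original input order.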