http://www.perlmonks.org?node_id=1030739


in reply to Re: Bag uniform distribution algorithms
in thread Bag uniform distribution algorithms

Given the nature of the input, how are you seeking to convert that to a specification of an infinite list?

What I mean to say is that there is a fundamental conflict between "uniform distribution" and a variable length list.

This is also what came to my mind when I read the specification.

The problem is somewhat similar to data compression algorithms: some work on complete files and can therefore run a full statistical analysis of the data before they start encoding, while others have to work on the fly, for example on data arriving over a network.

I guess one way to do that is to use a sliding window mechanism, i.e. you reorganize data within a sliding window of a certain size; whatever has left the window can no longer be rearranged to account for the new data coming in. Of course, the final result is usually not as good as if the full data had been there from the outset, but you can still devise a heuristic that gets relatively close to optimal (i.e. relatively similar to what a perfect algorithm would have done with prior knowledge of the full data set).

This should work in most usual cases, but it is probably also possible to manufacture a deviant data set on which the heuristic fails to produce good results (just as, given a compression algorithm, it is almost always possible to produce data for which the compressed result takes more space than the original, unless of course the algorithm has an "oops, back to the original data" clause). The size of the window might also have a considerable effect on how well the heuristic works. I guess that only actual tests with real data can tell; it does not look as if a formal analysis can answer this question, unless possibly we have in-depth knowledge of the data coming in.
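To make the sliding window idea a bit more concrete, here is a rough sketch in Python (not Perl, and not anything from the original specification). It assumes that "uniform distribution" means keeping equal items as far apart as possible, and it uses one simple heuristic of my own choosing: among the items currently in the window, emit the one whose value was emitted longest ago. The names `spread_stream` and `_pop_best` and the `window` parameter are made up for illustration.

```python
def _pop_best(buffer, last_seen, pos):
    """Remove and return the buffered item whose value was emitted longest ago.

    Values never emitted so far count as position -1, so they win.
    Ties are broken by arrival order (lowest index in the buffer).
    """
    idx = min(range(len(buffer)),
              key=lambda i: (last_seen.get(buffer[i], -1), i))
    item = buffer.pop(idx)
    last_seen[item] = pos  # remember where this value was last emitted
    return item


def spread_stream(items, window=4):
    """Yield the items of a stream, reordering only within `window` items.

    Once an item has been yielded it is final: later input cannot move it.
    """
    buffer = []      # the sliding window of not-yet-emitted items
    last_seen = {}   # value -> output position of its last emission
    pos = 0
    for item in items:
        buffer.append(item)
        if len(buffer) >= window:
            yield _pop_best(buffer, last_seen, pos)
            pos += 1
    # Input exhausted: drain what is left in the window.
    while buffer:
        yield _pop_best(buffer, last_seen, pos)
        pos += 1


if __name__ == "__main__":
    for w in (1, 2, 3):
        print(w, "".join(spread_stream("AAABBB", window=w)))
```

On the input `AAABBB` this gives `AAABBB` with a window of 1, `AABABB` with a window of 2 and `ABABAB` with a window of 3, which shows the effect of the window size. It also hints at the deviant case: if the run of identical items is longer than the window, the first ones have already left the window before anything different arrives, and nothing can be done about them.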
