Given the nature of the input, how are you seeking to convert that to a specification of an infinite list?

What I mean to say is that there is a fundamental conflict between "uniform distribution" and a variable length list.

This is also what came to my mind when I read the specification.

The problem is somewhat similar to data compression algorithms: some work on complete files and can thus make a full statistical analysis of the data before starting to actually encode, while others have to work on the fly with data coming in over a network, for example.

I guess one way to do that is to use a sliding window mechanism, i.e. you reorganize data within a sliding window of a certain size; but whatever has left the sliding window can no longer be optimized against the new data coming in. Of course, the final result is usually not as good as if the full data had been there from the outset, but you can still come up with a heuristic that gets relatively close to optimal (i.e. relatively similar to what a perfect algorithm would have done with prior knowledge of the full data set).

This can work in most usual cases, but it is probably also possible to manufacture a deviant data set on which the heuristic fails to produce good results (just as, given a compression algorithm, it is almost always possible to produce data for which the compressed result takes more space than the original, unless of course the algorithm has an "oops, back to the original data" clause). The size of the window might also have a considerable effect on how successful the heuristic is. I guess that only actual tests with real data can tell; it does not look as if a formal analysis can answer this question, except possibly if we have in-depth knowledge of the incoming data.
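To make the sliding window idea a bit more concrete, here is a minimal sketch in Python. It is only one possible heuristic, not the algorithm asked for in the thread: it assumes that "well distributed" simply means "avoid emitting the same value twice in a row", and the function name `spread_stream` and the `window` parameter are made up for this illustration.

```python
from collections import Counter
from itertools import islice

def spread_stream(items, window=4):
    """Yield the items of a stream, trying to avoid adjacent repeats.

    Hypothetical heuristic: only `window` pending items are visible at
    any time, and anything already yielded is final.
    """
    source = iter(items)
    pending = list(islice(source, window))  # the sliding window
    last = object()                         # sentinel: nothing emitted yet
    while pending:
        counts = Counter(pending)
        # Prefer a value different from the last one emitted; fall back to
        # the whole window when every pending item equals the last one.
        candidates = [x for x in pending if x != last] or pending
        # Among the candidates, take the value most frequent in the window,
        # since it is the hardest one to place later on.
        choice = max(candidates, key=lambda x: counts[x])
        pending.remove(choice)
        yield choice
        last = choice
        # Slide the window: pull in one new item, if any remain.
        pending.extend(islice(source, 1))

print("".join(spread_stream("aaaabbbb", window=4)))  # abababab
print("".join(spread_stream("aaaabbbb", window=2)))  # aaababbb
```

With a window of 4 the two values alternate perfectly, whereas with a window of 2 the first three `a` are emitted before any `b` has even been seen. The same input is thus both an easy case and a "deviant" one, depending only on the window size.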

In reply to Re^2: Bag uniform distribution algorithms by Laurent_R
in thread Bag uniform distribution algorithms by davido
