The "element" I am identifying in the genome has two ends. These two ends are each around 50-100 long. The separation between these ends is between 200 and 20,000. The length of the ends, and the separation between them is NOT known a priori. So, I wonder if there are two size ranges or periodicities whose non randomness needs to be broken down, with two successive shuffles, one in the ~ 100 length window and again in a ~1000 length window? Or some other window lengths?


Let's start with the minimum length:

50 + 200 + 50. There are 300! = 3.0605751221644063603537046129727e+614 possible outcomes from the shuffle.

For one of your 50 char headers or trailers to have survived the shuffle intact -- ie, to either have remained where it was or to have been 'reconstructed' at some other position the calculation goes something like: at any given position, there are 50! = 3.0414093201713378043612608166065e+64 possible ways that the original 50 chars could have been rearranged; which makes the probability that an exact 50 sequence re-appears somewhere after one shuffle, something like 50! / 300! = 9.9373784297785355338568640895281e-551. Which is as close to 'impossible' as you could wish for.

And that's for the shortest lengths. For your average lengths, you are talking all the computers that have ever existed, do exist and will ever exist, processing flat out until Universal heat death.

Of course; it could also happen tomorrow -- that's the nature of random -- but duplicating the shuffle doesn't make it any more or less likely to happen.

understand any differences between List::Util qw(shuffle) and Array::Shuffle qw(shuffle_array)

They both use the Fischer-Yates method. But ...

List::Util::shuffle takes the array, flattens it to a list on the stack, shuffles that list on the stack and returns it; where it is then assigned back to an array.

Array::Shuffle takes a reference to an array and shuffles that in-place.

The latter is faster because it avoids the array to list & list to array conversions at either end.

Either -- over a suitable sample size -- will produce similar results. The latter gets you there more quickly.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!

In reply to Re: Random shuffling by BrowserUk
in thread Random shuffling by onlyIDleft

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.