
UPDATE 2: The identification of the two ends is NOT deterministic. It is based on finding a pattern in which each character position could be A, T, G or C, or some combination of 2, 3 or all 4 of those letters. When you search for a perfect match and even one character gets randomly shuffled, you can no longer detect it, so that case is straightforward, right? But here we are NOT looking for JUST perfect matches: there is degeneracy in each matching position, so when you shuffle out one letter and replace it with another, it might still be a match, albeit a different one that "scores" better or worse. That makes it a bit more complicated, IMO, doesn't it? This is the actual scenario. I had forgotten to mention this earlier, apologies! How does this change things when it comes to shuffling and the periodicities I was referring to in Update 1? THANK YOU! :)

Does it change the probabilities: yes.

Does it mean that shuffling twice is better than shuffling once: no.

Any mix that can result from shuffling twice could also result from shuffling once, and vice versa. So for every case where you would get a false hit after the first shuffle but not after a second, there is another case where you would not get a false hit after the first shuffle but would after a second.
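For four items ('a' .. 'd') this can be checked exhaustively rather than by sampling: treat each of the two shuffles as picking one of the 24 orderings with equal probability, compose every one of the 24 * 24 pairs, and tally the final arrangements. This is a minimal sketch; the `permutations` helper is written here purely for illustration and is not from any module.

```perl
use strict;
use warnings;

# Every ordering of the given items, as a list of array refs (simple recursion).
sub permutations {
    my @items = @_;
    return ( [] ) unless @items;
    my @perms;
    for my $i ( 0 .. $#items ) {
        my @rest = @items;
        my ( $picked ) = splice @rest, $i, 1;
        push @perms, [ $picked, @$_ ] for permutations( @rest );
    }
    return @perms;
}

my @data   = 'a' .. 'd';
my @orders = permutations( 0 .. $#data );    # all 24 orderings, as index lists

# First shuffle: each of the 24 orderings equally likely.
# Second shuffle: again each of the 24 orderings equally likely.
# Enumerate all 24 * 24 equally likely combinations and tally the final arrangement.
my %twice;
for my $first ( @orders ) {
    my @once = @data[ @$first ];
    for my $second ( @orders ) {
        ++$twice{ join '', @once[ @$second ] };
    }
}

# Each arrangement appears 24 times out of 576: the same 1-in-24 chance as one shuffle.
printf "%s => %d\n", $_, $twice{ $_ } for sort keys %twice;
```

Because the count is exact, there is no sampling noise to argue about: whatever the first shuffle produced, the second one maps it onto every arrangement exactly once, so the combined result is just as uniform as a single shuffle.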

I'll try to word that a different way: for any given set of data, there are a huge number of possible reorderings, a (relatively) small number of which would contain a false hit. Whether you shuffle once or 10 times, the odds of producing one of those arrangements that contains a false hit remain exactly the same.
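To put a number on "a huge number of possible reorderings": when letters repeat, the count of distinct orderings is the multinomial coefficient n! / (n_A! * n_C! * n_G! * n_T!). A small sketch using the core Math::BigInt module; `distinct_orderings` is a hypothetical helper and the 50-base sequence is made up for illustration.

```perl
use strict;
use warnings;
use Math::BigInt;

# Number of distinct orderings of a string whose characters may repeat:
#   n! / ( n_1! * n_2! * ... )   (the multinomial coefficient)
sub distinct_orderings {
    my ( $seq ) = @_;
    my %count;
    ++$count{ $_ } for split //, $seq;
    my $total = Math::BigInt->new( length $seq )->bfac;
    # Divide out the orderings of identical letters; every step divides exactly.
    $total->bdiv( Math::BigInt->new( $_ )->bfac ) for values %count;
    return $total;
}

# Hypothetical 50-base sequence: 13 A, 13 C, 12 G, 12 T.
my $seq = 'ACGT' x 12 . 'AC';
print "distinct orderings of a ", length( $seq ), "-base sequence: ",
      distinct_orderings( $seq ), "\n";
```

Math::BigInt is used because the result is far beyond what a native integer or double can hold exactly.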

The more fuzziness you allow, the higher the proportion of reorderings that will contain a false hit. But with 3 or 4 mismatches in a 50-char set, the ratio remains vanishingly small.
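A rough sketch of the fuzzy case from UPDATE 2, assuming the degenerate positions are written with the standard IUPAC nucleotide codes (R = A or G, N = any base, and so on) and that a "hit" means some window of the sequence with at most a given number of mismatches. The motif, the sequence, and the helper names (`mismatches`, `has_hit`, `false_hit_rate`) are all hypothetical; the OP's real scoring scheme is not known. It estimates the false-hit rate after 1, 2 and 10 shuffles, so the fuzziness and the number of shuffles can be varied independently.

```perl
use strict;
use warnings;
use List::Util qw[ shuffle ];

# IUPAC nucleotide codes (assumed notation): the bases each pattern position accepts.
my %iupac = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',  B => 'CGT', D => 'AGT',
    H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Positions where $seq (same length as $pat, uppercase) falls outside the allowed bases.
sub mismatches {
    my ( $pat, $seq ) = @_;
    my $bad = 0;
    for my $i ( 0 .. length( $pat ) - 1 ) {
        my $allowed = $iupac{ substr( $pat, $i, 1 ) };
        ++$bad if index( $allowed, substr( $seq, $i, 1 ) ) < 0;
    }
    return $bad;
}

# True if any window of $seq matches $pat with at most $max_mm mismatches.
sub has_hit {
    my ( $pat, $seq, $max_mm ) = @_;
    my $w = length $pat;
    for my $start ( 0 .. length( $seq ) - $w ) {
        return 1 if mismatches( $pat, substr( $seq, $start, $w ) ) <= $max_mm;
    }
    return 0;
}

# Fraction of shuffled copies of $seq (each shuffled $shuffles times) containing a hit.
sub false_hit_rate {
    my ( $pat, $seq, $max_mm, $trials, $shuffles ) = @_;
    my @chars = split //, $seq;
    my $hits  = 0;
    for ( 1 .. $trials ) {
        my @mix = @chars;
        @mix = shuffle @mix for 1 .. $shuffles;
        $hits += has_hit( $pat, join( '', @mix ), $max_mm );
    }
    return $hits / $trials;
}

# Hypothetical data: a made-up degenerate motif and a made-up 40-base sequence.
srand( 42 );    # fixed seed so runs can be repeated
my $motif = 'TGRYNNCA';
my $seq   = 'ACGTTAGCCATGACGTTGCAAGTCCGATACGGTACCTAGG';
for my $shuffles ( 1, 2, 10 ) {
    printf "%2d shuffle(s): false-hit rate %.4f\n", $shuffles,
        false_hit_rate( $motif, $seq, 1, 2_000, $shuffles );
}
```

Raising the allowed mismatches (the third argument) should push the rate up, while the three shuffle counts should differ only by sampling noise.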

But statistically, there is still no benefit at all to multiple shuffles.

If my explanation isn't clear enough, perhaps running this might convince you:

```perl
#! perl -slw
use strict;
use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 1000;
use List::Util qw[ shuffle ];

my @data = 'a'..'d';

my %once;
++$once{ join'',shuffle @data } for 1 .. 1e6;
pp \%once;

my %twice;
++$twice{ join'',shuffle shuffle @data } for 1 .. 1e6;
pp \%twice;

my %ten;
++$ten{ join'',shuffle shuffle shuffle shuffle shuffle
               shuffle shuffle shuffle shuffle shuffle @data } for 1 .. 1e6;
pp \%ten;

__END__
C:\test>shuffleStats.pl
{ abcd => 41717, abdc => 41646, acbd => 41468, acdb => 41646, adbc => 41673, adcb => 42050, bacd => 41883, badc => 41624, bcad => 41523, bcda => 41667, bdac => 41775, bdca => 41282, cabd => 41674, cadb => 41598, cbad => 41587, cbda => 41892, cdab => 41650, cdba => 41706, dabc => 41452, dacb => 41859, dbac => 41638, dbca => 41895, dcab => 41600, dcba => 41495 }
{ abcd => 41481, abdc => 41541, acbd => 41422, acdb => 41699, adbc => 41601, adcb => 41502, bacd => 41610, badc => 42086, bcad => 41860, bcda => 41864, bdac => 41537, bdca => 41770, cabd => 41669, cadb => 42034, cbad => 41649, cbda => 41568, cdab => 41802, cdba => 41802, dabc => 41745, dacb => 41742, dbac => 41405, dbca => 41441, dcab => 41628, dcba => 41542 }
{ abcd => 41723, abdc => 41512, acbd => 41613, acdb => 41633, adbc => 41587, adcb => 41547, bacd => 42015, badc => 41615, bcad => 41706, bcda => 41752, bdac => 41903, bdca => 41539, cabd => 41306, cadb => 42037, cbad => 41673, cbda => 41579, cdab => 41767, cdba => 41582, dabc => 42219, dacb => 41463, dbac => 41228, dbca => 41659, dcab => 41892, dcba => 41450 }
```

No matter how many times you shuffle the data, the results remain statistically identical. Do stddev, chi-squared or any other analysis you like, and the results will be the same. Use more data, more repetitions, more shuffles; they will remain the same.
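For example, a Pearson chi-squared goodness-of-fit test against the uniform expectation (each of the 24 orderings expected 1e6 / 24 times) only takes a few lines. This is a sketch: `chi_squared_uniform` is a hypothetical helper, and 35.17 is the usual 5% critical value for 23 degrees of freedom. The counts are the single-shuffle results from the run shown above; the two- and ten-shuffle counts can be substituted the same way.

```perl
use strict;
use warnings;
use List::Util qw[ sum ];

# Pearson chi-squared statistic for observed counts against a uniform expectation.
#   $counts     : hashref, outcome => number of times it was seen
#   $categories : how many outcomes were possible (24 for four distinct items)
sub chi_squared_uniform {
    my ( $counts, $categories ) = @_;
    my $total    = sum( values %$counts );
    my $expected = $total / $categories;
    my $chi2     = 0;
    $chi2 += ( $_ - $expected ) ** 2 / $expected for values %$counts;
    # Outcomes that never turned up each contribute ( 0 - E )**2 / E = E.
    $chi2 += ( $categories - scalar( keys %$counts ) ) * $expected;
    return $chi2;
}

# Counts from the single-shuffle run shown above.
my %once = (
    abcd => 41717, abdc => 41646, acbd => 41468, acdb => 41646, adbc => 41673, adcb => 42050,
    bacd => 41883, badc => 41624, bcad => 41523, bcda => 41667, bdac => 41775, bdca => 41282,
    cabd => 41674, cadb => 41598, cbad => 41587, cbda => 41892, cdab => 41650, cdba => 41706,
    dabc => 41452, dacb => 41859, dbac => 41638, dbca => 41895, dcab => 41600, dcba => 41495,
);

my $chi2 = chi_squared_uniform( \%once, 24 );
printf "chi-squared = %.2f on 23 degrees of freedom\n", $chi2;
print $chi2 < 35.17 ? "consistent with uniform at the 5% level\n"
                    : "not consistent with uniform at the 5% level\n";
```

Running the same function over the `%twice` and `%ten` tallies gives a like-for-like comparison; there is no reason to expect the extra shuffles to move the statistic in either direction.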
