UPDATE 2: The identification of the two ends is NOT deterministic. It is based on finding a pattern in which each position can be A, T, G or C, or some combination of 2, 3 or all 4 of those letters. When you search for a perfect match and even 1 character gets randomly shuffled, you can no longer detect it, so that is straightforward, right? But here we are NOT looking for JUST perfect matches: there is degeneracy at each matching position, so when you swap one letter for another it might still be a match, albeit a different one that "scores" better or worse. That makes it a bit more complicated IMO, doesn't it? This is the actual scenario. I had forgotten to mention this earlier, apologies! How does this change things when it comes to the shuffling and periodicities I was referring to in Update 1? THANK YOU! :)

Does it change the probabilities: yes.

Does it mean that shuffling twice is better than shuffling once: no.

Any mix that can result from shuffling twice could also result from shuffling once, and vice versa. So for every case where you would get a false hit after the first shuffle and not after a second, there is another case where you wouldn't get a false hit after the first shuffle but would after a second.

I'll try to word that a different way: for any given set of data, there are a huge number of possible reorderings, a (relatively) small number of which would contain a false hit. Whether you shuffle once or 10 times, the odds of producing one of those arrangements that contains a false hit remain exactly the same.
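The same point can be checked exactly rather than by sampling. The sketch below is an illustration, not part of the original test: it assumes an ideal shuffle that picks each of the 24 orderings of four items with equal probability, and the helper names `permutations` and `compose_counts` are made up for this example. It applies every possible first shuffle followed by every possible second shuffle and counts the final orderings.

```perl
#! perl
use strict;
use warnings;

# Return every ordering of the given items as a list of array refs.
# (Hypothetical helper, written out so the example has no dependencies.)
sub permutations {
    my @items = @_;
    return [] unless @items;
    my @result;
    for my $i ( 0 .. $#items ) {
        my @rest = @items;
        my ($picked) = splice @rest, $i, 1;
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

# Apply every possible first shuffle followed by every possible second
# shuffle, and count how often each final ordering turns up.
sub compose_counts {
    my @data  = @_;
    my @perms = permutations( 0 .. $#data );
    my %count;
    for my $first (@perms) {
        my @once = @data[@$first];
        for my $second (@perms) {
            ++$count{ join '', @once[@$second] };
        }
    }
    return \%count;
}

my $count = compose_counts( 'a' .. 'd' );
printf "%s => %d\n", $_, $count->{$_} for sort keys %$count;
```

Of the 24 * 24 = 576 equally likely pairs of shuffles, every final ordering is reached by exactly 24 of them, so a double shuffle has the same uniform distribution as a single one.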

The more fuzziness you allow, the higher the proportion of reorderings that will contain a false hit. But with 3 or 4 mismatches in a 50 char set, the ratio remains vanishingly small.

But statistically, there is still no benefit at all to multiple shuffles.
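To get a rough feel for how small that ratio is, here is a back-of-envelope sketch. It rests on simplifying assumptions that are mine, not the OP's: each position of the shuffled sequence is treated as an independent draw, every pattern position has the same chance `$p_match` of matching (0.25 for a single base; higher for degenerate positions), and only a single 50-character window is considered. The function names are hypothetical.

```perl
#! perl
use strict;
use warnings;

# Binomial coefficient C(n, k), built up one factor at a time.
sub choose {
    my ( $n, $k ) = @_;
    my $c = 1;
    $c = $c * ( $n - $k + $_ ) / $_ for 1 .. $k;
    return $c;
}

# Probability that a random window of $n characters matches the pattern
# with at most $max_mismatch mismatching positions, when each position
# matches independently with probability $p_match (an assumption).
sub p_window_matches {
    my ( $n, $max_mismatch, $p_match ) = @_;
    my $total = 0;
    for my $m ( 0 .. $max_mismatch ) {
        $total += choose( $n, $m )
                * ( 1 - $p_match )**$m
                * $p_match**( $n - $m );
    }
    return $total;
}

for my $k ( 0 .. 4 ) {
    printf "up to %d mismatches: %.3g\n", $k, p_window_matches( 50, $k, 0.25 );
}
```

Raising `$p_match` to model degenerate positions, or multiplying by the number of windows in a long sequence, increases the figure, but it changes nothing about how many times the data needs shuffling.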

If my explanation isn't clear enough, perhaps running this might convince you:

```perl
#! perl -slw
use strict;
use Data::Dump qw[ pp ];
$Data::Dump::WIDTH = 1000;
use List::Util qw[ shuffle ];

my @data = 'a'..'d';

my %once;
++$once{ join'',shuffle @data } for 1 .. 1e6;
pp \%once;

my %twice;
++$twice{ join'',shuffle shuffle @data } for 1 .. 1e6;
pp \%twice;

my %ten;
++$ten{ join'',shuffle shuffle shuffle shuffle shuffle shuffle shuffle shuffle shuffle shuffle @data } for 1 .. 1e6;
pp \%ten ;

__END__
C:\test>
{ abcd => 41717, abdc => 41646, acbd => 41468, acdb => 41646, adbc => 41673, adcb => 42050, bacd => 41883, badc => 41624, bcad => 41523, bcda => 41667, bdac => 41775, bdca => 41282, cabd => 41674, cadb => 41598, cbad => 41587, cbda => 41892, cdab => 41650, cdba => 41706, dabc => 41452, dacb => 41859, dbac => 41638, dbca => 41895, dcab => 41600, dcba => 41495 }
{ abcd => 41481, abdc => 41541, acbd => 41422, acdb => 41699, adbc => 41601, adcb => 41502, bacd => 41610, badc => 42086, bcad => 41860, bcda => 41864, bdac => 41537, bdca => 41770, cabd => 41669, cadb => 42034, cbad => 41649, cbda => 41568, cdab => 41802, cdba => 41802, dabc => 41745, dacb => 41742, dbac => 41405, dbca => 41441, dcab => 41628, dcba => 41542 }
{ abcd => 41723, abdc => 41512, acbd => 41613, acdb => 41633, adbc => 41587, adcb => 41547, bacd => 42015, badc => 41615, bcad => 41706, bcda => 41752, bdac => 41903, bdca => 41539, cabd => 41306, cadb => 42037, cbad => 41673, cbda => 41579, cdab => 41767, cdba => 41582, dabc => 42219, dacb => 41463, dbac => 41228, dbca => 41659, dcab => 41892, dcba => 41450 }
```

No matter how many times you shuffle the data, statistically, the results remain the same. Do stddev, chi-squared or any other analysis you like, and the results will be the same. Use more data, more repetitions, more shuffles; they will remain the same.
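For anyone who wants to run the chi-squared check, a minimal sketch follows. The function `chi_squared_uniform` is a hypothetical helper written for this example; it computes the Pearson statistic against a uniform expectation over the 24 possible orderings. The run count of 1e5 is an arbitrary choice to keep it quick.

```perl
#! perl
use strict;
use warnings;
use List::Util qw[ shuffle sum ];

# Pearson chi-squared statistic for observed counts against a uniform
# expectation over $categories equally likely outcomes.
sub chi_squared_uniform {
    my ( $counts, $categories ) = @_;
    my $total    = sum( values %$counts );
    my $expected = $total / $categories;
    my $seen     = keys %$counts;

    # Outcomes that never occurred each contribute (0 - E)^2 / E = E.
    my $chi2 = ( $categories - $seen ) * $expected;
    $chi2 += ( $_ - $expected )**2 / $expected for values %$counts;
    return $chi2;
}

my @data = 'a' .. 'd';
for my $passes ( 1, 2, 10 ) {
    my %count;
    for ( 1 .. 1e5 ) {
        my @mix = @data;
        @mix = shuffle @mix for 1 .. $passes;
        ++$count{ join '', @mix };
    }
    printf "%2d shuffle(s): chi-squared = %.2f\n",
        $passes, chi_squared_uniform( \%count, 24 );
}
```

With 24 orderings there are 23 degrees of freedom, for which the usual 5% critical value is about 35.17. Any single run will exceed that roughly one time in twenty by chance alone, but there should be no systematic difference between the three rows.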

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!

In reply to Re^3: Random shuffling by BrowserUk
in thread Random shuffling by onlyIDleft
