Syntactic Confectionery Delight PerlMonks

comment on

 Need Help??
You appear to have not fully understood several parts of what I wrote.

You are right. In as much as, even re-reading your post in its entirety, I fail to understand what this table of odds represents, if not your calculation for the use of 4x32-bit hashes:

```Odds  Inserts
----  -------
1%  ~28e6
10%  ~45e6
50%  ~65e6
90%  ~83e6
99%  ~96e6
Probably much more important is that in some replies you hint that you are using 10 hashes not 4 hashes now.

The (still) running program is using 10 hashes. But I am not yet convinced that the extra 6 hashes actually buys anything. See below.

The program has currently covered just under half of the maximum 2**32 trials and has found only 8359 collisions. The last 10 of which were at positions:

```1933560856
1933561794
1933600874
1933619074
1933633177
1933665039
1933675171
1933675769
1933690902
1933708679

I am reluctant to interrupt it before it reaches at least half way, and -- depending on the rate at which the rate of collisions increases, may let it run on to 3 billion. Hence, I cannot consider adding code to count the current bit set per vector as you described at this stage.

10 hashes

As described in my OP, I have (artificially) expanded the number of subhashes, by permuting the 4 natural 32-bit sub-divisions of the MD5 and XORing them to produce the other 6.

But, as moritz suggests here, it is not at all clear that this process actually decreases the odds of collisions, because it doesn't increase the amount of information (he calls it entropy, but I'm uncomfortable with that term in this context), available.

It is my intention to verify that during the second pass (to isolate how many of the collisions are actual dups and how many are false positives). At the same time, I will also perform the collision check using just the 4 natural hashes and that will tell me whether the permuted XORs buy me anything or not. But that will take another week or so.

If you need some help redoing the calculations

If you could distill your write-up to a formula, that would be helpful.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

In reply to Re^4: [OT] The statistics of hashing. (4<10) by BrowserUk
in thread [OT] The statistics of hashing. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

• Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
• Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
• Read Where should I post X? if you're not absolutely sure you're posting in the right place.
• Posts may use any of the Perl Monks Approved HTML tags:
a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
• You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
 For: Use: & & < < > > [ [ ] ]
• Link using PerlMonks shortcuts! What shortcuts can I use for linking?

Create A New User
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (3)
As of 2019-10-13 20:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
In 2019 the site I miss most is:

Results (36 votes). Check out past polls.

Notices?