PerlMonks  

Re^7: Hash order randomization is coming, are you ready?

by demerphq (Chancellor)
on Dec 03, 2012 at 10:59 UTC (#1006847)


in reply to Re^6: Hash order randomization is coming, are you ready?
in thread Hash order randomization is coming, are you ready?

Which makes me wonder whether your meditation isn't a little a) redundant; b) slightly scare mongery?

I think you missed the point. The order will change *in every process*.

$ for i in {1..10}; do ./perl -le'%h=(1..20); print "$]: ",join "-", keys %h'; done;
5.017007: 1-13-5-15-19-9-17-11-7-3
5.017007: 13-19-5-17-9-15-1-7-3-11
5.017007: 13-7-19-15-5-1-11-17-3-9
5.017007: 17-13-3-7-15-1-9-5-11-19
5.017007: 17-9-3-11-7-15-1-19-5-13
5.017007: 19-1-11-5-9-3-15-17-7-13
5.017007: 9-19-3-17-7-11-13-15-1-5
5.017007: 1-11-15-3-19-17-7-13-9-5
5.017007: 19-7-13-1-5-17-9-3-11-15
5.017007: 5-19-9-1-13-17-7-3-15-11
$ for i in {1..10}; do perl -le'%h=(1..20); print "$]: ",join "-", keys %h'; done;
5.012004: 11-3-7-9-17-15-1-19-13-5
5.012004: 11-3-7-9-17-15-1-19-13-5
5.012004: 11-3-7-9-17-15-1-19-13-5
5.012004: 11-3-7-9-17-15-1-19-13-5
5.012004: 11-3-7-9-17-15-1-19-13-5
5.012004: 11-3-7-9-17-15-1-19-13-5
5.012004: 11-3-7-9-17-15-1-19-13-5
5.012004: 11-3-7-9-17-15-1-19-13-5
5.012004: 11-3-7-9-17-15-1-19-13-5
5.012004: 11-3-7-9-17-15-1-19-13-5

The order returned by 5.12.4 is what you should see on pretty much every modernish perl that has been released, with the exception of 5.8.1 and of 5.17.6 and later. And obviously in 5.17.6 the order changes pretty much every time.
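The shell loop above can also be reproduced from inside a script. This is only a sketch: `key_order_in_child` and `$child_code` are names made up for the example, and `$^X` is assumed to point at the perl you want to test. It starts ten fresh processes and counts how many different key orders come back.

```perl
use strict;
use warnings;

# Code run in each child process: the same hash as in the one-liner above.
my $child_code = '%h = (1..20); print join("-", keys %h), "\n"';

# Start a fresh perl ($^X is the currently running interpreter) and
# return the key order it reports. List-form open avoids shell quoting.
sub key_order_in_child {
    open(my $fh, '-|', $^X, '-e', $child_code)
        or die "cannot run $^X: $!";
    my $line = <$fh>;
    close $fh;
    die "child printed nothing" unless defined $line;
    chomp $line;
    return $line;
}

# Count how often each distinct order shows up across ten processes.
my %seen;
$seen{ key_order_in_child() }++ for 1 .. 10;

printf "%d distinct key order(s) in 10 processes\n", scalar keys %seen;
```

On a perl with per-process randomization you would expect several distinct orders; on a perl like the 5.12.4 shown above you would expect exactly one. Settings such as the `PERL_HASH_SEED` environment variable can change the result, so treat the count as an observation about your setup, not a guarantee.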

What we discover when we per-process randomize the keys is that people actually depend on the key order more than they realize. When we make it random these dependencies become visible as bugs. I tend to consider such code buggy to begin with, as minor changes to the history of the hash will produce roughly the same results as per-process randomization.
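A hypothetical illustration of the kind of hidden dependency meant here (the `%port_for` data is made up for the example): anything that prints, joins or compares the raw output of `keys` only works while the order happens to stay put. Sorting states the order you actually need.

```perl
use strict;
use warnings;

# Made-up example data.
my %port_for = (http => 80, https => 443, ssh => 22);

# Fragile: the result depends on whatever order keys() happens to return.
my $fragile = join ",", keys %port_for;

# Robust: impose the order explicitly, by name or by value.
my $by_name = join ",", sort keys %port_for;
my $by_port = join ",", sort { $port_for{$a} <=> $port_for{$b} } keys %port_for;

print "fragile: $fragile\n";   # order not guaranteed
print "by name: $by_name\n";   # http,https,ssh
print "by port: $by_port\n";   # ssh,http,https
```

A test that compares `$fragile` against a hard-coded string is the typical case that passes for years and then fails once the order is randomized.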

BTW, you *did* see that I said "none of this is new" right? So why the emphasis on "But that has always been the case"?

---
$world=~s/war/peace/g


Re^8: Hash order randomization is coming, are you ready?
by jdporter (Canon) on Dec 03, 2012 at 13:27 UTC
    none of this is new

    "___ is coming, are you ready" implies quite the opposite, so I think we can be forgiven for being ... confused.

    I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.

      Well I did say "per-process". :-) Anyway, hopefully it is clear now.

      ---
      $world=~s/war/peace/g

Re^8: Hash order randomization is coming, are you ready?
by BrowserUk (Pope) on Dec 03, 2012 at 20:08 UTC
    BTW, you *did* see that I said "none of this is new" right? So why the emphasis on "But that has always been the case"?

    Because, until the simple example in your latest post, all the previous examples demonstrate things that have always been true. Thus, they do not demonstrate what changed. Which, when combined with the phrasing of the OP ...

    But never mind. I'm not trying to get on your case here; just work out what has actually changed, and a) how it might affect my existing code; and more importantly b) how it might affect my thought processes with regard to how I think of and use hashes.

    My conclusion so far -- for me personally; not the world in general you are addressing -- is that I have assumed the "new" constraints as a matter of course ever since the randomisation fix for the Algorithmic Complexity Attack that was (briefly???) implemented in 5.8.1.

    However, what would be most useful to me -- and others I'm sure -- is a description of what has actually changed internally; and why it has been changed. Are you up for providing that description?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      that I have assumed the "new" constraints as a matter of course ever since the randomisation fix for the Algorithmic Complexity Attack that was (briefly???) implemented in 5.8.1.

      Alas not everyone has been as diligent as you. :-) It is surprising how many real bugs this found.

      what has actually changed internally

      Ok, first some history. In 5.8.1 a patch very similar to the one I have been working on was implemented. It broke lots of stuff, which was considered unacceptable for a minor release. So a new implementation was done. This implementation actually supported two types of hash, and two seeds: one constant, determined at build time, and one random per process. By default hashes would use the constant seed, but when Perl noticed too many collisions in a bucket it would trigger a "rehash" using the random per-process seed, which would cause the hash values of all of its keys to be recalculated and would, as a byproduct, cause the hash's keys to be removed from the shared string table.

      All of this consumed processing time, and added code complexity.
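To make the "constant seed, rehash on too many collisions" scheme concrete, here is a toy model in plain Perl. It is a sketch, not the real internals: `toy_hash`, `MAX_CHAIN`, the bucket count and the table layout are all invented for the example, the mixing function is a one-at-a-time style stand-in that assumes 64-bit integers, and the shared string table is not modeled at all.

```perl
use strict;
use warnings;
use List::Util qw(max);

use constant {
    BUCKETS       => 8,
    CONSTANT_SEED => 0,   # stands in for the seed fixed at build time
    MAX_CHAIN     => 3,   # hypothetical "too many collisions" threshold
};

# One-at-a-time style mixing, masked to 32 bits. A stand-in only.
sub toy_hash {
    my ($seed, $key) = @_;
    my $h = $seed;
    for my $byte (unpack 'C*', $key) {
        $h = ($h + $byte)      & 0xFFFFFFFF;
        $h = ($h + ($h << 10)) & 0xFFFFFFFF;
        $h ^= $h >> 6;
    }
    $h = ($h + ($h << 3)) & 0xFFFFFFFF;
    $h ^= $h >> 11;
    $h = ($h + ($h << 15)) & 0xFFFFFFFF;
    return $h;
}

sub bucket_of {
    my ($seed, $key) = @_;
    return toy_hash($seed, $key) % BUCKETS;
}

# Every table starts out on the constant seed.
sub new_table {
    return { seed => CONSTANT_SEED, rehashed => 0,
             buckets => [ map { [] } 1 .. BUCKETS ] };
}

# Switch to a random seed and recompute the bucket of every key.
sub rehash {
    my ($t) = @_;
    my @keys = map { @$_ } @{ $t->{buckets} };
    $t->{seed}     = 1 + int(rand(0xFFFFFFFF));   # chosen at run time
    $t->{rehashed} = 1;
    $t->{buckets}  = [ map { [] } 1 .. BUCKETS ];
    push @{ $t->{buckets}[ bucket_of($t->{seed}, $_) ] }, $_ for @keys;
}

# Insert a key; trigger the one-off rehash when a chain grows too long.
sub insert {
    my ($t, $key) = @_;
    my $chain = $t->{buckets}[ bucket_of($t->{seed}, $key) ];
    return if grep { $_ eq $key } @$chain;
    push @$chain, $key;
    rehash($t) if @$chain > MAX_CHAIN && !$t->{rehashed};
}

# Pick keys that all collide in bucket 0 under the constant seed,
# as someone who knows that seed could do in advance.
my @hostile = grep { bucket_of(CONSTANT_SEED, $_) == 0 } map "key$_", 1 .. 1000;
die "not enough colliding keys" if @hostile < 4;
@hostile = @hostile[0 .. 3];

my $t = new_table();
insert($t, $_) for @hostile;

my $longest = max map { scalar @$_ } @{ $t->{buckets} };
printf "rehashed=%d longest chain after rehash=%d\n", $t->{rehashed}, $longest;
```

The fourth colliding key pushes the chain past the threshold, so the table switches to a seed nobody could have known beforehand. Even in this toy you can see the cost described above: an extra check on every insert, plus a full pass over all keys when the rehash fires.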

      5.17.6 returned things to roughly where they were in 5.8.1. The rehash mechanism and all overheads associated with it are removed, and the hash seed is randomly initialized per process.

      Somewhat related: the actual hash function in 5.17.6 is different from the one in 5.17.5, and we will probably use yet another hash function in 5.18.

      And if I have my way hashes will be randomized on a per hash level as well. (So every hash would have its own order, regardless of what keys it stores or the history of the hash.)
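One way to picture per-hash randomization is a toy table in which every hash carries its own salt and that salt permutes the order in which buckets are visited. This is purely illustrative: `new_table`, `toy_keys`, the salt and the trivial "hash function" (integer key modulo bucket count) are assumptions made for the sketch, not a description of what the Perl internals do.

```perl
use strict;
use warnings;

use constant BUCKETS => 8;   # power of two, so XOR stays within range

# Toy table: integer keys, bucket = key % BUCKETS, plus a per-table salt.
sub new_table {
    my ($salt, @keys) = @_;
    my @buckets = map { [] } 1 .. BUCKETS;
    push @{ $buckets[ $_ % BUCKETS ] }, $_ for @keys;
    return { salt => $salt % BUCKETS, buckets => \@buckets };
}

# Visit the buckets in the order (0 ^ salt), (1 ^ salt), (2 ^ salt), ...
sub toy_keys {
    my ($t) = @_;
    return map { @{ $t->{buckets}[ $_ ^ $t->{salt} ] } } 0 .. BUCKETS - 1;
}

# Same keys, same insertion history, different salt.
my $first  = new_table(0, 0 .. 7);
my $second = new_table(5, 0 .. 7);

print join("-", toy_keys($first)),  "\n";   # 0-1-2-3-4-5-6-7
print join("-", toy_keys($second)), "\n";   # 5-4-7-6-1-0-3-2
```

Both tables hold identical keys inserted in an identical sequence, yet they iterate differently, which is the "regardless of what keys it stores or the history of the hash" property in miniature.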

      ---
      $world=~s/war/peace/g

        5.17.6 returned things to roughly where they were in 5.8.1.

        Okay. Thanks for that. I was party to some of the discussion for the 5.8.1 randomisation, so that makes sense to me.

        Somewhat related: the actual hash function in 5.17.6 is different from the one in 5.17.5, and we will probably use yet another hash function in 5.18.

        Can you explain why the hash function has changed? And what it has changed (or is going to change) to?

        A reference to background material regarding the selection and testing of the new hash functions would be interesting and useful.

        And if I have my way hashes will be randomized on a per hash level as well.

        Could you briefly explain why you would do that? What it would achieve or prevent?


