PerlMonks  

Re^6: Hash order randomization is coming, are you ready?

by BrowserUk (Pope)
on Dec 02, 2012 at 17:41 UTC (#1006737)


in reply to Re^5: Hash order randomization is coming, are you ready?
in thread Hash order randomization is coming, are you ready?

First, thanks for the clarification.

However, as far as I can tell, what you are saying comes down to:

Two hashes containing identical keys and values will iterate in different orders unless they were constructed in exactly the same way.

For example:

$h1{ $_ } = 1 for 'a'..'z';;
$h2{ $_ } = 1 for reverse 'a'..'z';;
print %h1; print %h2;;
w 1 r 1 a 1 x 1 d 1 j 1 y 1 u 1 k 1 h 1 g 1 f 1 t 1 i 1 e 1 n 1 v 1 m 1 s 1 l 1 c 1 p 1 q 1 b 1 z 1 o 1
w 1 a 1 r 1 d 1 x 1 j 1 y 1 u 1 h 1 k 1 g 1 f 1 i 1 t 1 e 1 n 1 v 1 m 1 s 1 l 1 c 1 p 1 b 1 q 1 z 1 o 1

And:

@h1{ 'a'..'z', 'A'..'Z' } = (1)x52;;
delete @h1{ 'A'..'Z' };;
@h2{ 'a'..'z' } = (1)x26;;
print %h1; print %h2;;
a 1 d 1 j 1 y 1 u 1 k 1 g 1 t 1 e 1 v 1 s 1 c 1 q 1 b 1 z 1 w 1 r 1 x 1 h 1 f 1 i 1 n 1 m 1 l 1 p 1 o 1
w 1 r 1 a 1 x 1 d 1 j 1 y 1 u 1 k 1 h 1 g 1 f 1 t 1 i 1 e 1 n 1 v 1 m 1 s 1 l 1 c 1 p 1 q 1 b 1 z 1 o 1

And:

@h{ 'a'..'z', 'A'..'Z' } = (1)x52;;
delete @h{ 'A'..'Z' };;
%h2 = %h;;
print %h; print %h2;;
a 1 d 1 j 1 y 1 u 1 k 1 g 1 t 1 e 1 v 1 s 1 c 1 q 1 b 1 z 1 w 1 r 1 x 1 h 1 f 1 i 1 n 1 m 1 l 1 p 1 o 1
w 1 r 1 a 1 x 1 d 1 j 1 y 1 u 1 h 1 k 1 g 1 f 1 i 1 t 1 e 1 n 1 m 1 v 1 s 1 l 1 p 1 c 1 q 1 b 1 z 1 o 1

In all cases above, two "identical" hashes were arrived at through a different sequence of operations; and that difference in the sequence of construction manifests itself in a different iteration sequence.
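
A minimal sketch of the practical consequence: when two hashes hold the same keys and values but were built differently, only an order-independent comparison will treat them as identical. The helper name same_content is hypothetical and is not taken from the posts in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both hashes hold the same key/value pairs,
# regardless of the order in which each hash happens to iterate.
sub same_content {
    my ($x, $y) = @_;
    return 0 unless scalar(keys %$x) == scalar(keys %$y);
    for my $k (keys %$x) {
        return 0 unless exists $y->{$k};
        return 0 unless $x->{$k} eq $y->{$k};
    }
    return 1;
}

my (%h1, %h2);
$h1{$_} = 1 for 'a' .. 'z';            # built in ascending key order
$h2{$_} = 1 for reverse 'a' .. 'z';    # built in descending key order

# The raw iteration order may or may not match; the sorted view always does.
my $raw_equal    = "@{[ %h1 ]}" eq "@{[ %h2 ]}";
my $sorted_equal = join(',', sort keys %h1) eq join(',', sort keys %h2);

print "raw order equal:   ", ($raw_equal    ? "yes" : "no"), "\n";
print "sorted keys equal: ", ($sorted_equal ? "yes" : "no"), "\n";
print "same content:      ", (same_content(\%h1, \%h2) ? "yes" : "no"), "\n";
```

Comparing through sort keys (or a helper like the one above) gives the same answer whatever sequence of inserts and deletes produced each hash.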

But that has always been the case!

The above is 5.10, but the same has been true for as long as I have been involved with perl, going right back to 5.6.1.

Which makes me wonder whether your meditation isn't a little a) redundant; b) slightly scare mongery?

Please don't take that the wrong way; I'm simply trying to understand exactly what difference(s) the latest changes have actually made.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong


Re^7: Hash order randomization is coming, are you ready?
by demerphq (Chancellor) on Dec 03, 2012 at 10:59 UTC

    Which makes me wonder whether your meditation isn't a little a) redundant; b) slightly scare mongery?

    I think you missed the point. The order will change with *every process*.

    $ for i in {1..10}; do ./perl -le'%h=(1..20); print "$]: ",join "-", keys %h'; done;
    5.017007: 1-13-5-15-19-9-17-11-7-3
    5.017007: 13-19-5-17-9-15-1-7-3-11
    5.017007: 13-7-19-15-5-1-11-17-3-9
    5.017007: 17-13-3-7-15-1-9-5-11-19
    5.017007: 17-9-3-11-7-15-1-19-5-13
    5.017007: 19-1-11-5-9-3-15-17-7-13
    5.017007: 9-19-3-17-7-11-13-15-1-5
    5.017007: 1-11-15-3-19-17-7-13-9-5
    5.017007: 19-7-13-1-5-17-9-3-11-15
    5.017007: 5-19-9-1-13-17-7-3-15-11

    $ for i in {1..10}; do perl -le'%h=(1..20); print "$]: ",join "-", keys %h'; done;
    5.012004: 11-3-7-9-17-15-1-19-13-5
    5.012004: 11-3-7-9-17-15-1-19-13-5
    5.012004: 11-3-7-9-17-15-1-19-13-5
    5.012004: 11-3-7-9-17-15-1-19-13-5
    5.012004: 11-3-7-9-17-15-1-19-13-5
    5.012004: 11-3-7-9-17-15-1-19-13-5
    5.012004: 11-3-7-9-17-15-1-19-13-5
    5.012004: 11-3-7-9-17-15-1-19-13-5
    5.012004: 11-3-7-9-17-15-1-19-13-5
    5.012004: 11-3-7-9-17-15-1-19-13-5

    The order returned by 5.12.4 is what you should see on pretty much every modernish perl that has been released, with the exception of 5.8.1 and of 5.17.6 and later. And obviously in 5.17.6 the order changes pretty much every time.

    What we discover when we randomize the keys per process is that people actually depend on the key order more than they realize. When we make it random, these dependencies become visible as bugs. I tend to consider such code buggy to begin with, as minor changes to the history of the hash will produce roughly the same results as per-process randomization.
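
    To make that concrete, here is a sketch of the kind of hidden dependency that randomization exposes, together with the usual fix. The sub names first_key_fragile and first_key_stable, and the sample data, are made up for illustration.

    ```perl
    use strict;
    use warnings;

    my %color_of = (apple => 'red', banana => 'yellow', cherry => 'dark red');

    # Fragile: "the first key" is whatever the hash happens to iterate first.
    sub first_key_fragile {
        my ($h) = @_;
        my ($first) = keys %$h;
        return $first;
    }

    # Stable: impose an explicit order before picking.
    sub first_key_stable {
        my ($h) = @_;
        my ($first) = sort keys %$h;
        return $first;
    }

    print "fragile: ", first_key_fragile(\%color_of), "\n";  # may differ between runs
    print "stable:  ", first_key_stable(\%color_of), "\n";   # always "apple"
    ```

    Code of the fragile kind can pass its tests for years on a perl with a constant seed and then fail on a randomized one, even though nothing in the code itself changed.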

    BTW, you *did* see that I said "none of this is new" right? So why the emphasis on "But that has always been the case"?

    ---
    $world=~s/war/peace/g

      none of this is new

      "___ is coming, are you ready" implies quite the opposite, so I think we can be forgiven for being ... confused.

      I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.

        Well I did say "per-process". :-) Anyway, hopefully it is clear now.


      BTW, you *did* see that I said "none of this is new" right? So why the emphasis on "But that has always been the case"?

      Because, until the simple example in your latest post, all the previous examples demonstrated things that have always been true. Thus, they do not demonstrate what changed. Which, when combined with the phrasing of the OP ...

      But never mind. I'm not trying to get on your case here; just work out what has actually changed, and a) how it might affect my existing code; and more importantly b) how it might affect my thought processes with regard to how I think of and use hashes.

      My conclusion so far -- for me personally; not the world in general you are addressing -- is that I have assumed the "new" constraints as a matter of course ever since the randomisation fix for Algorithmic Complexity Attack that was (briefly???) implemented in 5.8.1.

      However, what would be most useful to me -- and others I'm sure -- is a description of what has actually changed internally; and why it has been changed. Are you up for providing that description?


        that I have assumed the "new" constraints as a matter of course ever since the randomisation fix for Algorithmic Complexity Attack that was (briefly???) implemented in 5.8.1.

        Alas not everyone has been as diligent as you. :-) It is surprising how many real bugs this found.

        what has actually changed internally

        Ok, first some history. In 5.8.1 a patch very similar to the one I have been working on was implemented. It broke lots of stuff, which was considered unacceptable for a minor release. So a new implementation was done. This implementation actually supported two types of hash, and two seeds: one constant, determined at build time, and one random per process. By default hashes would use the constant seed, but when Perl noticed too many collisions in a bucket it would trigger a "rehash" using the random per-process seed. That would cause the hash value of all of the hash's keys to be recalculated and would, as a byproduct, cause those keys to be removed from the shared string table.

        All of this consumed processing time, and added code complexity.

        5.17.6 returned things to roughly where they were in 5.8.1. The rehash mechanism and all overheads associated with it are removed. The hash seed is randomly initialized per process. etc.
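
        As a rough illustration of why the seed matters, here is a toy model in which the iteration order is simply the bucket order produced by a seeded hash function. Everything in it is an assumption made for the example: toy_hash is NOT the function perl uses, toy_key_order does not reflect perl's real bucket layout, and both names are hypothetical.

        ```perl
        use strict;
        use warnings;

        # Toy string hash mixed with a seed. This is NOT perl's hash function.
        sub toy_hash {
            my ($key, $seed) = @_;
            my $h = $seed;
            $h = ($h * 33 + ord($_)) % 4294967296 for split //, $key;
            return $h;
        }

        # Order in which a toy table with $buckets buckets would yield its keys:
        # bucket by bucket, with ties broken by key so the result is
        # deterministic for a given seed.
        sub toy_key_order {
            my ($seed, $buckets, @keys) = @_;
            my %bucket_of = map { ($_ => toy_hash($_, $seed) % $buckets) } @keys;
            return sort { $bucket_of{$a} <=> $bucket_of{$b} or $a cmp $b } @keys;
        }

        my @keys = ('a' .. 'h');
        my $constant_seed    = 0;                        # stands in for a build-time seed
        my $per_process_seed = int(rand(4294967296));    # stands in for a per-process seed

        print "constant seed: @{[ toy_key_order($constant_seed, 8, @keys) ]}\n";
        print "random seed:   @{[ toy_key_order($per_process_seed, 8, @keys) ]}\n";
        ```

        With the constant seed every run prints the same order; with the random seed the order generally differs from run to run, which is the behaviour the shell loops earlier in the thread show for 5.12.4 and 5.17.7 respectively.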

        Somewhat related: the actual hash function in 5.17.6 is different from the one in 5.17.5, and we will probably use yet another hash function in 5.18.

        And if I have my way, hashes will be randomized on a per-hash level as well. (So every hash would have its own order, regardless of what keys it stores or the history of the hash.)

