Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Boolean math: Fill in the blanks.

by BrowserUk (Pope)
on Oct 10, 2008 at 09:12 UTC ( #716397=perlquestion: print w/ replies, xml ) Need Help??
BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

If you pick a bunch of random 32-bit values and and average the number of bits set, if your PRNG is any good, the average will tend toward 16 (50% of 32).

Update: swapped AND & OR in the descriptions of the next two examples as pointed out by GrandFather++

If you pick another set, but this time bit-wise AND each pair of values together, and this time average the number of bits set in the results, the average will tend toward 8 (25%).

Do the same thing once more, this time bit-wise ORing each pair and the average number of bits in the results will tend toward 24 (75%).

By extending the number of values you combine to derive each result, whilst varying the sequence of AND & OR operations you use to combine them, you can derive sequences of values who's average number of bits tends toward each of the ordinal values 1 through 31. For example, using R to represent a random unsigned integer in the range 0 .. 2**32, you can derive an average that tends toward 1 by combining 5 R using AND:

#! perl -slw use strict; use Math::Random::MT qw[ rand ]; sub R(){ rand( 0xFFFFFFFF ) } our $N ||= 1000; my $t = 0; for ( 1 .. $N ) { $t += unpack '%32b*', pack 'V', R & R & R & R & R; } print $t / $N; __END__ C:\test>junk7 1.029 C:\test>junk7 1.019 C:\test>junk7 1.014 C:\test>junk7 1.031 C:\test>junk7 0.985 C:\test>junk7 0.964

If you alternatively OR 5 R together(substitute pack 'V', R | R | R | R | R; above), the average tends toward 31:

C:\test>junk7 30.979 C:\test>junk7 31.028 C:\test>junk7 31.023

I've found some patterns in this. For example, to obtain the low order odd averages, you take the equation for double the value and apply & R to it's result. eg. to get 14, I have R & R | R & R, so to get 7 I use ( R & R | R & R ) & R. but that runs out for odd values greater than 15.

I've managed to find most of them through a combination of recognising some patterns and trial and error:

00 0.00% => 0 1 3.12% => R & R & R & R & R 2 6.25% => R & R & R & R 3 9.37% => ( R | R ) & R & R & R 4 12.50% => R & R & R 5 15.62% => ( R & R | R ) & R & R 6 18.75% => (R | R) & R & R 7 21.87% => ( R & R | R & R ) & R 8 25.00% => R & R 9 28.12% => (R & R & R | R ) & R ## Originally omitted. 10 31.25% => ( R & R | R ) & R 11 34.37% => ( ( R | R ) & R | R ) & R 12 37.50% => ( R | R ) & R 13 40.62% => ( R & R | R | R ) & R 14 43.75% => R & R | R & R 15 46.87% => ( R | R | R | R ) & R 16 50.00% => R 17 53.12% => R & R & R & R | R 18 56.25% => R & R & R | R 19 59.37% => R & R & ( R | R ) | R 20 62.50% => R & R | R 21 65.62% => ( R & R | R ) & R | R ## By [psini] 22 68.75% => R & R | R | R ## By [psini] 23 71.87% => ( R & R | R & R ) | R ## By [hawtin] 24 75.00% => R | R 25 78.12% => R & R & R | R | R 26 81.25% => R & R | R | R 27 84.37% => ( R | R ) & R | R | R ## By [psini] 28 87.50% => R | R | R 29 90.62% => R & R | R | R | R 30 93.75% => R | R | R | R 31 96.87% => R | R | R | R | R 32 100.00% => 0xffffffff

but 3 values, 21, 23, & 27 elude me. Update: I now have the 'full set', but there is an auxillary, and much harder challenge in Re^4: Boolean math: Fill in the blanks. for those will real mathematical & perl aptitude!

Firstly, I wonder if anyone can fill in those blanks?

Secondly, and more intellectually satisfying, is can anyone see an overall pattern that they can describe?

Finally, I do have a real use for this. It is not just intellectual curiosity. Thanks.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Comment on Boolean math: Fill in the blanks.
Select or Download Code
Re: Boolean math: Fill in the blanks.
by psini (Deacon) on Oct 10, 2008 at 09:53 UTC

    I think that a solution for 21 is ( R | R ) & ( R | R | R ), but the method I used is not directly extensible to the other two cases.

    Let's say that En is an expression formed or'ing and and'ing R's together, and let's call P(En) the probability that a bit is 1. So P(R)=1/2; P(R|R)=3/4 and so on. So we find that:

    • P(E1 & E2) = P(E1)*P(E2)
    • P(E1 | E2) = P(E1)+P(E2)-P(E1)*P(E2)

    So if you want P(Ex)=21/32 it's easy, because 21/32=7/8*3/4 and I know (from your list) that P(R|R|R)=28/32=7/8 P(R|R)=24/32=3/4.

    But it's not applicable to 23/32 (23 being prime) nor to 27/32 for you can'e decompose this fraction in the product of integer fractions of value<1.

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

        Solution for 27 would be easy: 27=32-5; 5*2=10; 22=32-10; so P(Ex|R)=27/32 if P(Ex)=22.

        But we don't have a solution for 22, because yours is wrong :)

        Update: Got it. 22=(R|R)&R|R; 27=(R|R)&R|R|R

        Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

      I totally agree with you on these formulas.

      Using the second one, I get 23/32 = 28/32 - 5/32 = 1/4 + 5/8 - 5/32, therefore 23 => R & R | (R & R | R).

      I'll try this with 27 as well...
      UPDATE: 27/32 = 3/4 + 3/8 - 9/32, therefore 27 => R | R | (R | R) & R

Re: Boolean math: Fill in the blanks.
by hawtin (Prior) on Oct 10, 2008 at 09:54 UTC

    Since & halves the number of 1s and | halves the number of 0s one would expect that the pattern for 1-X would just be the pattern for X with all the operators inverted:

    21 = (32 - 11) => ( ( R & R ) | R & R ) | R 27 = (32 - 5) => ( R | R & R ) | R | R

    I see that 9 is also missing from your list.

    I presume that the goal is to get exact ratios with the smallest number of operators? One way to explore this space is to create a Perl script to list the resulting spread for all the expressions with a certain number of operators

    Update: I think this is about right (and if not then it should be fixable)

    use strict; use warnings; sub operations { my($num_ops) = @_; #return (["M | M", 3 , 4], ["M & M", 1, 4]) # if($num_ops == 1); return ["M", 1, 2] if($num_ops == 0); my @result; for(my $i=0;$i<$num_ops;$i++) { my @first = operations($i); my @second = operations($num_ops - $i - 1); foreach my $first_part (@first) { my($fp_expr,$fp_top,$fp_bot) = @{$first_part}; foreach my $second_part (@second) { my($sp_expr,$sp_top,$sp_bot) = @{$second_part}; # Two choices & or | # (NY + M(X-N))/(X*Y) # (X*Y - ((X-N)Y + (Y-M)N))/(X*Y) push @result,["($fp_expr) & ($sp_expr)", $fp_bot*$sp_bot - ($fp_bot - $fp_top)*$sp_bot - ($sp_bot - $sp_top)*$fp_top, $fp_bot*$sp_bot], ["($fp_expr) | ($sp_expr)", $fp_top*$sp_bot + $sp_top*($fp_bot - $fp_top), $fp_bot*$sp_bot]; } } } return @result; } # Update: Edited the calling routine to provide easier to use results my @got; foreach my $count_expr (0..4) { foreach my $expr (operations($count_expr)) { my($sp_expr,$top,$bot) = @{$expr}; $top = $top * (32 / $bot); next if(defined $got[$top]); $sp_expr =~ s/\(M\)/M/g; $got[$top] = sprintf "%02d: $sp_expr\n",$top; } } for(my $i=1;$i<32;$i++) { print $got[$i]; }

    Which gives results:

    01: M & (M & (M & (M & M))) 02: M & (M & (M & M)) 03: M & (M & (M & (M | M))) 04: M & (M & M) 05: M & (M & (M | (M & M))) 06: M & (M & (M | M)) 07: M & (M & (M | (M | M))) 08: M & M 09: M & (M | (M & (M & M))) 10: M & (M | (M & M)) 11: M & (M | (M & (M | M))) 12: M & (M | M) 13: M & (M | (M | (M & M))) 14: M & (M | (M | M)) 15: M & (M | (M | (M | M))) 16: M 17: M | (M & (M & (M & M))) 18: M | (M & (M & M)) 19: M | (M & (M & (M | M))) 20: M | (M & M) 21: M | (M & (M | (M & M))) 22: M | (M & (M | M)) 23: M | (M & (M | (M | M))) 24: M | M 25: M | (M | (M & (M & M))) 26: M | (M | (M & M)) 27: M | (M | (M & (M | M))) 28: M | (M | M) 29: M | (M | (M | (M & M))) 30: M | (M | (M | M)) 31: M | (M | (M | (M | M)))

      Sorry, but it's not so simple. <s>& halves the number of 1's only if both expression are identical, same thing for |<s>.

      In fact, if I'm right, your expression for 21 gives an average of 23 ones and the expression for 27 gives an average of 29.

      Update: on second sight, your solution for 27 is equivalent to that given by BrowserUk for 29

      Update: & doesn't half the number of 1's. The result is the product divided by 32, so it halves the number of 1's of an expression if and only if the other is R

      Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

        What I meant to convey is that if you replace all 0s with 1s and all & with | then you get the same result. If the operations are done in the same order then, I think my expression gives the results I have stated. The problem is that precedence is changing the way Perl combines the operators. When defining these expressions I should have put brackets round everything.

      1. 21 = (32 - 11) => ( ( R & R ) | R & R ) | R actually gives 23 (which is good :)
      2. 27 = (32 - 5) => ( R | R & R ) | R | R actually gives 29 which I alrady have.
      One way to explore this space is to create a Perl script to list the resulting spread for all the expressions with a certain number of operators

      I had a go at this, but precedence make is pretty aakward. (Eg, I was unsuccessful!)

      I see that 9 is also missing from your list.

      I'd missed that, but tit is an easy one: 18 = R & R & R | R so 9 = ( R & R & R | R ) & R. I'll add it in the OP.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      Your code wasn't here the first time I replied, and having failed to achieve results with my own attempt I was dismissive. For which I apologise.

      This is brilliant! I combined your generator with my test harness and got:

      Now I going to look into extending that to produce a custom sub to approximate any given probability. Many thanks!


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Boolean math: Fill in the blanks.
by oshalla (Deacon) on Oct 10, 2008 at 11:37 UTC

    What fun !

    Since you can generate 11, 9 & 5 out of 32, you could generate 21, 23 & 27 out of 32 by ~.

      Since you can generate 11, 9 & 5 out of 32, you could generate 21, 23 & 27 out of 32 by ~.

      I did consider using ~, and did get some results, but I had a hard enough time wrapping my brain around 2 levels of precedence, never mind 3!

      Just as I believe that the 19th century scholars that gave us our grammar and spelling "rules" did us all a dis-service, I think that the ,athematicians that gave us (arbitrary) rules of precedence did likewise.

      Expressions would just be easier to deal with, if they were alway evauated left to right.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Boolean math: Fill in the blanks.
by JavaFan (Canon) on Oct 10, 2008 at 20:15 UTC
    I'm curious about the first and last entry in the list. If it's acceptable to take 0 and 0xffffffff for 0 and 32 bits, why not use 1 for 1 bit, 3 for 2 bits, 7 for 3 bits, etc?

      This falls out of--is an extension to--the code in Re: A series of random number and others (10MB / 8seconds).

      There the problem was to fairly pick 50% of a set. By setting the bits of a bit vector, where each bit == 1 choice, to randomly generated bit patterns, I very quickly achieve a close approximation to a 50% pick. Then it is just a case of randomly inverting bits until the required 50% ratio is achieved.

      Works great for 50%, but as the desired ratio moves away from 50% (or 0% or 100%), the number of bits requiring correction increases and the chances of picking an appropriate bit reduces, so the number of iterations required to achieve the required ratio increases exponentially.

      Then I remember the affect of ANDing and ORing two random values produces 25% & 75% repectively. I used it to produce random fill patterns for graphics, where it gives a more naturalistic shading affect with less aliasing than a fixed fill pattern.

      That allows me to initialise the bit-vector to 0%, 25%, 50%, 75% or 100%, whichever is closest to the required ratio, very quickly. So the number of iterations needed to achieve the correction falls from max 1/4 of the total range to 1/8th. And the more you can subdivide the range, the closer you can get with the fast initialisation and less correction is required. With 128 subdivisions needing just 7 terms, it possible to get within less that 0.4% through the initialisation.

      So, to answer the question, they are acceptable because they are only a starting point. If the required ratio was 99.99999% (in a large set), it's quicker to start with 100% set and unset bits randomly until I achive the desired ratio, than start at 75% and go the other way. Same thing at the other end. If you're trying to pick 3 from a billion, initialise to a billion 0s and then set 3 random bits.

      It'd be a stupid way to pick 3 from a billion, but it makes for completeness. The algorithm really comes into its own when you starting selecting 10s of millions from billions.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Boolean math: Fill in the blanks.
by repellent (Priest) on Oct 12, 2008 at 01:55 UTC
    Very interesting topic!

    I see an overall pattern, which I would like to share :-)

    First, a few basic observations:
    • There seems to be a certain symmetry evident from how R & R results in 8 bits set on average (halfway between 1 and 16) versus R | R resulting in 24 bits set on average (halfway between 16 and 32). I'll be sure to make my hypothesis symmetrical, in that regard.
    • If we apply AND, the average bits set goes down due to higher likelihood of 0 bits swallowing the 1 bits. Conversely, if we apply OR, the average bits set goes up due to higher likelihood of 1 bits overriding the 0 bits.
    • Non-associative rules apply for AND and OR:

        ( (A & B) | C )  !=  ( A & (B | C) )

      So evaluating the expression from left-to-right is important! Update: Perl gives precedence to the & operator over |


    So, given the following legend:
      n = total # bits per sample L1 = # bits set on average on left-side of AND/OR L0 = # bits unset on average on left-side of AND/OR = n - L1 R1 = # bits set on average on right-side of AND/OR

    we can derive a set of equations to calculate the resulting average bits set for each AND/OR operation:
      AND : L1 * R1 / n OR : L1 + L0 * R1 / n = L1 + (n - L1) * R1 / n = L1 + R1 - L1 * R1 / n = L1 * ( 1 - R1 / n ) + R1

      Conclusion: ... I see some inconsistent results ... 7 of ( R & R | R & R ) & R evaluates to 5 bits set on average.

      If you plug ( R & R | R & R ) & R into the OP code and run it a few times, you'll see that the average does approximate 7:

      C:\test>booleanBuk -N=1e2 7.02 C:\test>booleanBuk -N=1e2 6.88 C:\test>booleanBuk -N=1e2 7.06 C:\test>booleanBuk -N=1e2 7.06 C:\test>booleanBuk -N=1e2 6.92 C:\test>booleanBuk -N=1e2 7.08

      And as you increase the number of iterations, the more closely the observations coincide with the theoretical value:

      C:\test>booleanBuk -N=1e5 7.00138 C:\test>booleanBuk -N=1e5 6.98986 C:\test>booleanBuk -N=1e5 6.99812 C:\test>booleanBuk -N=1e5 7.00729 C:\test>booleanBuk -N=1e6 6.994402 C:\test>booleanBuk -N=1e6 7.002038

      That's how I like my proofs. Tangible :)

      Sorry if I have misunderstood your hypothesis.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        But plug in
          (( R & R ) | R ) & R & R

        into the OP code and it tends towards 5, which was how I was interpreting the expression:
          ( R & R | R & R ) & R

        My mistake. Perl evaluates ( R & R | R & R ) & R as ( (R & R) | (R & R) ) & R and that, according to my hypothesis, results in 7 bits set on average.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://716397]
Approved by GrandFather
Front-paged by Old_Gray_Bear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (7)
As of 2014-12-23 01:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (133 votes), past polls