
Testing a number for oddness

by Falkkin (Chaplain)
on Jan 23, 2001 at 03:29 UTC (#53618=perlquestion)
Falkkin has asked for the wisdom of the Perl Monks concerning the following question:

Earlier today, in the CB, I talked with arturo about how to find out whether a given integer is even or odd.

The obvious way to do this involves something like: $odd = $num % 2;.

On the other hand, my CS professor claims that's a slow way of finding out the oddness of a number, and suggests something along the lines of: $odd = $num & 1; instead. This makes sense, because one would assume that a bitwise AND costs less CPU time than a DIV instruction.
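(For what it's worth, the two expressions really do compute the same thing for non-negative integers; a quick sanity check, separate from the benchmark, with both tests wrapped in illustrative subs:)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two ways to test oddness: remainder after division, and the low bit.
sub odd_mod { my ($n) = @_; return $n % 2 }
sub odd_and { my ($n) = @_; return $n & 1 }

# Spot-check that they agree on a handful of values.
for my $n (0, 1, 2, 7, 100, 2**20 + 1) {
    printf "%8d: %%2 => %d, &1 => %d\n", $n, odd_mod($n), odd_and($n);
}
```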

I decided to test this hypothesis using perl's Benchmark module, and was surprised at the results:

Here's the code:

my $t1 = timeit(1_000_000, \&odd_1);
my $t2 = timeit(1_000_000, \&odd_2);
print "odd_1: ", timestr($t1), "\n";
print "odd_2: ", timestr($t2), "\n";

sub odd_1 {
    my $num = rand(2**31);
    return $num % 2;
}

sub odd_2 {
    my $num = rand(2**31);
    return $num & 1;
}
And here, the results:
[falkkin@shadow ~/perl] perl
odd_1:  6 wallclock secs ( 6.29 usr + -0.00 sys =  6.29 CPU) @ 158982.51/s (n=1000000)
odd_2: 10 wallclock secs ( 9.36 usr + -0.01 sys =  9.35 CPU) @ 106951.87/s (n=1000000)
It appears that the "slow" method of computing this is actually faster, but I have no idea why. I was wondering if any monks here who know the internals of perl could provide any insight on this.

The only causes I can think of are:
- My CS teacher's claim is simply wrong; perhaps she didn't account for newer hardware that is optimized for multiplication and division (although this script is being run on a Pentium I with MMX, which is hardly cutting-edge any more...)
- Perhaps perl's compiler optimizes $num % 2 into "return the last bit of $num"?

Thoughts, anyone?

Replies are listed 'Best First'.
Re: Testing a number for oddness
by Adam (Vicar) on Jan 23, 2001 at 03:52 UTC
    Your results imply a significant savings using % instead of &, which made me wonder about them. Thank you for posting your code, I ran it myself (on a dual P3-650 running NT4 with Perl 5.6) and got:
    odd_1: 2 wallclock secs ( 1.70 usr + 0.00 sys = 1.70 CPU) @ 586854.46/s (n=1000000)
    odd_2: 3 wallclock secs ( 1.67 usr + 0.00 sys = 1.67 CPU) @ 598086.12/s (n=1000000)
    This implies that the two are roughly equivalent, which is what I expected. I ran a second test with this code (and the following results):
    #!perl -w
    use strict;
    use Benchmark;
    timethese( 1_000_000, {
        odd_1 => sub { my $num = rand(2**31); return $num % 2 },
        odd_2 => sub { my $num = rand(2**31); return $num & 1 },
    });
    __END__
    Benchmark: timing 1000000 iterations of odd_1, odd_2...
         odd_1:  3 wallclock secs ( 1.69 usr +  0.00 sys =  1.69 CPU) @ 592417.06/s (n=1000000)
         odd_2:  3 wallclock secs ( 1.69 usr +  0.00 sys =  1.69 CPU) @ 592417.06/s (n=1000000)
    OK, I lied. I ran it 5 times, and that result only happened once. But the other 4 runs were also very close. Perhaps there is an optimization running, since % 2 would look like "grab the bit shifted off when doing >> 1". And I would be surprised if no one (from the authors of Perl on down to the folks at Intel) took the time to optimize this rather common operation.

    So I was thinking about this some more, and thought, "I wonder what the deparser has to say about this." Turns out, nothing. Sigh.

    >perl -MO=Deparse -we "$n=1231234;print $n % 2"
    $n = 1231234;
    print $n % 2;
    -e syntax OK
    >perl -MO=Deparse -we "$n=1231234;print $n & 1"
    $n = 1231234;
    print $n & 1;
    -e syntax OK
    >perl -MO=Deparse,-p -we "$n=1231234;print $n % 2"
    ($n = 1231234);
    print(($n % 2));
    -e syntax OK
    >perl -MO=Deparse,-p -we "$n=1231234;print $n & 1"
    ($n = 1231234);
    print(($n & 1));
    -e syntax OK
Re: Testing a number for oddness
by runrig (Abbot) on Jan 23, 2001 at 04:30 UTC
    Whenever you run a benchmark, you should, as much as possible, take out the things you are not comparing, and use operations that are as low-cost as possible relative to the ones you are comparing. That call to rand() is slowing things down a lot, and if you remove it, you get a better comparison:
    #!/usr/local/bin/perl -w
    use strict;
    use Benchmark;

    my $i = 0;
    timethese(200000, {
        RAND_AND => \&and_rand,
        RAND_MOD => \&mod_rand,
        TWO_AND  => \&two_and,
        TWO_MOD  => \&two_mod,
    });

    sub and_rand { my $num = rand(2**31); return $num & 1; }
    sub mod_rand { my $num = rand(2**31); return $num % 2; }
    sub two_mod  { $i++ % 2; }
    sub two_and  { $i++ & 1; }

    ~/tst>./tst
    Benchmark: timing 200000 iterations of RAND_AND, RAND_MOD, TWO_AND, TWO_MOD...
    RAND_AND:  1 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 178571.43/s (n=200000)
    RAND_MOD:  0 wallclock secs ( 1.10 usr +  0.00 sys =  1.10 CPU) @ 181818.18/s (n=200000)
     TWO_AND:  0 wallclock secs ( 0.36 usr + -0.01 sys =  0.35 CPU) @ 571428.57/s (n=200000)
               (warning: too few iterations for a reliable count)
     TWO_MOD:  2 wallclock secs ( 0.42 usr + -0.00 sys =  0.42 CPU) @ 476190.48/s (n=200000)
    Update: Even the ++ in the other two subroutines is a sort of unwanted operator. If I take it out and set $i to a constant, the AND method is almost twice as fast as the MOD method (OK, on second checking, maybe that "twice as fast" was a fluke; it's more like 30-40% faster consistently).
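    (An editor's sketch of the same hygiene, using Benchmark's cmpthese, assuming a Benchmark version recent enough to export it; the timed subs contain nothing but the operator under test:)

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # A plain odd constant: no rand(), no ++, nothing but the operator
    # under test inside each timed sub.
    my $i = 15;

    # cmpthese prints a rate table plus a pairwise percentage comparison.
    cmpthese(1_000_000, {
        MOD => sub { $i % 2 },
        AND => sub { $i & 1 },
    });
    ```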
      On constants, the AND may be getting optimized away, and in checking I find that it is...
      # perl -MO=Deparse,-p -we "print 15 & 1"
      print(1);
      -e syntax OK
      # perl -MO=Deparse,-p -we "print 16 & 1"
      print(0);
      -e syntax OK


        It turns out that almost all operators in Perl get optimized away in this manner. 'Constant folding', as it's called, is a very common optimization.
        % perl -MO=Deparse
        print 15 + 1, 15 - 1, 15 * 1, 15 % 1, 15 & 1, 15 ^ 1, 15 << 1,
              !15, 15 || $x, 15 . 1, 15 < 1, 15 <=> 1, 15 ? $x : $y;
        __END__
        print 16, 14, 15, 0, 1, 14, 30, 0, 15, '151', 0, 1, $x;
        In fact, the only operator I've found that could have this optimization but doesn't is x.
        That's why I set '$i' to a constant (my $i=15), then AND'ed it. I checked, and THAT won't get optimized away.
Re: Testing a number for oddness
by Fastolfe (Vicar) on Jan 23, 2001 at 05:02 UTC
    Update: chipmunk notes that I was using % 1 instead of % 2, which was a stupid mistake on my part. I've updated the code and benchmarks, but the results do not differ significantly.

    Because I'm bored and in love with Inline:

    use Benchmark 'timethese';
    use Inline C => <<'EoF';
    int c_and(int num) { return(num & 1); }
    int c_mod(int num) { return(num % 2); }
    EoF

    my $num = 5;
    sub perl_mod { $_[0] % 2 }
    sub perl_and { $_[0] & 1 }

    timethese(10_000_000, {
        perl_mod => sub { perl_mod($num) },
        perl_and => sub { perl_and($num) },
        c_mod    => sub { c_mod($num) },
        c_and    => sub { c_and($num) },
    });
    __END__
       c_and:  7 wallclock secs ( 7.09 usr +  0.05 sys =  7.14 CPU) @ 1400560.22/s (n=10000000)
       c_mod:  8 wallclock secs ( 7.44 usr + -0.01 sys =  7.43 CPU) @ 1345895.02/s (n=10000000)
    perl_and: 17 wallclock secs (16.63 usr + -0.01 sys = 16.62 CPU) @ 601684.72/s (n=10000000)
    perl_mod: 16 wallclock secs (16.65 usr +  0.01 sys = 16.66 CPU) @ 600240.10/s (n=10000000)
    For all intents and purposes, it seems like both methods are equally fast, at least as far as any Perl implementation is concerned, and at least on my architecture (i686-linux, P3/850). A few microseconds isn't going to make a difference in any real-world application, and Perl itself may have quirks that cause one implementation to be slower or faster than another. It may very well be that a 100% C solution to all of this will show the expected results (with & being marginally faster), but it doesn't look like you're going to get those results in Perl.

    Additionally, you guys are relying on Benchmark results with far too few iterations. If you want to compare and contrast algorithms and are getting Benchmark results in the 0-2 second range, you need to increase your repetitions by at least an order of magnitude.
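    (One way to get enough repetitions without guessing a count: the Benchmark module also accepts a negative count, meaning "run each case for at least that many CPU seconds". An editor's sketch, assuming a Benchmark version that supports this form:)

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(timethese);

    my $num = 5;

    # A negative count asks Benchmark to keep iterating each case until at
    # least that many CPU seconds have elapsed, so every case gets enough
    # iterations for a stable rate instead of a hand-picked n.
    timethese(-1, {
        perl_mod => sub { $num % 2 },
        perl_and => sub { $num & 1 },
    });
    ```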

(tye)Re: Testing a number for oddness
by tye (Sage) on Jan 23, 2001 at 04:46 UTC

    $x & 1 will probably have to convert the double-precision floating point value stored in $x into a "long int" before it can do the fast bit-wise "and". This conversion may well take longer than the simple floating-point division (which will have the aid of floating-point acceleration hardware) that $x % 2 must do.

    This probably explains some of the benchmark results.

    To me, a much worse problem than the misguided "optimization" of such a simple operation is that $x & 1 will only work for integral values that fit within 32 bits (on most platforms). However, $x % 2 will work correctly on integral values of up to approximately 53 bits (the precision of a double, on most platforms). Using % 2 will probably make your code handle a much larger range of values correctly.
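    (This range caveat can be probed empirically; where, and whether, the two tests part company depends on the platform's integer and float sizes, so this editor's sketch prints rather than asserting anything about the large values:)

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Walk up the powers of two and see whether 2**$bits + 1 still looks
    # odd to each test.  On a 32-bit-IV perl, & 1 is the first to go wrong;
    # a 64-bit build keeps both tests honest well past 2**32.
    for my $bits (8, 16, 31, 40, 52) {
        my $n = 2**$bits + 1;
        printf "2**%-2d + 1: %%2 => %d, &1 => %d%s\n",
            $bits, $n % 2, $n & 1,
            ($n % 2) == ($n & 1) ? "" : "   <-- tests disagree";
    }
    ```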

            - tye (but my friends call me "Tye")
Re: Testing a number for oddness
by Maclir (Curate) on Jan 23, 2001 at 04:31 UTC
    OK, I was struck by the significant difference Falkkin obtained. I ran the same script on my development box here (a Sun Ultra 2, with a 300MHz UltraSPARC CPU and 128 MB of RAM), and these were my results:
    odd_1: 9 wallclock secs ( 8.55 usr + 0.00 sys = 8.55 CPU)
    odd_2: 9 wallclock secs ( 9.46 usr + 0.00 sys = 9.46 CPU)
    The divide is about 10% faster than the AND. Why is this so? I personally would have thought that a bitwise AND - a basic opcode in just about any instruction set - would be faster than a division.

    Update: I followed runrig's advice and took out the rand call. The results were different:

    odd_1: 7 wallclock secs ( 6.41 usr + 0.00 sys = 6.41 CPU)
    odd_2: 7 wallclock secs ( 6.00 usr + 0.00 sys = 6.00 CPU)
    Now the division takes longer - by about 7%. Strange stuff.
Re: Testing a number for oddness
by lhoward (Vicar) on Jan 23, 2001 at 04:45 UTC
    One thing you have to remember is that these operations are, at the perl layer, not being done on an integer (or anything else native to the underlying architecture) but on a perl scalar value. I bet if you ran the same tests in C (or assembler, or any language that is "right on the hardware"), and you were sure that modulus wasn't part of the CPU's instruction set (or produced as a side-effect of the div operation), the results would be much more like what you expected.
Re: Testing a number for oddness
by mr.nick (Chaplain) on Jan 23, 2001 at 20:05 UTC
    It's kind of funny: runrig even mentioned that when benchmarking you should exclude operations that don't concern the thing you are trying to test, yet every single example calls its "perl_and/mod" function for a single operation. Don't you think the function call from timethese/timeit to perl_and/mod takes time?

    So I did it this way:

    sub perl_mod { for my $z (1..10_000) { $_[0] % 2; } }
    sub perl_and { for my $z (1..10_000) { $_[0] & 1; } }

    timethese(10_000, {
        perl_mod => sub { perl_mod($num) },
        perl_and => sub { perl_and($num) },
    });

    and got the following results:

    nicholas(/5)@neko [106 /<1>nicholas/tmp] > ./time
    Benchmark: timing 10000 iterations of perl_and, perl_mod...
    perl_and: 210 wallclock secs (209.32 usr + 0.79 sys = 210.11 CPU)
    perl_mod: 260 wallclock secs (258.62 usr + 0.93 sys = 259.55 CPU)

    Which shows the perl_and to be around 20% faster than perl_mod: about what I would expect.

    Then, I did a pure C test:

    #include <stdio.h>
    #include <stdlib.h>   /* for exit() */

    void c_and(int i) {
        int a;
        for (a = 0; a < 10000; a++) {
            i & 1;
        }
    }

    void c_mod(int i) {
        int a;
        for (a = 0; a < 10000; a++) {
            i % 2;
        }
    }

    int main(void) {
        int a;
        for (a = 0; a < 100000; a++) {
            c_and(a);
        }
        /*
        for (a = 0; a < 100000; a++) {
            c_mod(a);
        }
        */
        exit(0);
    }
    and ran it separately for each of the types (I was using /bin/time to test):

    For c_and: 29.800u 0.040s 0:29.84 100.0% 0+0k 0+0io 69pf+0w
    for c_mod: 29.540u 0.390s 0:29.99 99.7% 0+0k 0+0io 69pf+0w

    Which again shows AND to be faster, though nowhere near as much of a difference as with perl (I presume gcc is optimized).

    test platform: dual PII-233 with a load avg of around 7 :)

      gcc is OBVIOUSLY optimized. % is always craploads slower than &. On a modern processor, a multiply, add, subtract, bitshift, etc., can be done in one cycle (assuming all operands are already loaded into registers), whereas a divide could take 10-50 or even more cycles.

      To get gcc to not optimize anything away do something like this:

      #include <stdio.h>
      #include <stdlib.h>   /* for atoi() */

      volatile int yb;
      int foo;

      void blah(int t) {
          foo = (t == 0) ? 1 : 2;
      }

      int main(int argc, char **argv) {
          int x = 0xdeadbeef;
          volatile int *y = &yb;
          int i;
          int t = atoi(argv[1]);

          blah(t);
          if (t == 0) {
              printf("Using the & method\n");
              for (i = 0; i < 100000000; ++i)
                  *y = x & foo;
          } else {
              printf("Using the %% method\n");
              for (i = 0; i < 100000000; ++i)
                  *y = x % foo;
          }
          return 0;
      }
      NOTES: x is initialized with a non-zero value. On x86 I have noticed that ALU/multiply operations on zero-valued inputs can sometimes be significantly faster than on non-zero inputs.

      y is a volatile pointer to prevent gcc from optimizing the loops away, if it was so inclined.

      'foo' is not a constant to prevent gcc from optimizing an odd/even test.

      'foo' is initialized in a non-static function to prevent gcc from (if it could/would do this) optimizing the loops to odd/even test anyway.

      We do 100 million iterations, to drown out any noise.

      Running on an unloaded 933Mhz P-III:

      % time ./foo 0
      Using the & method
      0.77user 0.01system 0:00.80elapsed 96%CPU (blah blah)
      % time ./foo 1
      Using the % method
      4.29user 0.00system 0:04.43elapsed 96%CPU (blah blah)

      I ran the tests about 10 times each and the results are about the same.

      Now if you are just doing an odd-even test, gcc will likely be able to optimize it for you to the point that you can use whichever method you want and it will be fast. If you're programming on a DSP, however, or maybe in a super-high-speed driver or something, you'd want to make sure you use the fast version, because the compiler in that case likely can't help you (or you don't want it to).

      And in Perl, anyway, you are operating many levels above the underlying machine architecture. % vs. & is the least of your worries speed-wise.

      See the Benchmark docs. It factors out the null loop, so even looping within your subroutine adds operations that you'd rather not time (for '&' and '%' anyway; if you are timing more expensive operations/algorithms then it is not significant). See this:
      #!/usr/local/bin/perl -w
      use strict;
      use Benchmark;

      my $i = 34;
      timethese(3000000, {
          AND       => \&and_them,
          MOD       => \&mod_them,
          NULL_LOOP => \&null_loop,
      });

      sub null_loop { }
      sub and_them  { $i & 1; }
      sub mod_them  { $i % 2; }

      [rover:DEV]~/tst>./tst
      Benchmark: timing 3000000 iterations of AND, MOD, NULL_LOOP...
            AND:  1 wallclock secs ( 1.68 usr +  0.00 sys =  1.68 CPU) @ 1785714.29/s (n=3000000)
            MOD:  2 wallclock secs ( 2.33 usr +  0.00 sys =  2.33 CPU) @ 1287553.65/s (n=3000000)
      NULL_LOOP: -1 wallclock secs (-0.07 usr +  0.00 sys = -0.07 CPU) @ -42857142.86/s (n=3000000)
                 (warning: too few iterations for a reliable count)
      (I realize that in the end it really doesn't matter, I'd probably use '%' anyway, and this is all just for the sake of discussion/fun/curiosity.)
