Beefy Boxes and Bandwidth Generously Provided by pair Networks Bob
P is for Practical
 
PerlMonks  

A mod2 Machine.

by code-ninja (Scribe)
on Jul 03, 2013 at 22:39 UTC ( #1042289=perlmeditation: print w/ replies, xml ) Need Help??

What is a mod2 machine?

If I put it in extremely layman's term, it is a state machine that switches from low to high (or 0 to 1 or "foo" to "bar" or... ) if the input is divisible by 2.

Now, let's say we have a set of numbers from different bases (i.e. a mix of decimal, octal and hexadecimal numbers) and we need to extract even (odd) numbers.

Most obvious (easiest) solution

use strict; use warnings; if(112358 % 2 == 0){ print "even\n"; } else { print "odd\n"; }

What's the problem with this code?

Now, the problem with this code is it accepts the number as whole. To explain more clearly, we use a bona fide integer and it will be treated mathematically, hence taking more time... hold this thought for a moment, it will be proven soon.

Enter mod2 Machine logic.

Now, a number (no matter the base) is even if its last digit is even. Therefore, any number ending in {0, 2, 4, 6 ,8, A, C, E} will be an even number (I guess I do not need to prove this point). So this time we read a string and jump state to "even" if the last character is any one of the above list. The advantage of this mod2 machine is that, there are no mathematical operations, thereby executing faster.

#!/usr/bin/perl # mod2 Machine. use strict; use warnings; "112358" =~ /['0' | '2' | '4' | '6' | '8']$/i ? print "even\n" : +print "odd\n"; # if we add 'A' | 'C' | 'E' to the list, it can handle hexadecimals as + well.

On my Linux Mint (kernel 3.x), this is what it looks like. "evenodd.pl" is the mod2 Machine whereas "evenodd1.pl" uses the mathematical logic. Clear enough, there is a slight improvement, but imagine if the number is 1500 figures long? There will be significant reduction in execution time as there will be no mathematical operations and we are just interested in last character... only one comparison is enough to judge whether or not the number is even.

We can even classify a whole string as even or odd. Just out of the wild, I dream of creating a code that can learn. It will classify each answers to questions (or reply to certain phrases) as "polite" (even) or "naughty/rude" (odd) and then produce appropriate response. It's o.k. to pass away into the fantasy world.

Conclusion

Similar logic can be designed for checking divisibility of other numbers for example a mod3 machine will check for numbers that are divisible by 3. We can use the same "look at the last character only" logic or the fact that a number is divisible by 3 if the sum of digits of that number is divisible by 3. These machines will be faster because they tend to avoid mathematical operations.

UPDATES

What I meant by bona fide integer (I actually wanted to write "number" over here) is very simple. In the first code, 112358 means (assuming base 10)
( 8 * 1 + 5 * 10 + 3 * 100 + 2 * 1000 + 1 * 10000 + 1 * 100000 )
i.e. it is a mathematical figure. Whereas in the second code, "112358" is just a string of characters. The second code can even compare a string like "A quick brown fox jumped over the lazy dog." and classify it as even or odd.

Divisibility rule for 3 can verified here

Comment on A mod2 Machine.
Select or Download Code
Re: A mod2 Machine.
by ww (Bishop) on Jul 03, 2013 at 23:14 UTC
    Quibbles
    1. Why isn't (as implied by the totality of your post) your your initial value, 112358, "a bona fide integer?"
    2. "the fact that a number is divisible by 3 if the sum of digits of that number is divisible by 3."
        Sounds good but for us non-mathematicians, please cite an authoritative source.
    3. Using Base3 numbers on a quantum computer will break your algorithm.
                  :-)
    Seriously, very nice. ++ for "well done."

    If you didn't program your executable by toggling in binary, it wasn't really programming!

Re: A mod2 Machine.
by choroba (Abbot) on Jul 04, 2013 at 00:07 UTC
    I played with the idea in base 10. Note that sometimes, the machine wins, but sometimes it does not:

    The benchmark on my machine:

    1 Rate mod machine mod 1949318/s -- -55% machine 4329004/s 122% -- 2 Rate machine mod machine 1428571/s -- -23% mod 1851852/s 30% -- 3 Rate machine mod machine 204666/s -- -89% mod 1848429/s 803% -- 4 Rate machine mod machine 974659/s -- -48% mod 1858736/s 91% -- 5 Rate mod machine mod 1855288/s -- -9% machine 2044990/s 10% -- 7 Rate machine mod machine 224517/s -- -88% mod 1845018/s 722% -- 10 Rate mod machine mod 1851852/s -- -39% machine 3058104/s 65% -- 11 Rate machine mod machine 209820/s -- -88% mod 1801802/s 759% --
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: A mod2 Machine.
by BrowserUk (Pope) on Jul 04, 2013 at 00:10 UTC

    One comment: Your benchmark is broken:

    $MAX=1e7; cmpthese -1, { a => q[ my $c = 0; $_%2 and ++$c for 0 .. $MAX; print "a:$c" ], b => q[ my $c = 0; $_ =~ m[[02468]$] or ++$c for 0 .. $MAX; print "b:$c" ] };; a:5000000 a:5000000 b:5000000 b:5000000 s/iter b a b 3.50 -- -64% a 1.27 177% -- $MAX=1e8; cmpthese -1, { a => q[my $c = 0; $_%2 and ++$c for 0 .. $MAX; print "a:$c" ], b => q[ my $c = 0; $_=~ m[[02468]$] or ++$c for 0 .. $MAX; print " +b:$c" ] };; a:50000000 a:50000000 b:50000000 b:50000000 s/iter b a b 34.3 -- -64% a 12.3 179% --

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: A mod2 Machine.
by Lawliet (Curate) on Jul 04, 2013 at 00:45 UTC
    a bona fide integer will be treated mathematically, hence taking more time

    ...more time in a small enough base. Given a base that requires twice the number of symbols needed to represent every digit than the number of operations it takes to compare one symbol to every symbol, your statement doesn't hold (italics for readability).

    Just poking fun. I thought it was a good writeup and an interesting observation ;).

Re: A mod2 Machine.
by hdb (Parson) on Jul 04, 2013 at 05:49 UTC

    Thanks for this very thought-provoking write up. It is questioning many of my beliefs and it is good to go back and check on them. For example, The advantage of this mod2 machine is that, there are no mathematical operations, thereby executing faster. is just the opposite of what I would have thought. My background is more C/C++ so before checking the last digit, one has to turn a simple number into something much more complex, a string, and then do some operations on it. I would also have thought that most mathematical operations are in hardware nowadays and should take almost no time. But these assumptions might not be valid in Perl where translating between numbers and strings is build in already.

    There is a good article on Wikipedia on divisibility rules: http://en.wikipedia.org/wiki/Divisibility_rule (for the non-mathematicians...).

    As for the generalization of your idea, I am a bit sceptical. "Check the last digit" does only work for mod2 and mod5 and not even in all bases. For example, 11 base 3 is an even number (4). 14 base 16 is divisible by 5 (20). Other rules are more complex and also depend on the base. The divisibility rules for 3 and 9 based on the sum of the digits rely on 10**n % 3 = 1 and 10**n % 9 = 1. In base 3 a number is divisible by 3 if the last digit is 0.

    Another thing I am wondering is whether the speed-up does rely on providing an outright string into the regex? So how does

    "112358" =~ /['0' | '2' | '4' | '6' | '8']$/i ? print "even\n" : prin +t "odd\n";

    compare to

    my $n = 112358; $n =~ /['0' | '2' | '4' | '6' | '8']$/i ? print "even\n" : print "odd +\n";

    It would be great if someone familiar with Perl's inner workings could comment on this.

      Even in C, we can do this without any conversions (i.e. convert a number to string) we can directly read a string and do this

      #include <stdio.h> #include <string.h> int main(int argc, char **argv) { char *str = malloc(strlen(argv[1]); strcpy(argv[1], str); char last = str[strlen(str) - 1]; switch (last){ case '0': case '2': case '4': case '6': case '8': case 'A': case 'a': case 'C': case 'c': case 'E': case 'e': printf("even\n"); break; default: printf("odd\n"); break; }

      A caveat, in search of a better word, would be the string allocation and copying routine. I used the standard string library functions but it will still take time even though the actual logic (the switch statement) is really fast.

      As for the generalization of the idea, yes, this logic is applicable only for mod2 and mod5 machines where only 2 states are sufficient. And yeah, this is applicable for all bases that are of the form 2^n and base 10 (example, convert 112358 base 10 to base 32 and 11235813 base 10 to base 32).

      For bases >16, we need to go beyond `F' for example, in base 32, 15 base 10 will be represented as F and 16 as G. Therefore, in base 32, if the number ends with a G, it is even.

        Correct, if you have the number as a string already, you do not need to convert the number to a string.

        You do not need to copy the string though, you could get the last character directly from argv[1] in your example.

      I was wondering about what you said: whether the speed-up does rely on providing an outright string into the regex

      my $n = 112358; ...

      I guess, it does rely on it. Not much but look at it from the processor's perspective. It is an assignment operation hence there will be an extra "mov" instruction moving data from the accumulator to say the DX register, right? Whereas providing an outright string will take data directly from the accumulator.

      Note

      My explanation is open for abuse... :D

Re: A mod2 Machine.
by choroba (Abbot) on Jul 04, 2013 at 07:09 UTC
    /['0' | '2' | '4' | '6' | '8']$/i
    A string ending in |, ' or a space is not even. Digits do not have upper and lower case variants.
    /[02468]$/
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      I wanted to say

      /[02468ace]$/i;

      /i is now significant.

Re: A mod2 Machine.
by RichardK (Curate) on Jul 04, 2013 at 12:01 UTC

    I have to disagree with you conclusions, you're just not comparing like with like.

    Which technique is faster depends entirely on the data format. If the data is in binary then you will only have to test the least significant bit, and one bit test will be faster than comparing the last digit against a list of alternatives.

    OTOH converting text to binary then performing a mathematical operation may be slower than just operating on the text directly. But you will have to identify the last digit in the text and so will probable be running a regex, which are complex and therefore take time.

    So exactly which approach is faster will depend on many factors and will take comprehensive testing to determine which is best.

    Modern processors are highly complex and highly optimised and just assuming that mathematical operations are slow is a mistake. You really have to carefully test your code to determine which approach is better in any particular situation. And even they you've only optimised it for one platform if you want your code to portable then that's even more difficult.

    In general it's much better to design a simple algorithm that's easy for a human to understand and let the interpreter/compiler & processor work out the best way to run it. Compilers and processors have lots of very clever optimization techniques and they are usually better at turning source code into running code that we are, so let them do their job.

    The conclusions you should draw are

    1. It's better to choose an algorithm that matches the data format.
    2. choosing the wrong algorithm could be slow.
    3. Don't second guess the processor/compiler -- test it properly.
    4. designing good tests is hard.
      If the data is in binary, the data will be even if its LSB is 0. The mod2 logic does exactly that. For data in the binary format, it is checking the LSB instead of converting the number to decimal or doing other exotic things. You're right about one bit test but then I'm using the mod2 logic to handle all kinds of numbers (where all kinds == {decimal, octal, hexadecimal, binary}).

      my $n = 534587; print "even\n" if($n % 2 == 0);

      This will print nothing on the screen (obviously, it indicates that the number is odd) and no matter how fast the processor is, the condition inside "if" will actually be evaluated i.e. 534587 will actually be divided by 2 and the remainder will be compared with 0. Do any compiler/processor optimization, this fact will not change (afaik, that is to say).

      OTOH,

      my $n = "534587"; print "even\n" if ($n =~ /[02468]$/);

      there are no mathematical evaluations here, no matter how big the string is, the regex here just checks last character. There will be no need of actually calculating the remainder and comparing.

      I'm not saying that processor will take time to evaluate a mathematical expression, it will be derogatory to think that about modern processors. I'm just trying to stress on the fact mod2 machine circumvents any mathematical operation.

      This algorithm does fail if the bases are not of the form of 2^n or base 10. Also, as ww said, it will fail for base 3 numbers on a quantum computer

Re: A mod2 Machine.
by Laurent_R (Priest) on Jul 07, 2013 at 22:04 UTC

    Interesting discussion, but it is not necessary that clear that arithmetic operations should be slower.

    I had a discussion on a French Perl forum a few months ago about the modelisation of a game. Without going too much in details, the game is played on a 4x4 grid with stones which have a white and a black side. At the start, each of the 16 positions has a stone, either white or black. Each time the player plays, she or he can flip some of the stones in accordance with some rules (you may flip all the stones of a row, all the stones of a column, all the stones of the two diagonals, plus some other patterns, in total 46 authorized moves). The aim is to get to all stones being white. It can be demonstrated that with any start position, given the set of authorized moves, it is always possible to win in 5 moves or less.

    The discussion on the forum soon converged on the idea that the game can be represented in binary. For example, a start position where all the stones are black can be represented as 1111 1111 1111 1111 and the winning position where they are all white would be 0000 0000 0000 0000. Similarly a move could be represented as a binary: flipping the first row might be the binary 1111 0000 0000 0000. Executing a move is then a simple exclusive or (xor) between the original position and the move. So that applying the move just given on the initial position would lead to this new position: 0000 1111 1111 1111.

    The issue I am coming at is that, in this discussion, two or three persons were thinking of binary strings, i.e. strings made of 0 and 1. My idea was that it was much better to use actual binary numbers. Any position can be represented as a number between 0 (the winning position) and 2^16 - 1, i.e. 65535 in decimal notation. Similarly, the moves can be represented by some 46 numbers between 0 and 65535. In this case, applying a move consists simply in using the ^ operator between the position and the move, which ought to be faster than doing the equivalent thing on a binary string (there were other reasons for chosing real binary number, including memory usage, etc.).

    Anyway, to try to prove my point that using real binary numbers and the ^ operator was better, I needed to compare it with various means of doing the same thing with binary strings. I benchmarked about a dozen different ways to apply a move to a position with binary strings (using regex, arithmetics, etc.). I may not have found the best way to do it, but the best I found, by far, was to add the position string to the move string, and then use tr/// to changes the 2's into 0's. For example, position 1111111111111111 and move 1111000000000000 can be added to give 2222111111111111, yielding 0000111111111111 after the 2 to 0 substitution.

    My point is just that in such a case, the best solution I found for making an exclusive or on binary strings was an arithmetic addition followed by a character substitution. Any solution using regexes or trying to break the binary string into individual elements was far far slower. Now, of course, my experiments might just prove that I was not clever enough on how to do this exclusive or on two binary strings efficiently.

    Just in the event that you are interested, the true binary xor was between 4 and 5 times faster than the best binary string xor I could come up with. But that is not my point today.

    I just wanted to say that there are cases where arithmetic operations seem to turn out to be faster than character or string operations. Granted, we are quite far from the OP situation, but I thought this experience might bring some food for thought.

      Ok, I'll be more clear. First, when I said that the arithmetic operation will take more time, I said it with respect to the mod2 machine. Second, let's take your example (just a wild guess, is this game GO)

      #!/usr/bin/perl # note that %b modifier processes binary. Similarly, %x processes hexa +decimal and %o is for octal. use strict; use warnings; my ($move, $pos); $move = 0b1111111111111111; $pos = 0b0000111111111111; printf "%b\n", ($move ^ $pos);

      Even here, there are no arithmetical operations. Note that Exclusive Or is a logical operation. This is somewhat similar to a mod machine. You change state only if there are opposite inputs. The catch here is, if we'd have used strings instead of a number, there would've been a few more lines of code, keeping in mind and as you said, breaking the string and using regexes. I'm not saying that we should replace every arithmetic operation with string/logical operation, we just can't, I'm just trying to convey that there are better ways to do simple things that school teaches us in a very dogmatic way.

      I mean seriously, why should I actually divide a number by 2 and check the remainder when I can directly check the last digit?! Or why should I recursively divide and mod by 10 to generate the reverse of a number when I can do it in one line?!

      /* Apologies for using C, but its my "mother-tongue" :P */ #include <stdio.h> int main(int argc, char **argv) { int ; for(i = (strlen(argv[1]) - 1); i >= 0; i--){ printf("%c", *(argv[1] + i)); /* reverse a string, a number, a + sentence, a novel... */ } }
      Note/PS: Logical operations, too, are faster than arithmetical operations afaik.

        I think we agree.

        My point is that I was using the ^ logical operation on actual binaries, something very similar to what you showed:

        my ($move, $pos); $move = 0b1111111111111111; $pos = 0b0000111111111111; printf "%b\n", ($move ^ $pos);

        But I needed to show that using actual binaries, such as 0b0000111111111111 was far superior to using binary strings, i.e. something like "0000111111111111". To do this, I needed to find the best possible way to do the equivalent of ^ for strings. And it turned out that the fastest way I found was to add the strings (meaning an implicit conversion of the string to a digit), giving a result like 1111222222222222, and then (with another implicit conversion) replacing the twos by zeros (with a command like tr/2/0/) to finally get "1111000000000000". And that was four times slower than the logical ^ on actual binaries. But it was still the fastest way to do it on strings. So that, in that case, arithmetics was faster than, for example, regexes or splitting the strings into individual characters to process them one by one..

        I mean seriously, why should I actually divide a number by 2 and check the remainder when I can directly check the last digit?!

        I definitely agree with you on that. I was only saying that there are some other cases where arithmetics is faster than other means. Although, in my case, the best, by far, was to use actual binary nombers and a logical exclusive or (4 fimes faster than artithmetics on binary strings).

Re: A mod2 Machine.
by zork42 (Monk) on Jul 10, 2013 at 10:37 UTC
    I wonder how the speed of
    my $n = 534587; print "even\n" if($n % 2 == 0); # modulo operator of possibly a flo +at or possibly an integer (*)
    compares with
    use integer; my $n = 534587; print "even\n" if($n & 1 == 0); # checking LSB with Bitwise And on +an integer
    (*) perlop: Integer-Arithmetic says
    "By default, Perl assumes that it must do most of its arithmetic in floating point."

    =====

    Also I wonder how this
    my $n = "534587"; print "even\n" if($n =~ /[02468]$/); # reg exp
    compares with:
    my $n = "534587"; my $c = substr($n, -1, 1); # last char print "even\n" if ( ($c == '0') || ($c == '2') || ($c == '4') || ($c = += '6') || ($c == '8') ); # no regexp
      Use Benchmark to find out.

      Also, using floats to compute modulo does not seem reasonable to me.

      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        I was just about to say that! zork42 please do benchmark it and post the results. PS: I'm really very lazy to do that and plus I'm busy perusing the perldoc's PerlRE tutorial. :)
Re: A mod2 Machine.
by nonsequitur (Beadle) on Jul 12, 2013 at 10:15 UTC

    This may be OT since it doesn't really have anything to do with Perl, but one of your original assumptions is wrong.

    In an even base a integer ending in an even digit must be even because the previous digits are multiplied by a power of an even base . Leaving the last digit to decide the parity.

    In an odd base the total of the digits must be even for the integer to be even because all of the digits are multiplied by a power of an odd base so that the that the parity of the integer depends on parity of all the digits.

      hmm... hmm... umm... I said this algorithm will work for all bases of the form 2^n and base 10. But according to you, this algorithm will work if the base is of the form 2*n. I'll just clarify right after commenting. Hold on...

      UPDATE

      apologies, my assumption was wrong. This algorithm will work, i.e. produce correct output, if the base is of the form 2*n.

Re: A mod2 Machine.
by zork42 (Monk) on Jul 13, 2013 at 09:22 UTC
    using floats to compute modulo does not seem reasonable to me.
    I only included this:
    my $n = 534587; print "even\n" if($n % 2 == 0); # modulo operator of possibly a flo +at or possibly an integer (*)
    because
    1. it is equivalent to the OP's "Most obvious (easiest) solution" (infact code-ninja used this exact example here: Re^2: A mod2 Machine.)
    2. it might actually be doing a (possibly comparatively slow) float operation because use integer was not used.
      It probably had to go: float 534587 --> integer 534587 --> integer modulo operation on integer 534587

    Whereas doing a proper integer bit test should be faster:
    use integer; my $n = 534587; print "even\n" if($n & 1 == 0); # checking LSB with Bitwise And on +an integer
      should be faster

      Why art thou speculating? use Benchmark;

      I benchmarked the bitwise operation with my method and even then my method wins... for 1e6 iterations.

      Rate machine bitwise machine 819199/s -- -43% bitwise 1448690/s 77% --

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://1042289]
Approved by ww
Front-paged by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (14)
As of 2014-04-17 14:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (449 votes), past polls