Perl Monk, Perl Meditation  
PerlMonks 
HOP, flip, and swapby tlm (Prior) 
on Aug 07, 2005 at 20:17 UTC ( #481736=perlmeditation: print w/replies, xml )  Need Help?? 
On pp. 134135 of HOP there's a nifty algorithm I had never seen before for iterating over all permutations of a list (though I've seen some that look similar). It looked to me like the full explanation for this algorithm ended up on the editing room floor (the brief explanation given in the book is very unclear), plus I was not able to readily find one online, so I figure that a little clarification of this algorithm may be a good thing to have. Also, in the process of understanding this algorithm one comes across some cool math. To pique your interest, here's the algorithm in its original form^{1}: The function permute just returns an iterator, which is where the action is. The first time around the iterator simply returns the input list, but after that things get more interesting. (It's a nice puzzle to figure out why this algorithm works. If you want to figure it out by yourself, stop reading here.) Lexicographic order, revisitedWe're all familiar with lexicographic sorting; in fact that's what perl's sort does by default. But bear with me as I describe in a somewhat roundabout way the procedure for sorting lexicographically all the permutations ABCD, ABDC, ACBD, etc. of the letters A, B, C, and D. First we group all the permutations into 4 groups, according to their first letter, and we number these 4 groups, starting with 0, according to the alphabetic order of this first letter. Hence, group 0 is the one for the permutations beginning with A, etc. Now, for each of these groups we perform the same groupandnumber procedure, using the second letter as the basis for the grouping. In this case the number of groups is 3, since, within each of the toplevel groups, only three letters are available for the second position. Then we repeat this process one more and last time: grouping according to the third letter (only two are possible at this level) and numbering the resulting groups according to the alphabetic order in the third position. Now each permutation belongs to three nested (and numbered) groups. We can use the numbers of these groups to assign a unique "address" to each permutation. For example, the permutation CADB has address 201, because the C's are group 2 at the top level, the CA's are group 0 within the C's, and the CAD's (which contain only one permutation, CADB) is group 1 within the CA's. By this addressing scheme, the 24 permutations get addresses ranging from 000 for ABCD to 321 for DCBA. The Cantor expansionThere is a straightforward onetoone mapping between this addressing scheme and the natural numbers; it's called the Cantor expansion, or factorial base expansion. For example, 211_{F} (where the subscript denotes "factorial base") represents decimal 15: The principle of the representation is similar to that of the familiar representation of integers as powers of a fixed base, excepts that the digits in the factorialbase reprsentation correspond to coefficients of factorials. If we interpret the addresses we derived earlier (from the lexicographic sorting) as baseF numbers, then the lexicographic order for the permutations corresponds to numeric order of their addresses, which range from 0_{F} through 321_{F}, or 0 through 23 decimal^{2}. But more importantly, the addresses encode a simple procedure for mapping a natural number to a given permutation. Here it is: The first argument to fb_to_perm is a reference to the array we want to permute, and the second one points to an array whose elements are the factorialbase representation digits of an integer between 0 and m!−1, where m is the length of the list to be permuted. To construct the permutation correspondiing to 201_{F}, fb_to_perm sequentially splices out 2, 0, and 1 from the array (i.e. B, D, C ) and appends what's left, the single letter B (which is completely determined as the fourth position at this point). All we need now is some function n_to_fb^{3} for computing the factorialbase representation of a natural number. Then, to iterate over all the permutations of a list, we simply do something like this: In HOP Dominus presents a very similar strategy on pp. 130134, including an algorithm (pattern_to_permutation^{4} on p. 131) which is very similar to the one used by fb_to_perm. Ultimately, however, he rejects it because it requires too much splicing, and proposes the algorithm that motivated this meditation as a more efficient (or at least less spliceheavy) alternative. How does this alternative algorithm manage to avoid all the splicing? The trick is that, instead of building the permutation from scratch every time, it computes it as an inplace transformation of the previous one, using swaps and flips (which preserve the length of the array) instead of splice. The trickiest part is figuring out where to flip and what to swap. It turns out that the Cantorexpansion representation plays a central role in the spliceless variant too. Cutting out spliceBelow I list all the permutations of the letters A, B, C, D along with their rank (baseF), in lexicographic order.
Let's focus for a minute on the permutations whose ranks end in 00, and which are listed along the top row above. Notice that one can obtain permutation 100_{F} from permutation 000_{F} by swapping position 0 and position 1 (counting from the left). Similarly, one can obtain permutation 200_{F} from permutation 100_{F} by swapping position 0 with position 2. The same pattern holds to obtain 300_{F} from 200_{F}. The pattern can be expressed succinctly like this:
This tells us that if we can come up with an algorithm for iterating over the permutations of 3 letters in lexicographic order, we can use Rule 1 above to extend this algorithm to iterate over the permutations of 4 letters. In other words, we first use the 3letter algorithm for iterating over all the permutations of BCD, and append A at the beginning of each. Then we use Rule 1 to move from move from the first Apermutation (ABCD) to the first Bpermutation (BACD), and invoke the 3letter algorithm again. Etc. There's a problem with Rule 1, though. To get the permutation for b00_{F} we need the permutation for a00_{F}, which was visited several iterations before (6 to be exact). This is very inconvenient, because it requires keeping track of more permutations than just the previous one. But take a look at the permutations immediately preceding the *00_{F} permutations: Notice that we can always recover the permutation with index a00_{F} from the permutation immediately preceding the one with index b00_{F}, which has the representation a21_{F}, by flipping (i.e. reversing) the last three letters in it. For example, by flipping the last three letters of the permutation with rank 121_{F} (BDCA) we get BACD, which is the permutation that is required by Rule 1 to generate the permutation with rank 200_{F}, namely the one with rank 100_{F}. This is no coincidence. After all, iterating from permutation with rank a00_{F} to the one with rank a21_{F} amounts to traversing the lexicographic ordering of the last three letters of a00_{F}. Therefore, the last permutation in this subiteration has to be the one in which the last three letters are in reverse order relative to their order in a00_{F}. This simple relation between the permutations with ranks a00_{F} and a21_{F} enables us to modify our rule above so that it can operate on the immediately preceding permutation: In contrast to Rule 1, Rule 1' is completely "local". It tells us how we get certain permutations from the immediately preceding ones, and serves as the model for all other transformation rules we will need. yaddayadda(In this section we work out the counterparts of Rule 1' for the two remaining cases. If you got it already, skip to the next section, The Mother of All Rules.) The same idea works at the next level too. Consider the permutations whose indices end in 0.
For example, we obtain BCAD (110_{F}) from BACD (100_{F}) by swapping positions 1 and 2; and BDAC (120_{F}) from BCAD (110_{F}) by swapping positions 1 and 3. (Note that the requirement in Rule 2 stating that b−a must equal 1 means that b cannot be 0.) Therefore, reasoning as before, if we can come up with an algorithm for iterating over the permutations of 2 letters in lexicographic order, we can use Rule 2 above to extend this algorithm to iterate over the permutations of 3 letters (and therefore, by the argument made earlier, we then also have an algorithm for iterating over all permutations of 4 letters). As with Rule 1, Rule 2 inconveniently requires a permutation other than the previous one, but the fix is just like the fix of Rule 1, except that only the last to letters need to be reversed:
Not surprisingly a similar rule works at the lowest level too.
I included the noop "reverse the last one letter" because I intend to turn the three rules into one single rule. The Mother Of All RulesOK, we can now generalize the rule for flipping and swapping.
The most interesting task implied by this rule is determining the numbers i and d corresponding to a given natural number n. The following subroutine takes care of this: This little sub is the heart of the whole permutation enumeration algorithm. It takes an integer as input. It divides it by increasingly large divisors (in $i), starting with 2, and incrementing the divisor by 1 each time, as long as this division has 0 remainder. This amounts to counting the number of zeros in the factorial base expansion of $_[ 0 ]. (Actually, it counts 1 more than the number of zeros, which is convenient, because that's what our unified rule requires.) And the first nonzero remainder, which is captured in $d, is the last nonzero digit in this factorial base expansion. The home stretchNow we can reconstruct the original algorithm. The subroutine offsets replaces the for loop inside the original algorithm. It incorporates two very minor optimizations over the version given in HOP. One is that $i starts out with value 2, since starting with value 1 simply wastes an iteration without any effect ($p % 1 is always 0). The other small improvement is that the loop in offsets doesn't bother to check at every iteration whether $i is greater than the length of the original array. Since this events signals the end of the iteration, it is handled by the calling program. In fact, offsets doesn't even know about the original array; its only argument is an integer.
Final commentsEven though I relied heavily on the notion of lexicographic ordering in my explanation of it, the algorithm itself doesn't do much explicit ordering. It can be said that it iterates in lexicographic order according to the order defined by the input list. If one starts out with the list B, A, D, C, the first few permutations visited will be BADC, BACD, BDAC, BDCA, etc., which is the lexicographic order when B < A < D < C. I see this as a nice bonus feature of the algorithm (in HOP, the problem was stated as simply enumerating all permutations, not in any particular order). But even if one gives it as input the list A, A, A, A, it will dutifully return 24 identical "permutations", AAAA, AAAA, AAAA, etc., and it would be meaningless to call this a lexicographic ordering. Whether this is a feature or a bug is somewhat debatable. This algorithm is a special case of a large class of algorithms that Don Knuth calls the general permutation generator. Something equivalent to the offsets subroutine above is at the heart of all these generators. The various generators differ in how they order the permutations, which is implied in how the offsets i and d are mapped to a transformation of the previous permutation. For example, one can obtain a different generator, if, in perms, one replaces the definition of $j and the "flipandswap" lines with the single flip ...though, as one would expect, this generator does not produce the permutations in lexicographic order. This scheme doesn't even need $d. The scheme Knuth thinks may be most efficient is one proposed by B. R. Heap in 1963, which can be obtained by replacing the same three lines of perms as before with Now, instead of a flip and a swap we have only a test for oddness and a swap. The basis for this general framework is a representation scheme for permutation groups called Sims tables. ^{1}Well, almost; in the original, what permute returns is something that looks like this but as it turns out, Iterator, which is defined on p. 123, is just an identity function for code refs, so I replaced it with the more transparent sub keyword.▲ ^{2}In the general case, the addresses range from 0...00_{F} to q...4321_{F}, where q is 1 less than the length of the list being permuted. This suggests a nice factorial identity: for all integers m > 0. The inductive proof of this identity is a oneliner: ▲ ^{3}Here's a version of n_to_fb; it is almost identical to HOP's n_to_pat (p. 133): ▲ ^{4}In HOP, Dominus does not mention the Cantor expansion. He refers to the addresses I described above as "patterns", except that his patterns always have an extra 0 tacked on at the end (to "splice" out the last element in a 1element array).▲ the lowliest monk
Back to
Meditations

