|Perl: the Markov chain saw|
Re^2: The (Larry) Wall Seriesby Jeffrey Kegler (Hermit)
|on Nov 09, 2007 at 12:43 UTC||Need Help??|
The recurrence equation is a(n) = 2*a(n-2) + a(n-3) + 1. It's in the literature as A052952. I take "numerology" to suggest a pseudo-relationship based on conjecture from a small sample. That is, the first few numbers may match a series, but how do I know later numbers will actually match the series? I'll present a proof below that the parse counts generated from the Minus Sign Grammar are precisely A052952, and I hope that will take us out of the realm of numerology.
First, a comment on why this is useful and/or interesting. Many monks will have heard of the Ackermann function. I don't know of any direct applications for it, but it gets a lot of use in testing because 1) it's an extreme example of a kind of computation that is practical; and 2) it focuses on a functionality that is useful.
In investigating parsing of ambiguous grammars, I wanted an extreme example of the kind of grammar likely to be encountered in practice. "In practice" and "Perl" aren't quite synonyms, but they're close. My problem with Perl grammar was that it's (of course) unambiguous. So I eliminated all notions of precedence. Also, I did away with lvalue restrictions. That is, I legalized 6++, treating it as roughly equivalent to ($tmpNN=6)++.
Honing in on the part of Perl most likely to generate lots of ambiguity, I focused on expressions with a number at each end and minus signs between them. Four operators are composed entirely of minus signs: pre-decrement, post-decrement, arithmetic negation and subtraction. The resulting grammar is
E ::= Number | E-- | --E | -E | E-E
In testing an ambigious parser, you want to be sure it gets all the parses. Counting by hand gets you only so far. Call the parse count for a string with n minus signs, W(n). At n=5, W(n)=8 and the counting is already quite difficult to do in your head:
W(10)=144, W(20)=17711 and W(36)=39088169 so even for small n, we're in territory where hand counts just ain't gonna happen. Hence the importance of identifying the W(n) series with a mathematical series. I can simply count the parses generated by my algorithm and look up the appropriate number in the series to see if they match.
Yes, you say, all very nice, but since you only hand checked up to 5, isn't this numerology? What reason is there to believe that the number of parses generated, W(n), actually matches A052952?
A proof follows. Those who think they might have any taste for this sort of thing might want to at least attempt to follow it, since it's not what the mathematicians call "technical" and is presented in Perlish terms.
The first three numbers in A052952 are defined as 1, 1, and 3. Hand counts will establish that they match W(1), W(2) and W(3).
It remains to show that W(n) = A052952(n) for n>2. (I was hoping you'd forget. :-) )
Look at the far right hand side of the sequence of minus signs. It's one of three things: a binary minus (subtraction), a unary minus (negation) or the second character of a pre-decrement. We can get a count of each of these three categories and add them to produce W(n).
The unary minus category can be formed from any of the parses with one less minus sign, by adding a negation to the right side. So there are W(n-1) parses from that source.
Similarly the binary minus category is all the parses with two fewer minus signs, with a pre-decrement added to the far right end. Therefore, W(n-2) parses from that source.
The subtraction category is only slightly harder. The only operator pattern allowed to the left of the binary subtraction is a series of post-decrements. So if n-1 is even, there is one parse from that source. If n-1 is odd, the pattern of post-decrements cannot be formed, and there are no parses from that source. In Perlish notation, the number of parses is (n-1) % 2 ? 0 : 1.
Putting all the sources of parses together, we get
W(n) = W(n-1) + W(n-2) + ((n-1) % 2 ? 0 : 1)
We can use this formula to derive a count for one of its own components, W(n-1):
W(n-1) = W(n-2) + W(n-3) + ((n-2) % 2 ? 0 : 1)
And then we can substitute in the first formula from the second:
W(n) = W(n-2) + W(n-3) + ((n-2) % 2 ? 0 : 1) + W(n-2) + ((n-1) % 2 ? 0 : 1)
W(n) = 2*W(n-2) + W(n-3) + ((n-2) % 2 ? 0 : 1) + ((n-1) % 2 ? 0 : 1)
Finally, note that of two consecutive numbers one will always be odd and the other always even. Therefore ((n-2) % 2 ? 0 : 1) + ((n-1) % 2 ? 0 : 1) will always be 1. Applying the final simplification:
W(n) = 2*W(n-2) + W(n-3) + 1
From http://www.research.att.com/~njas/sequences/A052952, the recurrence for A052952 is a(n) = 2*a(n-2) + a(n-3) + 1. Exactly the same as the recurrence I've just proved for the Wall series. QED and all that.