The interesting thing about it is that it shows a very-bad-case for run-length encoding.
It's ironic that each term in the sequence is a sort of RLE of the previous one. Granted, it is a very inept RLE, since it makes the "encoded" string grow, but it is RLE "in spirit". In fact, with a different starting point (e.g. 111111122222222, i.e. seven 1s followed by eight 2s), the RLE would *initially* do fine (7182). Of course, it would immediately start to grow again (17111812, 111731181112, etc.). It is also interesting to note that this "encoding" has one fixed point, namely 22 ("two 2s"), and it's not hard to convince oneself that it is unique. Anyway, I don't know much about RLE, but I wonder whether every run-length encoding scheme has the property that repeated application, starting from some string of integers, leads to ever longer strings... But this is beginning to get very OT...
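To make the numbers above concrete, here is a minimal Python sketch of one step of the "encoding". The function name `look_and_say` is my own choice, and this is not the Perl oneliner from the thread; it just groups runs of identical digits and writes each run as its length followed by the digit.

```python
from itertools import groupby

def look_and_say(s: str) -> str:
    """One step of the audioactive 'encoding': each run of a digit
    becomes <run length><digit>."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(s))

# Iterate from the alternative starting point (seven 1s, eight 2s).
term = "111111122222222"
for _ in range(3):
    term = look_and_say(term)
    print(term)  # 7182, then 17111812, then 111731181112

# 22 describes itself ("two 2s"), so it is a fixed point.
print(look_and_say("22"))
```

Only the first step shrinks the string; from then on each step makes it longer.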

*The interesting thing about it is that it shows a very-bad-case for run-length encoding.*
Interesting indeed. Do you have any data from which this is apparent? Or is it just a well-known "feature" of the sequence? I plainly don't know!

Let's examine the properties of the sequence:
### It is made up only of 1's, 2's, and 3's
We can prove this by contradiction. Because the sequence starts with '1', the only way a '4' can show up is as the first number in a pair, i.e. as a run length (the bold values in **3**1**2**2**1**1). After all, that's the only way that the '2' and '3' ever show up. So a '4' requires a run of four identical digits, such as "1111", "2222", or "3333", in the previous term. Let's abstract these as "xxxx".
There are two ways "xxxx" can be placed in a term: at an even offset or at an odd offset. At an even offset, the first and third 'x's are counts; at an odd offset, the second and fourth 'x's are counts. Let's examine the even offset first. You can't have "$C_1 x C_2 x$" in the sequence, because that should have been encoded as "$(C_1+C_2)x$". Similarly, at an odd offset, there must be a count before the first 'x' (we'll call it $C_1$ again), which means we have "$C_1 xxxx$" in our sequence. Again, you can't have two counts in a row for the same value! The subsequence would have to be "$(C_1+x)xx$".
So there can never be four identical digits in a row, and thus a '4' never appears in this sequence. (The '2' and '3' certainly do appear: 21 is the third term and 312211 the sixth.)
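The argument above is a proof; a quick empirical spot-check is no substitute, but it is reassuring. The Python sketch below (the helper names and the 40-step cut-off are arbitrary choices of mine) collects every digit and the longest run of identical digits seen in the first 41 terms starting from "1".

```python
from itertools import groupby

def look_and_say(s: str) -> str:
    """Encode each run of a digit as <run length><digit>."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(s))

def max_run(s: str) -> int:
    """Length of the longest run of identical digits in s."""
    return max(len(list(run)) for _, run in groupby(s))

term = "1"
digits_seen = set(term)
longest = max_run(term)
for _ in range(40):  # arbitrary number of steps
    term = look_and_say(term)
    digits_seen |= set(term)
    longest = max(longest, max_run(term))

print(sorted(digits_seen))  # only '1', '2' and '3' should appear
print(longest)              # no run of four identical digits
```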
### Size tradeoffs are minimal
Every run of identical values is encoded as a two-character sequence (count and value). A run of one value therefore gains a character, a run of two leaves the length unchanged, and a run of three loses one character. What remains is to examine the sequence and show that triplets are far less common than singlets and couplets.
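As a rough empirical look at that question (not a proof), this Python sketch tallies how many runs of each length get encoded over the first 30 steps starting from "1". The 30-step cut-off and the names are my own assumptions. It also checks the bookkeeping above: each run of length k changes the size by 2 - k.

```python
from collections import Counter
from itertools import groupby

def look_and_say(s: str) -> str:
    """Encode each run of a digit as <run length><digit>."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(s))

term = "1"
runs = Counter()  # run length -> number of runs of that length encoded so far
for _ in range(30):  # 30 steps is an arbitrary cut-off
    lengths = [len(list(run)) for _, run in groupby(term)]
    runs.update(lengths)
    nxt = look_and_say(term)
    # Each run of length k becomes 2 characters, a size change of 2 - k.
    assert len(nxt) - len(term) == sum(2 - k for k in lengths)
    term = nxt

total = sum(runs.values())
for k in sorted(runs):
    print(f"runs of length {k}: {runs[k]} ({runs[k] / total:.1%})")
```

Because singlets (which grow the string) dominate the tally, the terms keep getting longer.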
_____________________________________________________
Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
*How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid?* ~~ **Meister Eckhart**
