Draw an infinite tree according to the following rules:

  • Each tree node is labeled 1 or 2, and has a number of children equal to its label.
  • If the nodes are placed into their breadth-first search ordering, then two consecutive nodes have the same label if and only if they have the same parent.

There are two choices for the root label, but the trees they produce are almost the same as each other, so let's say the root is labeled 2, giving the tree shown below together with its spiraling breadth first search order. (If the root were labeled 1, the only difference would be to add one more node labeled 1 above the node that is the root in the drawing.)

Downwards-infinite binary tree with the node labels at each level forming the Kolakoski sequence, and its breadth-first traversal order

Then the breadth-first sequence of labels that we obtain is exactly the Kolakoski sequence, starting from its third term. The Kolakoski sequence begins 1,2,2,1,1,2,1,2,2,1,2,2... and has the property that it equals the sequence of run-lengths in itself: the numbers 1, 2, 2, etc. are exactly the numbers of consecutive 1's, 2's, 1's, etc. in the sequence. Our tree is defined in such a way that each run of consecutive equal labels comes from the same parent, so the run-lengths of the sequence equal the numbers of children of the nodes, which equal the labels of the nodes, which define the sequence itself.

The fact that this tree encodes the Kolakoski sequence in its breadth-first ordering is closely related to the fact that breadth first search on a tree can be encoded recursively as the following algorithm: first, output the root node. Then, for each node in a recursive breadth-first search, output its children. Applying this technique to the Kolakoski tree gives us the following Python recursive generator for the Kolakoski sequence itself:

def recurse():
    yield 2
    output = 1
    for x in recurse():
        for i in range(x):
            yield output
        output = 3 - output

def Kolakoski():
    yield 1
    yield 2
    for x in recurse():
        yield x

(The only reason we need two separate functions is to handle the first two terms in the sequence, the ones that are not part of the tree.) If this algorithm is used to generate the first \( n \) terms of the sequence, then it calls itself recursively \( O(\log n) \) levels deep, with each level of recursion tracking one level higher in the tree and generating approximately \( 2/3 \) as many values as its caller. Each level of recursion has only a constant amount of storage for its local variables. Therefore, it takes \( O(n) \) time and \( O(\log n) \) space to generate the whole sequence.

Something very similar to this algorithm, with the same analysis, is described in the paper "A space-efficient algorithm for calculating the digit distribution in the Kolakoski sequence", by Johan Nilsson in J. Integer Sequences 2012. But he replaces the chain of recursive calls by an explicit data structure, which makes the algorithm's description longer and I think more confusing. And instead of making a single tree from all but the first two numbers in the sequence, Nilsson uses a forest with infinitely many trees, with each level of the forest being labeled by its own copy of the Kolakoski sequence.

ETA: See next post for an alternative algorithm that avoids the recursion.



What a coincidence. I was just googling an efficient generator for this sequence (20 years after I last heard of it, without knowing its name but only the start 122) and found this one week old blog-post which appears to be the ultimate simplification. Thank you and congratulations!

Edit : Time goes probably like \( (n \log n) \) in both the recursive and the bit-shift solution. For each call of Kolakoski the tree has to be traversed to depth \( \log n \) and the bit-sequences also grow logarithmically with \( n \).

No, total time is only \( O(n) \) for both. In this version, just add up the total number of steps for all the recursive generators: \( n \) for the top level call, \( 2n/3 \) for the next level, \( 4n/9 \) for the next, etc., in a geometric series. For the bit-twiddling version, observe that most steps change only very few bits of the numbers, so if e.g. you're running for \( n \) so large that you're using more than one 64-bit word to store the results, most of the time the high-order words won't change.