Some cleanups I've been making to the Wikipedia article on Bell numbers (and the new related article Bell triangle) have led me to play with the mathematics of rhyme schemes.

A rhyme scheme is a description of the sets of rhyming lines of a poem by a sequence of letters; for instance a limerick has the scheme AABBA. Lines that rhyme with each other have the same letter, and the letters that represent each set of rhyming lines are chosen from a prefix of the alphabet, in order by the position of their first lines. For programming purposes it's a little more convenient to represent these things instead as vectors of numbers from \( 0 \) to \( k-1 \) where \( k \) is the number of different sets of rhyming lines and where the dimension of the vector is the number of lines in the poem. For instance, in this notation, the limerick would be [0,0,1,1,0]. An array of non-negative integers represents a rhyme scheme if and only if its first value is zero and each subsequent value is at most one more than the maximum of the previous values. The set of all rhyme schemes of a given length can be listed easily enough in lexicographic order, in constant average time per scheme by an algorithm very similar to binary counting, with an auxiliary stack data structure to keep track of the first copy of each value. In Python:

def rhymes(n):
    scheme = [0]*n                          # the rhyme scheme itself
    leaders = [0]                           # stack of 1st pos for each value
    increment = n-1                         # where to increment next
    
    yield scheme
    while leaders:
        if increment != leaders[-1]:        # can we increment this position?
            scheme[increment] += 1
            if scheme[increment] >= len(leaders):
                leaders.append(increment)   # we have a new leader
            increment = n-1
            yield scheme
        else:
            scheme[increment] = 0
            leaders.pop()
            increment -= 1

But rather than just listing these schemes in this order, a little more structure is visible by drawing them as a graph, with two rhyme schemes adjacent when their vectors are only one unit apart. By orienting this graph from smaller to larger vectors it can also be interpreted as a Hasse diagram of the majorization ordering on the vectors. It can be drawn nicely by using the coefficients as coordinates and then projecting to the plane:

graph of length-4 rhyme schemes

Sadly, this graph does not have a Hamiltonian path, ruling out certain kinds of Gray codes for rhyme schemes. (Proof sketch: Observe that it's a bipartite graph with eight vertices on the even levels and seven on the odd levels, so any path would have to start and stop on the even levels. Thus, the edge to the top vertex and the six edges through the three odd-level degree-two vertices must all be part of any Hamiltonian path. Now do a little case analysis.) But it does look sort of cube-like. What does this mean?

My first guess for the structure of a graph that looks cubical and has unique minima and maxima would be that it is a distributive lattice, but that turns out not to be true in this case. In particular, the subset of the graph between 0001 and 0112 is not distributive. The next fallback from distributivity would be that it's an antimatroid (in the original formulation of Dilworth, which applies to partial orders rather than families of sets or formal languages). And this turns out to be true!

Claim: The majorization partial order on rhyme vectors is an antimatroid.

Proof: For every monotone path from 0000... to 0123... in this lattice, and for every pair of integers \( (x,y) \) with \( 1 \le x \le y \lt n \), there is exactly one edge of the path that changes the digit in position \( y \) from \( x-1 \) to \( x \). So let's label the edges by these pairs \( (x,y) \), and let these pairs be the elements from which the antimatroid is defined. Then a sequence of distinct labels forms the valid labeling of a path (starting from 0000...) exactly when, for each pair \( (x,y) \) in the sequence with \( x \gt 1 \), the following two conditions are both met:

  • The pair \( (x-1,y) \) appears earlier in the sequence.
  • There exists \( z \lt y \) such that the pair \( (x-1,z) \) appears earlier in the sequence.

Suppose we have a path that does not already contain \( (x,y) \), and these two conditions are met. Then they will continue to be met for each extension of the path until \( (x,y) \) is finally added to the path. But this property, that an element that is available to be added to a path will continue to be available until it is added, is a defining property of an antimatroid.

Thus, each rhyme schemes in this graph can be identified with the set of pairs \( (x,y) \) occurring on a path from the origin to that scheme, and the family of sets formed in this way gives an antimatroid according to the family-of-sets way of defining antimatroids. Such an antimatroid is called supersolvable if its elements can be totally ordered in such a way that, when one set \( B \) has elements not appearing in another set \( A \), the smallest missing element can be added.

Claim: The rhyme scheme antimatroid is supersolvable under the lexicographic ordering of the pairs \( (x,y) \).

Proof: Lexicographic ordering means that we order the pairs by their \( x \) values, breaking ties by their \( y \) values. Then the smallest missing element automatically satisfies the first of the two conditions for being able to be added, because if \( (x-1,y) \) were also missing from \( A \) then it would be a smaller missing element. And for the same reason, whichever element \( (x-1,z) \) satisfies the second condition for \( B \) must also satisfy the second condition forĀ \( A \).

A basic set in an antimatroid is a set with the property that only one of its elements can be removed. When we draw the antimatroid as a graph (as above) it's just a vertex with one downward neighbor. Basic sets are also called paths, but I think that terminology is confusing because it doesn't mean a path in the graph. Partially ordering the basic sets by set inclusion produces the path poset of the antimatroid.

Claim: The basic sets of the rhyme scheme antimatroid correspond in an order-preserving way to subsets of \( \{ 1,2,\dots,n-1 \} \). In particular, the path poset of the antimatroid is order-isomorphic to an \( (n-1) \)-dimensional hypercube, the family of all subsets of an \( (n-1) \)-element set.

Proof: Given a basic set \( B \), let \( \mathrm{pos}(B) \) be the set of positions of the first copy of each value in \( X \). Conversely, given a set \( P \) of positions, let \( \mathrm{basic}(P) \) be the set of pairs \( (x,y) \) where \( y \) appears in \( P \) and \( x \) is at most the rank of \( y \) in \( P \) (where the smallest element has rank \( 1 \), the second smallest has rank \( 2 \), etc). The illustration below shows one of these sets \( \mathrm{basic}(P) \) (red dots) as a subset of the set of all pairs \( (x,y) \) (red or yellow dots).

a basic set of a rhyme scheme antimatroid

These sets are basic, because only the last element of the top row can be removed without violating one of the two conditions for one of the other elements. The "basic" mapping clearly preserves the inclusion ordering of its sets, and is invertible (its inverse is pos), so it is an order isomorphism. If \( A \) is any set of the antimatroid that is not of this form, then the last element of its upper row is removable, as is the last element of the next row down whose number of elements is too big or too small, so \( A \) cannot be basic. Therefore, basis and pos are inverse order-isomorphisms between the path poset and the hypercube as desired.

From this, we can see that the convex dimension of the rhyme scheme antimatroid (the minimum number of learning sequences needed to define it) is much larger than its dimension as a set of vectors: by Sperner's theorem, the convex dimension is

\[ \binom{n-1}{\lfloor (n-1)/2 \rfloor}. \]