A hypergraph is just another way of talking about a family of sets: one thinks of the elements of the sets as vertices and the sets themselves as being like edges of a graph. Except that, unlike edges, sets in a family of sets can have more than two vertices, so we call them hyperedges rather than edges. An \( r \)-uniform hypergraph is one in which all the hyperedges have the same number of vertices as each other; for instance, a 2-uniform hypergraph is just an ordinary graph.

If the vertices of a hypergraph are given two colors (black and white), the discrepancy of the coloring measures how evenly each hyperedge is colored. In an ordinary graph, a proper 2-coloring has equal numbers of each color in each edge, and we want to get as close to that as we can in a hypergraph as well. So the discrepancy of a hyperedge is defined to be the absolute value of the difference between the numbers of black and white vertices, the discrepancy of a coloring is defined as the maximum of the discrepancies of any of the edges, and the discrepancy of a hypergraph is defined as the minimum discrepancy of any of its 2-colorings. For instance, the 4-uniform hypergraph shown as a Venn diagram below has discrepancy 2: it turns out not to be possible to 2-color the vertices so that all four hyperedges are evenly balanced between black and white. For if it were, the three lower sets (red, blue, and yellow, all sharing the bottom three vertices but each with a different top vertex) would force the three top vertices to have the same color as each other, causing the upper green set to be unbalanced.

A 4-uniform hypergraph with discrepancy 2

How many hyperedges must an \( r \)-uniform hypergraph have in order to force it to be unbalanced? This question generalizes the question of how many edges are needed to make a graph non-bipartite, for which the answer is three: a triangle is non-bipartite, and anything with fewer edges isn't. For \( r=4 \), the example above (and some messy case analysis for fewer sets) shows that the answer is four. But for \( r=6 \), it's three again: a 6-uniform hypergraph formed by tripling each vertex of a triangle (without changing the edge intersection pattern) has discrepancy two, because two of the sets of three equivalent vertices must have the same majority color.

A 6-uniform hypergraph with discrepancy 2

Generalizations of these examples show that three sets can force a nonzero discrepancy whenever \( r \) is 2 mod 4, and four can force a nonzero discrepancy whenever \( r \) is 4 or 8 mod 12, so the remaining cases are the ones where \( r \) is divisible by 12. On the other hand, the best upper bound I know how to prove that is valid for arbitrary \( r \) is logarithmic in \( r \). I don't even know whether the number of sets needed to force a nonzero discrepancy is bounded for all \( r \), or whether it can grow without bound as a function of \( r \).

One can ask the same sort of question when \( r \) is an odd number, of course. To prevent the problem from becoming trivial, we can define a hypergraph to be balanced when its discrepancy is at most one, and ask for the minimum number of hyperedges in an unbalanced \( r \)-uniform hypergraph. For this variation of the problem, the number of hyperedges needed to force the hypergraph to be unbalanced has an even weaker upper bound, of the form \( r+O(\log r) \).

This problem comes up in my latest preprint with Dan Hirschberg, "From Discrepancy to Majority", arXiv:1512.06488, to appear at LATIN 2016. We use the odd-\( r \) case as a preliminary step in an algorithm that takes as input a two-colored set of elements, and is allowed to access the coloring only by means of an oracle that returns the discrepancy of an \( r \)-subset of the elements. This problem was introduced in a paper by De Marco and Kranakis, motivated by fault diagnosis of distributed systems, and we improve their bounds on the number of queries needed by roughly a factor of \( r/2 \). In order to get any useful information at all from the oracle, we need to make it give an answer different from "this set has the minimum possible discrepancy", and to do that we need to set up a system of sets where at least one of them does not have the minimum possible discrepancy.

But I think the problem of finding minimal unbalanced uniform hypergraphs is interesting on its own, even aside from this application. The bounds we give on it in our paper seem to be far from tight, in part because they give lower-order contributions to the overall query bound for our algorithm so we didn't have a lot of motivation for tightening them. So there should be scope for plenty more exploration of this problem.