# Concepts, bicliques, and sparsity

An old paper of mine on finding complete bipartite subgraphs in sparse graphs can be reinterpreted as doing a form of data mining called formal concept analysis (see e.g. Choi for the connections between these two formalisms). A system of objects and attributes (say, photos in Flickr and their tags) can be represented as a bipartite graph, and "concepts" are bicliques (maximal complete bipartite subgraphs). The concepts form a lattice in which the join and meet operations both involve intersecting the vertex sets of bicliques on one side of the bipartition and then extending that intersection to a biclique on the other side.

Anyway, the result of my paper, rephrased in this language, was that if the object-attribute graph is sparse, then the total size of all the concepts in the concept lattice is linear in the number of objects and attributes, and the lattice can be generated in linear time. Or at least, the set of bicliques can be generated in that time; I didn't address the connections between bicliques in the concept lattice structure.

Which all makes it a little odd to see a paper by Lindig claiming that in systems with sparse object-attribute graphs, the size of the concept lattice empirically seems to grow quadratically. I think the resolution of this conflict is that the definitions of "sparse" are different: in my paper, a system is sparse if there's some absolute bound k such that any subsystem of N objects and attributes has at most kN relations. Equivalently, the system can be constructed from a (Barabasi-like) growth process in which one adds objects and attributes one at a time, each new object or attribute connected to at most k earlier attributes or objects. In Lindig's, the systems are generated randomly with a small but fixed fill-in factor, so one can view them as a form of Erdős–Rényi random graph...I'm wondering whether his quadratic growth rate is less about sparsity and more about randomness.

Anyway, the result of my paper, rephrased in this language, was that if the object-attribute graph is sparse, then the total size of all the concepts in the concept lattice is linear in the number of objects and attributes, and the lattice can be generated in linear time. Or at least, the set of bicliques can be generated in that time; I didn't address the connections between bicliques in the concept lattice structure.

Which all makes it a little odd to see a paper by Lindig claiming that in systems with sparse object-attribute graphs, the size of the concept lattice empirically seems to grow quadratically. I think the resolution of this conflict is that the definitions of "sparse" are different: in my paper, a system is sparse if there's some absolute bound k such that any subsystem of N objects and attributes has at most kN relations. Equivalently, the system can be constructed from a (Barabasi-like) growth process in which one adds objects and attributes one at a time, each new object or attribute connected to at most k earlier attributes or objects. In Lindig's, the systems are generated randomly with a small but fixed fill-in factor, so one can view them as a form of Erdős–Rényi random graph...I'm wondering whether his quadratic growth rate is less about sparsity and more about randomness.