Many of the talks at the Conference on Meaningfulness and Learning Spaces that I recently attended concerned (surprise!) meaningfulness. The basic idea is to provide a crude filter for bad science and bad statistics by determining whether applying a formula to data could possibly make any sense. For instance, in many computer science conference program committees, we score papers on a scale from \( -3 \) to \( +3 \) (say) and then rank the papers by averaging the scores of different reviewers (possibly weighted to counter reviewer bias). Is this aggregation operation a meaningful thing to perform on this kind of data? Is it meaningful to say "it's twice as hot today as it was yesterday"? When we use an algorithm to solve a problem, how can we tell whether the result is meaningful?

The quick answer is that, for something to be meaningful, it should depend only on inherent properties of the data and not on arbitrary scales that we might be using to assign numerical values to the data. We can formalize this in terms of a group of transformations that can be applied to the data and that preserve its meaning. The Fahrenheit and Celsius temperature scales are fairly arbitrary, for instance, so any linear transformation mapping \( x \) to \( ax+b \) (for positive values of \( a \)) is reasonable: it would give you a different temperature scale but preserve the relationships between the data values. We call data of this type "interval data". For interval temperature data, "twice as hot" is not meaningful: it depends on the arbitrary scale you're using, not just on the actual temperatures. Temperatures that are twice as hot as each other in Fahrenheit are not generally twice as hot in Celsius, and vice versa. In contrast, "ratio data" comes from a scale with an absolute zero point, as temperatures in Kelvin do; for ratio data, the only allowable transformations are multiplications by a positive scalar. If you're dealing with very cold temperatures near absolute zero, and using Kelvin, "twice as hot" does have a meaning. Some of the other data types for this sort of analysis include "ordinal data", which allows any order-preserving transformation, "nominal data", which allows any bijective transformation, and "absolute data" (such as probabilities or percentages), which allows only the identity transformation. An important basic principle is that any meaningful operation on the data must respect these transformations: if one applies an allowable transformation to the inputs of the operation, the outputs must also be transformed in an allowable way. This naturally leads to the ubiquity of power laws for ratio data, because products of powers are the only kinds of formulae whose outputs scale by a constant factor when the input is scaled by a (possibly different) constant factor.
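
To make the Fahrenheit/Celsius/Kelvin point concrete, here is a minimal Python sketch (my own illustration, not from the post; the temperatures and the `f_to_c` helper are invented):

```python
# "Twice as hot" is not preserved by the affine maps x -> a*x + b allowed for
# interval data, but it is preserved by the scalings x -> a*x allowed for ratio data.

def f_to_c(f):
    """Fahrenheit to Celsius: one of the allowable affine rescalings of interval data."""
    return (f - 32) * 5 / 9

day1_f, day2_f = 40.0, 80.0                 # 80F looks "twice as hot" as 40F...
print(day2_f / day1_f)                      # 2.0
print(f_to_c(day2_f) / f_to_c(day1_f))      # ~6.0: the ratio changes with the scale

day1_k, day2_k = 100.0, 200.0               # Kelvin: a ratio scale with a true zero
a = 1.8                                     # e.g. rescaling to Rankine
print(day2_k / day1_k)                      # 2.0
print((a * day2_k) / (a * day1_k))          # still 2.0: the ratio is meaningful
```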

In graph theory, we have long had a very similar idea, that a meaningful graph property is something preserved by any isomorphism of the graph. Another example of this, of even more direct relevance to the study of algorithms, was given by Fred Roberts in his talk on meaningfulness in epidemiology. He mentioned that shortest path algorithms are used very widely, but are only applicable when the edge weights of a graph are given by ratio data: if you can perform more general linear transformations on the edge weights, such as adding the same number to each of them, you can change the result of the shortest path computation. In contrast, minimum spanning trees, despite their very similar definition, are much more broadly meaningful. They make sense even when the edge weights are ordinal, because they depend only on the pairwise ordering of the weights, something that does not change when you apply a monotonic transformation to the weights.
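
As a toy illustration of the contrast (my own example, not Roberts's; the triangle graph and its weights are invented), adding a constant to the edge weights can flip a shortest-path comparison while leaving the minimum spanning tree alone:

```python
# Adding a constant to every edge weight (allowed for interval data, not ratio data)
# can change which path is shortest, while any monotone transformation of the
# weights leaves the minimum spanning tree unchanged.

def shortest_a_to_c(w_ab, w_bc, w_ac):
    """Cheaper of the two A-to-C routes in the triangle A-B-C."""
    return "A-B-C" if w_ab + w_bc < w_ac else "A-C"

def mst(w_ab, w_bc, w_ac):
    """MST of the triangle: just drop the single heaviest edge of the cycle."""
    edges = {"A-B": w_ab, "B-C": w_bc, "A-C": w_ac}
    heaviest = max(edges, key=edges.get)
    return sorted(e for e in edges if e != heaviest)

print(shortest_a_to_c(1, 1, 3))              # 'A-B-C'
print(shortest_a_to_c(1 + 5, 1 + 5, 3 + 5))  # 'A-C': shifting the weights changed it

print(mst(1, 1, 3))                          # ['A-B', 'B-C']
print(mst(2 ** 1, 2 ** 1, 2 ** 3))           # same tree under the monotone map x -> 2**x
```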

The study of meaningfulness is where the associative law example I posted yesterday comes from. If you have an operation that obeys a law such as commutativity or associativity, either automatically by its design (e.g. it involves testing reactions to a sequence of stimuli and the grouping of these into smaller subsequences is arbitrary) or as an experimentally verified outcome, then combining this law with meaningfulness places very strong constraints on the form of the operation. For instance, the only meaningful associative combinations of ratio-scale data have the form \( (x^{1/p}+y^{1/p})^p \) for some constant \( p\gt 0 \). Associativity forces a two-variable associative operation to decompose into the form \( g^{-1}(g(x)+g(y)) \) for some single-variable function \( g \), and meaningfulness forces \( g \) to be a power.
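
As a quick numerical sanity check (my own sketch; the particular values of \( x, y, z, c \) and the exponent are arbitrary), one can verify that this family of operations is associative and scales correctly:

```python
# For fixed p > 0, f(x, y) = (x**(1/p) + y**(1/p))**p is associative and
# homogeneous: f(c*x, c*y) == c * f(x, y), so it is meaningful for ratio data.

import math

def f(x, y, p=3.0):
    return (x ** (1 / p) + y ** (1 / p)) ** p

x, y, z, c = 2.0, 5.0, 11.0, 7.0
print(math.isclose(f(f(x, y), z), f(x, f(y, z))))  # True: associativity
print(math.isclose(f(c * x, c * y), c * f(x, y)))  # True: respects scaling
```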

To get back to the conference program committee question: If we believe that "on a scale from \( -3 \) to \( 3 \)" is ordinal data, then averaging is definitely not meaningful. Changing the scale from \( \{-3,-2,-1,0,1,2,3\} \) to \( \{1,2,4,8,16,32,64\} \), for instance, would be a valid order-preserving transformation. It would not change the order of any program committee member's scores for any papers. But it could change the order of the averages, something that should not be allowed. On the other hand, it is not entirely clear that this replacement is a valid one: maybe it would also cause respondents to spread their scores among the scale values in a different distribution. Many cases of "on a scale of" data can be modeled as having a continuous range of values that gets divided evenly into buckets, but sometimes people will make that division so that the differences between adjacent buckets are equal, and sometimes so that their ratios are equal. The median is meaningful even for ordinal data, but making a meaningful choice among the average, geometric mean, or other similar alternatives depends on having an accurate psychological model for what the data means, which may depend on subtle issues in the procedure that we use when we gather the data.
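
Here is one concrete instance (the scores are invented for illustration): two papers with two reviews each, where the order-preserving relabeling above reverses the ranking by average even though no individual comparison changes:

```python
# Averaging ordinal scores is not meaningful: an order-preserving relabeling of
# the scale can reverse the ranking of the averages.

from statistics import mean

relabel = {-3: 1, -2: 2, -1: 4, 0: 8, 1: 16, 2: 32, 3: 64}

paper_a = [3, -3]   # one enthusiastic and one hostile review
paper_b = [1, 0]    # two lukewarm reviews

print(mean(paper_a), mean(paper_b))           # 0  0.5   -> B ranks above A
print(mean(relabel[s] for s in paper_a),
      mean(relabel[s] for s in paper_b))      # 32.5  12 -> now A ranks above B
```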

Further reading:
R. D. Luce, "Dimensionally invariant numerical laws correspond to meaningful qualitative relations", Phil. Sci. 1978.
J.-Cl. Falmagne and L. Narens, "Scales and meaningfulness of quantitative laws", Synthese 1983.
J.-Cl. Falmagne, "Meaningfulness and order-invariance: two fundamental principles for scientific laws", Found. Phys. 2004.
F. S. Roberts, "On Luce's theory of meaningfulness", Phil. Sci. 1980.
F. S. Roberts, "Applications of the theory of meaningfulness to psychology", J. Math. Psych. 1985.
F. S. Roberts, "Meaningfulness of conclusions from combinatorial optimization", Discr. Appl. Math. 1990.

Comments:

ext_132046:
2014-03-02T17:27:33Z

Of course, this road can also be traveled in the other direction - what transformation on the data should be made so that certain stuff we want to do on the data becomes meaningful. For example, FFT, generating functions, etc...

In the good old days, one could give any real number as a score for a paper - that was very useful in that it was possible to encode a ranking of the papers assigned to you, together with some quality measure. Aggregating the scores never made too much sense, but that is why the PC would discuss papers, etc... No?

11011110:
2014-03-02T18:01:51Z

There's lots of research on how to aggregate combinatorial descriptions of orderings. Probably some of that would be applicable here and make more sense than averaging.

ext_2460544:
2014-03-02T19:43:46Z

See also: voting theory and the Borda count. I showed in "Ensuring every candidate wins under positional voting", Social Choice and Welfare 33 (2009), pp. 311–333, that for almost any number of voters and almost any number \( n \) of candidates, the voters can vote in such a way that for different scoring rules, any of the \( n \) candidates can win.

11011110:
2014-03-03T05:22:21Z

It's a good point: averaging scores and computing Borda counts are almost the same thing. But a preference ballot differs a little from a set of scores in that you can have only one first choice, etc., which may make a difference to what is meaningful for it.
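
One way to see the resemblance (a rough sketch with invented ballots, not anything from the thread): ranking candidates by their total Borda points gives the same order as ranking them by their average ballot position.

```python
# Borda count vs. average position: with full rankings, the two orderings agree.

ballots = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
candidates = {c for b in ballots for c in b}
n = len(candidates)

borda = {c: sum(n - 1 - b.index(c) for b in ballots) for c in candidates}
avg_pos = {c: sum(b.index(c) for b in ballots) / len(ballots) for c in candidates}

print(sorted(candidates, key=lambda c: -borda[c]))   # ['A', 'B', 'C']
print(sorted(candidates, key=lambda c: avg_pos[c]))  # ['A', 'B', 'C'], same order
```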

Incidentally the conference was run by Don Saari, who is quite outspoken as an advocate of the Borda count over other preference balloting methods.