The Signal and Noise of Measurement and Metrics
More can be more or less, and it can even be nonsense.
Imagine you wanted to compare some things. Inherently this is usually a desire to rank or order them, but it can also be a desire to describe them orthogonal to ranking.
A question like “how busy is this airport?” is inherently a question within the realm of “what is the busiest airport?” implying a ranking of airports by busyness dependent on a definition of what busy means. A question like “how beautiful is this airport?” might be a question within the realm of “what is the most beautiful airport?” implying a ranking and definitions of beautiful, but not necessarily. Or at least the connection can be so weak that thinking about it that way is a distraction. I might just be interested in thinking about, say, the beauty of the Singapore Changi Airport without comparing it to other airports but rather just as a part of a collection of beautifully designed things.
The question might concern whether that airport is beautiful enough to be in the group of beautifully designed things I was putting together. Perhaps it also hinges upon whether airports can be thought of as beautiful things at all. Many would refuse to consider a sewer system among beautiful things. Mike Rowe would likely argue vehemently (and correctly in my view) that sewer systems most certainly can be.
Ranking Minds versus Definitional Minds
So in a simplified view we have ranking minds and definitional minds.
I think this difference in modes of thinking explains communication gaps between people. Since there is always an inherent or eventual ranking behind any quantitative[1] measurement, some minds or modes of thinking will tend to always reduce this kind of question to a comparison. For them it is all relative.
Others will tend to rest at a higher, first level, not interested in going deeper or further.[2] They are much more content with absolutes in these cases. Buildings are tall. They are tall structures by definition. THAT building is really tall while THAT building is short . . . for a building.
These thinkers, or people in this mode of thinking, don’t like being pressed to explain, much less justify, their observations. Take it from me. I know because I am generally of the former mindset and have gotten myself into trouble stumbling around with people in the latter.
I accept I-know-it-when-I-see-it reasoning for what it’s worth. I just think it has sharp limitations. And I want to understand how it is you know it so I can either know it too or at least know when you are seeing it.
Once I hear a description of something that has interesting if not arguable properties, I turn to the implications of that description as a metric to outcomes. Again, others often aren’t interested in this. The remainder of this post is.
The Two Types of Statistics
Descriptive statistics are what most people think of when they hear the term “statistics”. They are simply a summary, organization, or description of data. Inferential statistics are what people like me tend to jump to when we hear statistics. This is the use of statistics to make predictions and draw conclusions from data.
To understand statistics at the inferential level, one must understand that proportions are not probabilities. They are ratios describing something in a way that can be expressed in percentages. For example, the share of people who are farmers might be stated as about 2%. This does not mean in any very meaningful sense that a child has about a 2% chance of growing up to be a farmer. To draw that inference and have it amount to anything actually being said, we would need to describe and define things further. Similarly, knowing that “when leading at halftime Alabama wins [80%] of the time” does not tell you much of anything, despite the implication from the sports commentator proclaiming this as the teams head to the halftime locker room with Alabama leading 24-14. More on this foolishness later.
Likewise, this understanding at the inferential level would include that probabilities are not results. IF the probability of growing up to be a farmer is 2%, this does not mean a child would be a farmer 2% of their employed time however defined. Similarly, an 80% chance of winning a football game does not mean that team will win 80% of the football game. And finally a winning football team did not have in any meaningful sense a 100% chance of winning the football game: binary outcomes DO NOT IMPLY binary probabilities.[3]
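The distinction between a probability and a result can be sketched with a small simulation. This is only an illustration under stated assumptions: the 0.8 win probability is taken as given (it is the commentator's number, not an estimate I am endorsing), and `simulate_games`, the seed, and the number of games are hypothetical choices made for the example.

```python
import random

def simulate_games(win_probability, n_games, seed=0):
    """Simulate independent games that are each won with the same probability.

    Every game is recorded as 1 (win) or 0 (loss). The probability itself is
    never observed directly; only the binary results are.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [1 if rng.random() < win_probability else 0 for _ in range(n_games)]

# Assumed inputs: an 80% win probability and 10,000 hypothetical games.
outcomes = simulate_games(0.8, 10_000)
win_rate = sum(outcomes) / len(outcomes)

print(sorted(set(outcomes)))  # the only results that ever occur are 0 and 1
print(round(win_rate, 2))     # the long-run share of wins sits near 0.8
```

No single game is "80% won," and a game that ends in a win was not thereby a certainty beforehand. The 80% only shows up as a pattern across many games.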
A Taxonomy of Metrics
With all of that out of the way, here is what I really want to explore. I want to delineate among differing descriptive statistics by type all against the backdrop of how they can be helpful in inference. In other words, let’s talk about meaningful things in a way that helps us draw meaningful conclusions. Let’s focus on signals and reduce noise.
By doing this we transform a bare statistic into a metric—the term I will use from here out in this post.
I propose that there are the following types of metrics:
Primary
Secondary (derivative)
Irrelevant
Where in this list a metric falls depends on the question(s) being asked and the data available.
Presuming we have good, true data, for each of those types there are at least two important qualities:
Refinement
Independence
Suppose we wanted to sort a group of people by physical size (biggest to smallest). I am deliberately choosing a rather vague goal both because most of the time people’s quests for understanding suffer from vagueness and because it will help tease out the differences between these metric types.
So here are some examples for each type given this quest to rank people by size:
Primary - Height in feet rounded to the foot or weight in stones rounded to the stone
Secondary - Length of legs or age
Irrelevant - Length of a person’s name or amount of a person’s income
Notice that the primary metrics are substantially independent of one another. Despite correlations, they are measuring different things.
We can enrich the primary metrics by measuring by inches and kilograms rather than feet and stones, respectively. This would increase refinement.
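A rough sketch of what refinement buys us, using the three heights from this post's footnoted example (6’, 5’ 8”, and 5’ 5”, written here in inches). The names `heights_in`, `to_nearest_foot`, and `distinct_levels` are mine, chosen only for illustration.

```python
# Heights in inches for three people: 6'0", 5'8", and 5'5".
heights_in = {"A": 72, "B": 68, "C": 65}

def to_nearest_foot(inches):
    """Coarsen a height in inches to the nearest whole foot."""
    return round(inches / 12)

def distinct_levels(values):
    """Count how many different values a measurement can tell apart."""
    return len(set(values))

heights_ft = {name: to_nearest_foot(h) for name, h in heights_in.items()}

print(heights_ft)                               # A and B both round to 6 feet
print(distinct_levels(heights_ft.values()))     # 2 levels when measured in feet
print(distinct_levels(heights_in.values()))     # 3 levels when measured in inches
```

Measured to the foot, A and B are tied and cannot be ranked against each other. Measured to the inch, all three separate. Nothing about the people changed, only how finely we looked.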
The secondary metrics are correlated with the primary metrics and, thus, are potentially helpful to get to our desired result—ranking by size. If these were all we had, we could build a pretty good ordering. The problem with these is not that their correlations are relatively weak. It is that in the midst of having the primary metrics available these become superfluous. They aren’t adding information.
In fact they are likely distorting information since with height and length of legs we will get an over-indexing for people with disproportionately long legs, which will generally be people who are tall. This would distort any averages or other derived statistics we might want to calculate for inferential use. At a more basic level the presence of a freakishly short-legged or long-legged person might reorder our list in a way that doesn’t make sense.[4]
The same could be said of age. So because we have the primary metrics, the secondary ones have gone from being roughly useful to just unhelpful noise.
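To make the reordering concrete, here is a small sketch using the heights and leg lengths from this post's footnoted example, all in inches. The helper names (`order_by`, `ranks`) and the two ways of blending the measures are my own illustrative choices, not a recommended method.

```python
# Three people: height and leg length, both in inches.
people = {
    "A": {"height": 72.0, "legs": 30.24},
    "B": {"height": 68.0, "legs": 32.0},
    "C": {"height": 65.0, "legs": 30.5},
}

def order_by(score):
    """Return names as a string, largest score first."""
    return "".join(sorted(score, key=score.get, reverse=True))

def ranks(metric):
    """Rank people on one metric: 1 is the largest value."""
    ordered = sorted(people, key=lambda name: people[name][metric], reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}

height = {name: p["height"] for name, p in people.items()}
legs = {name: p["legs"] for name, p in people.items()}

# Blend 1: average the raw inches of the two measures.
raw_average = {name: (height[name] + legs[name]) / 2 for name in people}

# Blend 2: average each person's rank on the two measures (lower is bigger),
# negated so that "largest score first" still means "biggest person first".
height_rank, leg_rank = ranks("height"), ranks("legs")
rank_score = {name: -(height_rank[name] + leg_rank[name]) / 2 for name in people}

print(order_by(height))       # ABC
print(order_by(legs))         # BCA
print(order_by(raw_average))  # ABC
print(order_by(rank_score))   # BAC
```

Averaging raw inches leaves the order alone here only because height varies by far more inches than leg length does, so height dominates. Once the two measures are put on an equal footing, as with ranks, the long-legged B jumps ahead of the taller A. The secondary metric has not added information. It has moved people around.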
The irrelevant metrics are even worse. Any correlation between size and these would very likely be spurious. Sure, there might be some cultural connection between name length and size, but it would likely be so minor that distinguishing it from chance would be fruitless. There is a connection between income and diet and between diet and size, but there are strong countervailing forces that kick in to reverse that trend: poor people in high-income places tend to be bigger, by weight at least.
As we introduce other primary metrics, it is vital that we consider their independence. Information on each person’s volume would render height directly secondary and weight indirectly so—assuming we were defining size as amount of space occupied.
What if we were to define size as gravity—gravitational pull? Now weight is essential if available with height and volume a distraction in its presence.
Distractions From All Directions
This brings us back to the halftime football score example—recall that Alabama is up 24-14. What we presumably want to know is the likelihood that Alabama wins the football game. In the metric sense we have a measure of Alabama’s ultimate success when leading at halftime. Given that they lead at half, can we make use of this metric? What is it telling us?
Knowing that Alabama wins 80% of the time when they lead at half might not really tell us much of anything at all. It is a secondary metric masquerading as a primary one.
Perhaps this can be made clear by considering what we don’t know from the information given. Namely,
Percent of the time Alabama wins in all cases regardless of the halftime score
Number of games in which Alabama is ahead at half (sample size)
Average lead at half for Alabama in those ahead-at-half games
Correlation between margin of lead and winning outcome
The percent of the time the typical team wins when leading at half
The percent of the time this particular opponent wins when trailing at half
Variances and distributions for the above and much more
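The sample-size item in the list above can be illustrated with a confidence interval. This is a minimal sketch, assuming we treat the games as independent trials and use the standard Wilson score interval at roughly 95% confidence. The game counts (8 wins in 10, 80 wins in 100) are hypothetical, since the commentator's stat does not come with a sample size.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win proportion (z=1.96 is about 95%)."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half_width, center + half_width

# Both hypothetical records are "80%", but they are not equally informative.
small_sample = wilson_interval(8, 10)     # roughly 0.49 to 0.94
large_sample = wilson_interval(80, 100)   # roughly 0.71 to 0.87

print([round(x, 2) for x in small_sample])
print([round(x, 2) for x in large_sample])
```

With only ten such games, "80%" is compatible with anything from about a coin flip to near certainty. And even the narrower interval answers only the historical question, not the question about this particular game.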
Another way of seeing it is to consider how other information, which is very likely already known by the actual TV viewer in our hypothetical, renders this particular stat obsolete. That would be the composition of this particular game itself. In the extreme if Alabama’s quarterback broke his collarbone on the last play of the first half, we probably can throw out the halftime lead information without further thought.
It doesn’t have to be that stark, though, to safely ignore the 80% number. If 21 of Alabama’s points came off of three busted, lucky plays, we would know the halftime air is sweet perfume.
There is a deeper problem. The additional information seems to make this a conditional: What is the probability Alabama wins given that they are leading at halftime? But that condition is a distraction. We don’t care about the esoteric question the conditional explicitly poses. We care about ‘Bama’s chances.
Contemplation on the conditional question yields us more of the how-many-angels-can-dance-on-the-head-of-a-pin? type of knowledge than it does any insight into the eventual outcome of this particular game.
In a desperate search for clairvoyance driven by emotion rather than truth seeking, we ascribe to this metric informational value when it really has nothing new to offer. We already knew that teams ahead at half tend to win. We already knew Alabama was in the position they are in based on how the game has gone and how it was expected to go. If we’re watching the game, we probably know a great deal more and well beyond anything this relatively precise stat line is giving us.
Therefore, this metric isn’t even secondary. It is most likely irrelevant.
Conclusion
From this we can draw broader lessons:
We desire certainty in the face of uncertainty. That doesn’t make it available, but it does make us susceptible to seeing it when it isn’t there.
We naturally seek insight that is comforting over that which is true.[5] For the Alabama fan, this hollow stat makes them feel safe. For the fan of Alabama’s opponent it might at least allow them to say, “So you’re telling me there’s a chance.”
We mistakenly think more data is better data. More information can be beneficial, but it can also be detrimental. It can supplant rather than enrich. And it can outright distract.
Metrics can be useful when understood and used appropriately. They can be distractions when not.
PS: And remember that all models (metrics) are wrong. Some models (metrics) are useful.
PPS: And of course don’t forget Goodhart’s Law.
[1] This is true of qualitative measurement as well, since it is just a looser, vaguer, and less determinate form of a quantitative measure. Qualities are quantities when we measure or define them.
[2] I am not saying this is a lesser form of thinking, per se. On the surface it is just different.
[3] I am continually amazed at how often this mistake is made.
[4] Consider three people (A, B, and C) who are 6’, 5’ 8”, and 5’ 5” tall but whose leg lengths are 30.24”, 32”, and 30.5”. By total height they order ABC, by leg length alone they order BCA, and by the average of their ranks on the two measures they order BAC. Without some other reason to value and thus include leg length, ABC is the only legitimate ordering.
[5] Recall the joke/parable about the group lost in the woods in, say, Canada when one of them pulls out a map and begins studying it closely and asking the others to follow his directions. Upon inspection, one of the other members of the group is perplexed, since it is a map of New Jersey. He asks, “What good will that do us?!?” The reply: “At least it’s a plan.”