Correlation Versus Causation
In data analysis, the phrase “correlation does not imply causation” is a ubiquitous disclaimer attached to evidence about how phenomena are related to one another. For example, a graph of SAT scores versus parental income is very likely to come with this warning. So what are these concepts, and why does the warning matter?
Correlation measures the degree to which two phenomena tend to happen together. For example, rain and carrying an umbrella are correlated. In fact, they are positively correlated, because a higher likelihood of rain tends to be paired with a higher likelihood of carrying an umbrella, and vice versa. In contrast, snow and wearing flip flops are negatively correlated, because a higher likelihood of snow tends to be paired with a lower likelihood of wearing flip flops, and vice versa.
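To make "positively" and "negatively" correlated concrete, here is a minimal sketch that computes the Pearson correlation coefficient, one common way of measuring correlation, which ranges from -1 to 1. The daily records below are invented purely for illustration (1 means the event happened that day, 0 means it did not), and the function name pearson is just a label chosen for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical records for eight days: 1 = yes, 0 = no.
rain       = [1, 1, 0, 0, 1, 0, 1, 0]
umbrella   = [1, 1, 0, 0, 1, 0, 0, 0]
snow       = [1, 0, 1, 0, 1, 0, 0, 0]
flip_flops = [0, 1, 0, 1, 0, 1, 1, 0]

r_rain_umbrella = pearson(rain, umbrella)    # positive: they tend to occur together
r_snow_flipflops = pearson(snow, flip_flops)  # negative: one tends to exclude the other

print("rain vs. umbrella:  ", round(r_rain_umbrella, 2))
print("snow vs. flip flops:", round(r_snow_flipflops, 2))
```

Note that the coefficient is symmetric: swapping the two inputs gives the same number, so the calculation by itself says nothing about which phenomenon, if either, causes the other.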
Causation, on the other hand, indicates that one phenomenon actually causes the other phenomenon to happen. In the weather examples above, it seems at least intuitively plausible that rain would cause people to carry umbrellas and snow would cause people to not wear flip flops. So where’s the problem? Let’s examine.
Correlation Implies Three Possibilities Regarding Causation
When two events, let’s call them A and B, are correlated, there are actually three possibilities for how the events could be causally related. First, it could be the case that event A causes event B. Second, it could be the case that event B causes event A. Third, it could be the case that some outside event C causes both A and B. Therefore, in order to determine the proper causal link between events A and B, we have to rule out two of the three possibilities.
To understand this better, let’s think again about the correlation between SAT scores and parental income. From a causal standpoint, it could be the case that higher parental income actually does cause children to score higher on the SAT. Logically speaking, however, it could also be the case that the child’s higher SAT scores cause the higher parental income. It could also be the case that some as yet unidentified factor causes both the higher parental income and the higher SAT scores.
In this instance, we can most likely rule out the second causal explanation, since it’s very difficult to make an argument as to how higher SAT scores of children would cause higher parental income. We still can’t make a conclusion about causation, however, since we still have two plausible explanations.
Examining Causal Possibilities
Can we rule out the possibility that some outside factor causes both higher parental income and higher SAT scores? Not really. If you think intuitively about what might be going on in this relationship, you will most likely arrive at the possibility that many smart parents have both good jobs and smart children! Many people don’t stop to consider this possibility, however, since the notion that money can buy higher SAT scores (presumably through tutoring as opposed to bribery) is intuitively appealing, that is, something we can easily picture being true. That’s why it’s important to remember that just because a scenario seems plausible doesn’t mean that it has to be the one that exists.
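The outside-factor scenario can be sketched with a small simulation. Everything in it is an assumption made for illustration: a hypothetical "ability" score plays the role of the outside factor C, and income and SAT score are each generated from ability plus independent random noise, in arbitrary standardized units. By construction, income has no effect on SAT scores here, yet the two still come out correlated.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the run is repeatable
n = 5000        # number of simulated families

# Hypothetical outside factor C (an assumption, not real data).
ability = [random.gauss(0, 1) for _ in range(n)]
# A and B each depend on C plus their own noise; neither depends on the other.
income = [c + random.gauss(0, 1) for c in ability]
sat = [c + random.gauss(0, 1) for c in ability]

raw_r = pearson(income, sat)

# In a simulation we know C exactly, so we can subtract its contribution
# and check what correlation is left once the outside factor is accounted for.
income_rest = [a - c for a, c in zip(income, ability)]
sat_rest = [b - c for b, c in zip(sat, ability)]
adjusted_r = pearson(income_rest, sat_rest)

print("income vs. SAT, raw:                   ", round(raw_r, 2))
print("income vs. SAT, ability accounted for: ", round(adjusted_r, 2))
```

The raw correlation is clearly positive, while the correlation that remains after removing the shared factor is close to zero. With real data the outside factor is usually not observed this cleanly, which is part of why ruling it out is hard.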
A Less Intuitive Example
When given a less intuitively plausible scenario, people are generally better about remembering that correlation does not imply causation. For example, it is likely true that there is a positive correlation between the number of bathrooms in a parent’s house and a child’s SAT score. After all, higher income parents tend to have bigger houses, and bigger houses tend to have more bathrooms! Few people would attempt to conclude, however, that the additional bathrooms actually cause the child to do better on the SAT because, in addition to not being a logical inference from the facts presented, the conclusion is intuitively absurd.
Why Does the Distinction Matter?
While it is important to understand that correlation and causation are not one and the same, the distinction between the two concepts is not always important on a practical level. For example, if all you want to do is predict a child’s SAT score based on his or her parents’ income, it doesn’t actually matter whether the income causes the higher SAT score. If, on the other hand, you are trying to figure out how to get your child to do better on the SAT, it matters a great deal which correlated factors are causal and which are not, since otherwise you run the risk of wasting money adding another bathroom to your house for no reason.
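The difference between predicting and intervening can also be sketched in a simulation. The model below is entirely hypothetical: ability drives both income and SAT score, income drives the number of bathrooms (treated as a continuous quantity for simplicity), and bathrooms have no effect on SAT scores. The function simulate and its extra_bathrooms parameter are names invented for this example.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def simulate(extra_bathrooms=0, n=5000, seed=1):
    """Simulate n hypothetical families; optionally add bathrooms to every house."""
    rng = random.Random(seed)  # same seed means the same families every time
    bathrooms, sat = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        income = ability + rng.gauss(0, 1)
        # Bathrooms follow income; the intervention adds more on top.
        bathrooms.append(income + rng.gauss(0, 1) + extra_bathrooms)
        # SAT depends on ability only, never on bathrooms (by assumption).
        sat.append(ability + rng.gauss(0, 1))
    return bathrooms, sat

# Prediction: bathrooms carry information about SAT scores.
bathrooms, sat = simulate()
r = pearson(bathrooms, sat)

# Intervention: give every family one more bathroom and look at SAT scores again.
_, sat_after = simulate(extra_bathrooms=1)
mean_before = sum(sat) / len(sat)
mean_after = sum(sat_after) / len(sat_after)

print("bathrooms vs. SAT correlation:", round(r, 2))
print("change in mean SAT after adding a bathroom:", mean_after - mean_before)
```

In this sketch the bathroom count is a perfectly reasonable input for a prediction, since it is positively correlated with SAT scores, but adding a bathroom changes nothing about the scores, because nothing in the model lets bathrooms influence them.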