Correlation and causality

Don’t conclude too fast!

Correlation and causality are often misunderstood and therefore used interchangeably. This is a real risk when it comes to fake news and the spread of myths: a finding that is reported as a correlation can be understood as scientifically proven causality and then spread rapidly around the world. So it is crucial to understand both terms and to distinguish between them confidently.


Here is a definition for all of you who learn best with text:

  • Correlation describes a statistical relation between parameters A and B, without taking the cause into account. That is why even the most absurd correlations can emerge; some of them are listed here.
  • Causality, in contrast, describes the cause or reason that contributes to the change of another parameter. It therefore goes a step further into detail than correlation.

Spurious correlations

To make these two theoretical terms more tangible, let’s look at one example from spurious correlations (Figure 1).

As you can see, US spending on science, space and technology correlates strongly with suicides of all kinds. The correlation was calculated from data provided by different institutions. But that must not lead us to the conclusion that investing in science causes these deaths (or the other way around). Most of the time such correlations are either simple coincidences or the result of a third, yet unknown, parameter that both have in common.
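
The role of such a third parameter can be sketched in a few lines of Python. The example below does not use the data from Figure 1. It uses two invented series (`series_a` and `series_b`, both hypothetical names) that never influence each other and simply both grow over time, so time acts as the hidden third parameter. The growth rates and noise levels are arbitrary assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
steps = range(500)      # hypothetical time steps: the hidden third parameter

# Two invented series that never influence each other.
# Each one only depends on time plus its own random noise.
series_a = [2.0 * t + rng.gauss(0.0, 5.0) for t in steps]
series_b = [0.5 * t + rng.gauss(0.0, 2.0) for t in steps]

# Correlation of the raw values: dominated by the shared upward trend.
raw_r = pearson(series_a, series_b)

# Step-to-step changes remove the shared trend and leave only the noise.
changes_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
changes_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
detrended_r = pearson(changes_a, changes_b)

print(f"raw values:      r = {raw_r:.2f}")
print(f"step-to-step:    r = {detrended_r:.2f}")
```

The raw values correlate almost perfectly, although by construction neither series causes the other. Once the shared trend is removed, the correlation should drop to roughly zero.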

Causality in science

Causality, on the other hand, is complex. There is no straight line between parameters A and B with causality as a bridge. Instead, there are unexplored junctions on the way to the actual causes. Even in science it is therefore difficult to verify causality, because you need to eliminate all other factors that could influence your result. Think of a study that intends to compare the effect of protein intake between two groups: if those groups also differ in their fat intake, you cannot attribute the results exclusively to the protein.
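
The protein example can be turned into a small simulation. This is only a sketch under loud assumptions: all numbers are invented, the helper `simulate_group` is hypothetical, and the simulated outcome is built so that protein has no effect at all while fat does. It is not a model of any real study.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable


def simulate_group(n, protein, fat_mean):
    """Return n invented outcome values for one study group.

    Assumption for illustration: the outcome depends only on fat intake,
    protein contributes nothing.
    """
    outcomes = []
    for _ in range(n):
        fat = rng.gauss(fat_mean, 5.0)
        outcome = 0.0 * protein + 0.8 * fat + rng.gauss(0.0, 2.0)
        outcomes.append(outcome)
    return outcomes


def mean(values):
    return sum(values) / len(values)


# Confounded design: the high-protein group also eats more fat.
confounded_gap = (mean(simulate_group(1000, protein=120, fat_mean=90))
                  - mean(simulate_group(1000, protein=60, fat_mean=60)))

# Controlled design: both groups have the same average fat intake.
controlled_gap = (mean(simulate_group(1000, protein=120, fat_mean=75))
                  - mean(simulate_group(1000, protein=60, fat_mean=75)))

print(f"gap with differing fat intake: {confounded_gap:.1f}")
print(f"gap with equal fat intake:     {controlled_gap:.1f}")
```

In the confounded design the groups differ clearly, and it would be tempting to credit the protein. With equal fat intake the gap should shrink to about zero, which shows that the difference came from the fat all along.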

Reverse causality can also become a problem. If you show that parameters A and B are associated somehow, and you assume that A causes B, how can you be sure that it’s not actually the other way around?

“…one may be tempted to say that low social status causes schizophrenia, [but] another plausible explanation is that schizophrenia causes downward social mobility…”

Gerstman, B. (2003). Epidemiology Kept Simple: An Introduction to Classic and Modern Epidemiology, Second Edition.
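
A correlation coefficient cannot settle this question, because it is symmetric: it gives the same value no matter which parameter you list first. The sketch below builds two invented worlds, one in which A drives B and one in which B drives A, and compares them. The variable names, the sample size and the noise levels are assumptions made purely for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 2000

# World 1: A comes first and B is produced from A plus noise.
a_world1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
b_world1 = [a + rng.gauss(0.0, 1.0) for a in a_world1]

# World 2: B comes first and A is produced from B plus noise.
b_world2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
a_world2 = [b + rng.gauss(0.0, 1.0) for b in b_world2]

r_world1 = pearson(a_world1, b_world1)
r_world2 = pearson(a_world2, b_world2)

print(f"A causes B: r = {r_world1:.2f}")
print(f"B causes A: r = {r_world2:.2f}")

# Swapping the arguments does not change the result either.
print(f"swapped arguments: r = {pearson(b_world1, a_world1):.2f}")
```

Both worlds should produce nearly the same correlation, so the number alone does not reveal the direction of the effect.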

For our example in Figure 1 it seems obvious that the two parameters are only correlated and do not cause each other. But there are numerous correlations like this in our daily lives of which we are not aware: we assume causality where there is actually none. Sadly, correlations are often used in manipulative journalism to create scandalous headlines and suggest false causalities. So always stay critical and don’t conclude too fast. Instead, ask yourself: Is this really causality or just a simple correlation? Can you think of a cause for this correlation? And most importantly: does the correlation imply a cause that has not been proven?
