
Simpson's Paradox

 ·  ☕ 2 min read  ·  ✍️ Peter Hiltz

I was recently re-introduced to Simpson’s Paradox in statistics: a pattern that shows up in aggregated data can disappear, or even reverse, once the data is split into groups.

(I’m just making up the numbers here to demonstrate the concept.) Suppose China has an overall illness survival rate of 95% and Italy has an overall illness survival rate of 85%. On the face of it, China appears to do a better job of taking care of patients than Italy.

But now suppose you look at the survival rate by age group and Italy has a higher illness survival rate in every single age bracket. How does that happen?

The answer is that the two populations are not comparable. Italy’s population is skewed, with a much higher percentage of people in the older age brackets. An aggregate survival rate is a weighted average of the age bracket rates, weighted by how many people fall in each bracket. So if survival depends on age, the country with a higher percentage of older people can show a lower aggregate survival rate even when it does better in every age-for-age comparison.
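To make the arithmetic concrete, here is a minimal Python sketch with just two age brackets. The patient counts are invented, like the percentages above, and were picked only so that the totals come out at 95% and 85%. The bracket names and function names are hypothetical.

```python
# Hypothetical counts: (patients, survivors) per age bracket.
# Invented numbers, chosen only to reproduce the 95% / 85% totals.
data = {
    "China": {"under 60": (9000, 8730), "60 and over": (1000, 770)},
    "Italy": {"under 60": (3000, 2940), "60 and over": (7000, 5560)},
}


def bracket_rates(country):
    """Survival rate within each age bracket."""
    return {b: survivors / patients
            for b, (patients, survivors) in data[country].items()}


def overall_rate(country):
    """Aggregate survival rate across all brackets."""
    patients = sum(n for n, _ in data[country].values())
    survivors = sum(s for _, s in data[country].values())
    return survivors / patients


for country in data:
    print(country, "overall:", round(overall_rate(country), 3))
    for bracket, rate in bracket_rates(country).items():
        print("  ", bracket, round(rate, 3))
```

With these made-up counts, China wins on the overall rate (95% against 85%) while Italy wins in both brackets (98% against 97% for under 60, and roughly 79% against 77% for 60 and over). The only thing driving the reversal is that 70% of the Italian patients are in the older bracket compared with 10% of the Chinese patients.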

The same thing can happen with trend lines and other summary statistics. You need to figure out the appropriate level of granularity so that differences between the underlying groups are not masked by aggregation.
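Here is a small sketch of the trend line version, again with invented data. Two hypothetical groups each trend upward, but a single line fitted through the pooled points slopes downward. The `slope` helper is a plain least-squares slope written out by hand for illustration.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical data: each group rises by 1 for every step in x,
# but group B sits further right and much lower than group A.
group_a = [(1, 10), (2, 11), (3, 12)]
group_b = [(6, 2), (7, 3), (8, 4)]

print("group A slope:", slope(group_a))
print("group B slope:", slope(group_b))
print("pooled slope: ", slope(group_a + group_b))
```

Each group on its own has a slope of 1, yet the pooled slope is negative, because the gap between the two groups dominates the trend within each group.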

Another example would be comparing two different teaching methods. Suppose Class A is taught with method A and Class B is taught with method B. Could you then compare test scores and decide whether method A or method B is better? No. The next question you need to ask is whether the population of Class A is comparable to the population of Class B. If the historically better students are in the class with the higher score, you need to adjust for that factor, because some or all of the gap may come from the students rather than the method. Conversely, if the historically better students are in the class with the lower score, the true difference between the methods is probably even larger than the raw scores suggest.
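One simple way to make that adjustment is to compare the classes as if both had the same mix of students. The sketch below assumes students can be split into two hypothetical groups, "strong" and "weak", based on past results. All counts and scores are invented, and this kind of reweighting is only one of several possible adjustments.

```python
# Hypothetical data: (number of students, average test score) per group.
classes = {
    "A": {"strong": (20, 85.0), "weak": (5, 65.0)},
    "B": {"strong": (5, 88.0), "weak": (20, 68.0)},
}


def raw_mean(cls):
    """Average score of the class as it actually is."""
    groups = list(classes[cls].values())
    return sum(n * avg for n, avg in groups) / sum(n for n, _ in groups)


def standardized_mean(cls, mix):
    """Average score the class would have under a common student mix."""
    return sum(mix[g] * avg for g, (_, avg) in classes[cls].items())


# Common mix: each group's share of all students across both classes.
total = sum(n for c in classes.values() for n, _ in c.values())
mix = {g: sum(classes[c][g][0] for c in classes) / total
       for g in ("strong", "weak")}

for cls in classes:
    print(cls, "raw:", raw_mean(cls), "adjusted:", standardized_mean(cls, mix))
```

With these numbers Class A has the higher raw average (81 against 72), but it also has most of the strong students. Once both classes are weighted to the same half-and-half mix, Class B comes out ahead (78 against 75), which matches the fact that B scores higher within each group.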

Basically, beware of conclusions from aggregate statistics when you don’t know the makeup of the underlying data.

As usual, feel free to disagree using this contact link. My world view is a hypothesis, not a belief.


Written by Peter Hiltz, Retired International Tax Lawyer