statistics
The Leslie Problem
· ☕ 582  words statistics  · ✍️ Peter Hiltz
In my last post I mentioned collecting random names by nationality as test data for a project. Doing my normal overthinking, I started wondering how authors could choose appropriate random names for characters and ran into a discussion of “The Leslie Problem”. Suppose you are a gender equality researcher and your data has actual names, but no gender assigned to them. You can generally assume that “Mark” is probably male (I’m ignoring all the LGBTQ+ issues) and “Susan” is probably female.

Simpson's Paradox
· ☕ 339  words statistics  · ✍️ Peter Hiltz
I was recently re-introduced to Simpson’s Paradox in statistics: how aggregation can mislead. (I’m just making up the numbers here to demonstrate the concept.) Suppose you have China with an overall illness survival rate of 95% and Italy with an overall illness survival rate of 85%. On the face of it, it looks as though China does a better job of taking care of patients than Italy. But now suppose you look at the survival rate by age group and Italy has a higher illness survival rate in every single age bracket.
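
To see how both statements can be true at once, here is a minimal Python sketch. The patient counts and the two age brackets are invented purely for illustration (they are not real data and not from any source); they are chosen only so that the overall rates come out to the 95% and 85% used above.

```python
# Hypothetical (patients, survivors) counts per age bracket.
# All numbers are made up to reproduce the 95% / 85% overall rates.
data = {
    "China": {"under 60": (900, 873), "60 and over": (100, 77)},
    "Italy": {"under 60": (100, 99), "60 and over": (900, 751)},
}


def overall_rate(brackets):
    """Survival rate after pooling every age bracket together."""
    total_patients = sum(patients for patients, _ in brackets.values())
    total_survivors = sum(survivors for _, survivors in brackets.values())
    return total_survivors / total_patients


# Aggregated view: one number per country.
overall = {country: overall_rate(b) for country, b in data.items()}

# Disaggregated view: one number per country per age bracket.
by_bracket = {
    country: {age: survivors / patients for age, (patients, survivors) in b.items()}
    for country, b in data.items()
}

for country in data:
    print(f"{country}: overall {overall[country]:.1%}")
    for age, rate in by_bracket[country].items():
        print(f"  {age}: {rate:.1%}")
```

In this made-up data set Italy has the higher survival rate in both brackets, yet China has the higher overall rate, because most of China's patients fall in the low-risk bracket while most of Italy's fall in the high-risk one. The aggregate is comparing two different mixes of patients, not two levels of care.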