Bad graphs and bad reporting


Several blogs, including GOOD and the New York Times Economix blog, have been reporting on obesity and using this graph as evidence:

obesity rate vs time spent eating

The authors of the graph (and the blogs) claim it shows that countries whose populations eat more slowly have less obesity. I took one look at that graph and said "Bullshit".

The points are scattered all over, and if there is a correlation between the two factors, it's a very weak one. To prove this, I re-created the graph in Open Office by approximating their data points. I then did a logarithmic fit of the data, which produces a graph that is very close to the original. So far, so good:

obesity rate vs time spent eating

When you fit a line to data, you can measure how well that line describes the data by using something called the coefficient of determination (R-squared). This value ranges from 1, which means the line fits perfectly, all the way down to zero, which means the line explains none of the variation in the data and does no better than just guessing the average.

If you look at my graph, you'll see that the R^2 value of this line is 0.18. That means the curve accounts for only about 18% of the variation in obesity rates, so the fit is very weak and should pretty much be ignored. Even though the line draws your eye and makes you believe the relationship, there is very little correlation here.
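If you'd rather check a number like this yourself than trust a spreadsheet, here is a minimal sketch of the calculation in Python. It fits y = a + b*ln(x) by ordinary least squares and reports R-squared. I'm assuming that is roughly what a spreadsheet's logarithmic trend line does; the function name `log_fit_r2` is my own, and the points below are made-up placeholders, not the values from the graph.

```python
import math

def log_fit_r2(xs, ys):
    """Fit y = a + b*ln(x) by least squares; return (a, b, r_squared)."""
    us = [math.log(x) for x in xs]          # work in ln(x), so the fit is a straight line
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    s_uu = sum((u - mean_u) ** 2 for u in us)
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b = s_uy / s_uu                          # slope
    a = mean_y - b * mean_u                  # intercept
    # R^2 = 1 - (unexplained variation) / (total variation around the mean)
    ss_res = sum((y - (a + b * u)) ** 2 for u, y in zip(us, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical, made-up points (minutes spent eating, obesity rate in %).
# These are NOT the data behind the graph; substitute your own estimates.
minutes = [60, 75, 90, 105, 120, 135]
obesity = [20, 9, 25, 12, 8, 15]

a, b, r2 = log_fit_r2(minutes, obesity)
print(f"y = {a:.2f} + {b:.2f} * ln(x), R^2 = {r2:.2f}")
```

If the points fell exactly on a logarithmic curve, R-squared would come out as 1. The more they scatter around the curve, the closer it gets to zero.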

Now, this doesn't mean their conclusion is wrong. There very well may be a link between speed of eating and obesity. However, making that inference from this data would (and should) get you laughed out of any serious medical or scientific meeting. This is a case of bad graphing, bad reporting, and plenty of blame to go around.

Comments

Written by Paul Van Slembrouck -

Good stuff. If I recall correctly, I was taught to throw out relationships with R-squared values less than 0.75. An R-squared as low as 0.18 might actually be evidence against the bloggers' hypothesis that time spent eating is related to obesity. If you threw some more variables in the mix, there would likely be others with higher R-squared, and the R-squared for time could decrease. Just by eyeballing the data points, the relationship appears random--not much correlation standing out. But that damn gray curve Catherine added to the graph sucks casual viewers into thinking that the line reliably means something. Also, high five for using OpenOffice.Org. Down with Microsoft.

Written by Paul Van Slembrouck -

But what about fits of the data other than logarithmic?
