Standard Deviations.

keith

5 years ago

While working on my own forthcoming book The Inside Game (due out April 21st from HarperCollins; pre-order now!), I stumbled across a chapter from Prof. Gary Smith’s book Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics, a really wonderful book on how people, well-meaning or malicious, use and misuse stats to make their arguments. It’s a very clear and straightforward book that assumes no prior statistical background on the part of the reader, and keeps things moving with entertaining examples and good summaries of Smith’s points on the many ways you can twist numbers to say what you want them to say.

Much of Smith’s ire within the book is aimed at outright charlatans of all stripes who know full well that they’re misleading people. The very first example in Standard Deviations describes the media frenzy over Paul the Octopus, a mollusk that supposedly kept picking the winners of World Cup games in 2010. It was, to use the technical term for it, the dumbest fucking thing imaginable. Of course this eight-legged cephalopod wasn’t actually predicting anything; octopi are great escape artists, but Paul was just picking symbols he recognized, and the media who covered those ‘predictions’ were more worthy of the “fake news” tag now applied to any media the President doesn’t like. Smith uses Paul to make larger points about selection bias and survivorship bias, about how some stories become news and some don’t, how the publish-or-perish mentality at American universities virtually guarantees that some junk studies (found via p-hacking or other dubious methods) will slip through the research cracks, and so on. This is more than just an academic problem, however: One bad study that can’t survive other researchers’ attempts to replicate the results can still lead to significant media attention and even steer changes in policy.

Smith gives copious examples of this sequence of events – bad or corrupt study that leads to breathless news coverage and real-life consequences. He cites Andrew Wakefield, the disgraced former doctor whose single fraudulent paper claimed to find a link between the MMR vaccine and autism; the media ran with it, many parents declined to give their kids the MMR vaccine, and even now, twenty years and numerous debunking studies later, we have measles outbreaks and a reversal of the eradication the hemisphere had achieved in 2000. Smith chalks some of this up to the publish-or-perish mentality of American universities, also mentioning Diederik Stapel, a Dutch ex-professor who has now had 58 papers retracted due to his own scientific misconduct. But these egregious examples are just the tip of a bigger iceberg of statistical malfeasance that’s less nefarious but just as harmful: finding meaning in statistical significance, journals’ preferences for publishing affirmative studies over negative ones (the file drawer problem), “using data to discover a theory” rather than beginning with a theory and using data to test it, discarding outliers (or, worse, non-outliers), and more.

Standard Deviations bounces around a lot of areas of statistical shenanigans, covering some familiar ground (the Monty Hall problem and the Boy or Girl problem*) and some less familiar territory as well. Smith goes after the misuse of graphs in popular publications, particularly the issue of Y-axis manipulation (where the Y axis starts well above 0, making small changes across the X-axis look larger), and the “Texas sharpshooter” problem where people see patterns in random clusters and argue backwards into meaning. He goes after the hot hand fallacy, which I touched on in Smart Baseball and will discuss again from a different angle in The Inside Game. He explains why the claims that people nearing death will themselves to live through birthdays or holidays don’t hold up under scrutiny. (One of my favorite anecdotes is the study of deaths before/after Passover that identified subjects because their names sounded “probably Jewish.”) Smith’s reach extends beyond academia; one chapter looks at how Long-Term Capital Management failed, including how the people leading the firm deluded themselves into thinking they had figured out a way to beat the market, and then conned supposedly smart investors into playing along.

* Smith also explains why Leonard Mlodinow’s treatment of a related question, where you know one girl’s name is Florida, is incorrect. Mlodinow’s version appears in The Drunkard’s Walk, which I read right after this book, and thank goodness Smith addresses it, because for the life of me I couldn’t believe what Mlodinow wrote.

I exchanged emails with Smith in September to ask about the hot hand fallacy and a claim in 2018 by two mathematicians that they’d debunked the original Amos Tversky paper from 1986; he answered with more detail that I ended up using in a sidebar in The Inside Game. That did not directly color my writeup of Standard Deviations here, but my decision to reach out to him in the first place stems from my regard for Smith’s book. It’s on my list now of books I recommend to folks who want to read more about innumeracy and statistical abuse, in the same vein as Dave Levitan’s Not a Scientist.

Next up: About halfway through Mary Robinette Kowal’s The Calculating Stars.