Whew! I’m glad that’s over. For Insiders, my recaps of the drafts for all 15 NL teams and all 15 AL teams are up, as well as my round one reactions and a post-draft Klawchat.
Charles Seife’s Proofiness: How You’re Being Fooled by the Numbers is a beautiful polemic straight from the headquarters of the Statistical Abuse Department. Seife, whose Zero is an enjoyable, accessible story of the development and controversy of that number and concept, aims both barrels at journalists, politicians, and demagogues who misinterpret or misuse statistics, knowing that if you attach a number to something, people are more inclined to believe it.
Seife opens with Senator Joseph McCarthy’s famous claim about knowing the names of “205 … members of the Communist Party” who were at that moment working in the State Department. It was bullshit; the number kept changing, up and down, every time he gave a version of the speech. But because he put a specific number on it, the audience assumed he had those specific names. It’s a basic logical error: if he has the list of names, he must have the number, but that doesn’t mean the converse is true. Seife rips through a series of similarly well-known examples of public abuse of statistics, from the miscounting of the Million Man March to stories about blondes becoming extinct to Al Gore cherrypicking data in An Inconvenient Truth, to illustrate some of the different ways people with agendas can and will manipulate you with stats.
One of the best passages, and probably most relevant to us as the Presidential election cycle is beginning, is on polls – particularly on how they’re reported. Seife argues, with some evidence, that many reporters don’t understand what the margin of error means. (This subject also got some time in Ian Ayres’ Super Crunchers, a somewhat dated look at the rise of Big Data in decision-making that has since been lapped by the very topic it attempted to cover.) If done correctly, the margin of error should equal about two standard deviations, but many journalists and pundits treat it as some ambiguous measure of the confidence in the reported means. When Smith is leading Jones 51% to 49% with a margin of error of ±3%, that’s not a “statistical dead heat”; that’s telling you that the poll, if run properly, says there’s a 95% chance that Smith’s actual support is between 48% and 54% and a 95% chance that Jones’ support is between 46% and 52%, with each distribution centered on the means (51% and 49%) that were the actual results of the poll. That’s far from a dead heat, as long as the poll itself didn’t suffer from any systematic bias, as in the famous Literary Digest poll for the 1936 Presidential election.
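To put rough numbers on the Smith/Jones example, here is a minimal Python sketch. It is my own illustration, not a calculation from Seife’s book, and it leans on several assumptions: a simple random sample, a two-candidate race with no undecideds, the normal approximation to the binomial, and a hypothetical sample size of 1,067 respondents (chosen only because it yields a margin of error close to ±3%). The function names are made up for this example.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a polled share p from n respondents
    (normal approximation; z=1.96 is the usual 95% multiplier)."""
    return z * math.sqrt(p * (1 - p) / n)

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def interval(p, moe):
    """Range implied by a polled share and its margin of error."""
    return (p - moe, p + moe)

def prob_leading(p, n):
    """Approximate chance that true support exceeds 50% in a
    two-candidate race, given polled share p (assumes no bias)."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return normal_cdf((p - 0.5) / standard_error)

n = 1067  # hypothetical sample size, not from the book
smith, jones = 0.51, 0.49
moe = margin_of_error(smith, n)

print(f"margin of error: {moe:.3f}")
print("Smith range:", interval(smith, moe))
print("Jones range:", interval(jones, moe))
print(f"chance Smith is actually ahead: {prob_leading(smith, n):.2f}")
```

Under those assumptions the chance that Smith is really ahead comes out to roughly three in four, which is a long way from a coin flip even though the two ranges overlap. None of this survives a biased sample, which is the Literary Digest problem.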
Seife shifts gears in the second half of the book from journalists to politicians and jurists who either misuse stats for propaganda purposes or who misuse them when crafting bad laws or making bad rulings. He explains gerrymandering, pointing out that this is an easy problem to solve with modern technology if politicians had any actual interest in solving it, and breaks down the 2000 Presidential vote in Florida and the 2008 Minnesota Senate race to show that the inevitable lack of precision even in popular votes and census-taking means both races were, in fact, dead heats. (Specifically, he says that it is impossible to say with any confidence that either candidate was the winner.) Seife shows how bad data have skewed major court decisions, and how McCleskey v. Kemp ignored compelling data on the skewed implementation of capital punishment. (Antonin Scalia voted with the majority, part of a long pattern of ignoring data that don’t support his views, according to Seife.) This statistical abuse cuts both ways, as he gives examples of both prosecutors and defense attorneys playing dirty with numbers to claim that a defendant is guilty or innocent.
For my purposes, it’s a good reminder that numbers can be illustrative but also misleading, especially since giving stats for descriptive reasons can bleed into the appearance of a predictive argument. I pointed out the other day on Twitter that both Michael Conforto and Kyle Schwarber were on short but impressive power streaks; neither run meant anything given how short they were, but I thought they were fun to see and spoke to how both players are elite offensive prospects. (By the way, Dominic Smith is hitting .353/.390/.569 in his last 29 games, and has reached base in 21 straight games!) But I’d recommend this book to anyone working in the media, especially in the political arena, as a manual for how not to use statistics and how not to be taken in by the ones that are handed to you. It’s also a great guide for how to be a more educated voter, consumer, and reader, so when climate change deniers claim the earth hasn’t warmed for sixteen years, you’ll be ready to spot the claim and ignore it.
Next up: I’m way behind on reviews, but right now I’m halfway through Adam Rogers’ Proof: The Science of Booze.