New book: Come on people, stop using p-values already

June 13, 2008 – 1:54 pm

In the June 6 issue of Science, there is a review of a new book entitled The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. This book joins a long and loud chorus calling for the end of statistical hypothesis testing (frequently manifested as the reportage of p-values), a chorus of which I have been a member [this links to a long thread so do a search for my name]. The reviewer of the book, Theodore M. Porter, writes:

The authors do not claim much originality in recognizing [the use of statistical hypothesis tests] as an error; they list and discuss a distinguished roster of predecessors in statistics, philosophy, and the sciences who have called attention to it. And yet somehow the error persists, across a wide range of disciplines including some–such as pharmaceutical regulation, econometrics, and education studies–that feed directly into policy. The book was written to shake us out of our lazy habit of treating significance levels as an almost automatic criterion of scientific and practical worth.

This quote demonstrates some interesting points about the conflict between users and disparagers of statistical hypothesis tests. As Porter acknowledges, there isn’t any question about who is right: statistical hypothesis tests have been shown to be an epistemologically bankrupt approach to data analysis and people should stop using them. The book (which I won’t read because I can guess what it says) is undoubtedly correct on this point. But there is something else in the quote above. The use of the word “lazy” in the last sentence may be a subtle dig at the authors of the book for being preachy. Porter expands on this later in the review, and although he makes it clear that he agrees with the basic premise of the book, the overall review is decidedly mixed.

Porter is not the first book reviewer to skewer an author who writes on this subject. Here’s a review of another book, Statistics Without Math. The authors of the book are part of the chorus decrying statistical hypothesis tests, although their book covers many other subjects as well. The reviewer, William Baltosser, writes:

I found myself agreeing with the authors on many points, disagreeing in other instances, or wanting them to get off their soapbox (e.g., types of statistics used “trivial,” “promote cultural identity,” a “cultural badge,” “display scientific culture”) in other instances.

The soapbox comment of course refers to the discussion in Statistics Without Math of the misuse of statistical hypothesis tests, among a few other practices. I won’t dispute that the authors of Statistics Without Math write with a cynical and preachy tone that may be counterproductive to their cause. However, it is ironic that the phrases Baltosser finds so offensive are in fact the best explanation for the perpetuation of statistical hypothesis testing.

Statistical hypothesis testing is so ingrained in some scientific circles that even though the technique is easily shown to be both silly and confusing, failure to use it can arouse suspicion. For example, in a paper where I reported effect sizes with estimates of uncertainty instead of statistical hypothesis tests, one reviewer wrote, “Overall, the authors do not show any statistical value besides confidence intervals and the reader is not confident whether the differences between [the treatments] are significant.” One way to deal with such a comment (thus allowing your paper to be published and your career to march forward) is to acquiesce and include p-values. Thus, the review process contributes to the perpetuation of statistical hypothesis testing. In this case, I used the mountain of literature invalidating the practice to handily rebut the criticism.
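
For readers wondering what reporting an effect size with an estimate of uncertainty might look like in practice, here is a minimal Python sketch. It is not the analysis from my paper: the data are made up, the function name `mean_difference_ci` is hypothetical, and the interval relies on a large-sample normal approximation, which is an assumption that would be shaky for samples this small (a t-based or bootstrap interval would be the safer choice there).

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(a, b, level=0.95):
    """Return (difference, lower, upper) for mean(a) - mean(b).

    Assumption: a large-sample normal approximation is good enough.
    For small samples, prefer a t-based or bootstrap interval.
    """
    diff = mean(a) - mean(b)
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical measurements for two treatments (made-up numbers).
treatment = [5.1, 4.8, 5.6, 5.9, 5.3, 5.0, 5.7, 5.4]
control = [4.2, 4.9, 4.4, 4.7, 4.1, 4.6, 4.3, 4.8]

diff, lower, upper = mean_difference_ci(treatment, control)
print(f"difference in means: {diff:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The output is a single statement of how big the difference is and how uncertain that estimate is, which tells the reader more than a bare “significant” or “not significant” ever could.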

Unlike Porter, Baltosser does not acknowledge that the book authors are correct on the point of eliminating statistical hypothesis tests. The reason for this conspicuous absence in the review is suggested in the final paragraph: Baltosser reveals that he has for 15 years taught “biostatistics,” which is the name usually given to the offending course that is another key perpetuator of statistical hypothesis testing. It is easy to see how Baltosser might be offended by the implication that he is a perpetuator of intellectually fraudulent methods.

As with allowing gay marriage, people will eventually come around to realizing that eliminating the use of p-values and statistical hypothesis tests is the right thing to do. But until then, the issue provides a fascinating window on how something as pseudo-rigorous as statistical hypothesis testing can linger in the scientific community.

And for those of us, myself included, who get a little too worked up about this, what should we do in the meantime to calm ourselves? That’s easy: just give people copies of the great statistician John Tukey’s take on the issue. He pretty much clears the whole thing up in the first three pages of his paper “A Philosophy of Multiple Comparisons.” His analysis is straightforward, logical, and instructive. And perhaps most importantly, his tone is reflective instead of preachy.

2 Responses to “New book: Come on people, stop using p-values already”

  1. I like your gay marriage analogy. It’s heartening to think that one day scientists who refuse to perpetuate the use of faulty statistical methods may be granted the same rights guaranteed to all other citizens of this great nation. Until then, keep fightin’ the man…or the p-value! 😛

    By Jaclyn on Jun 14, 2008

  2. Cool blog! I’m going to be a frequent reader.

    I read the review in Science with much appreciation as well. Have you looked into much about likelihood for testing hypotheses? I think that this is extremely relevant, especially if you view your test as trying to reject hypotheses from a set of “candidates” and giving weight to the others. You don’t need a p-value when you are just saying that one possible hypothesis is better than another possible (not null) hypothesis.

    Tom Hobbs, a professor at CSU, has been pushing this view in his work (on ungulate behavior) and he teaches it in one of his graduate classes. He has a graph where he suggests the growing emergence of new statistical techniques (especially Bayesian statistics) in the major ecology journals.

    Like you said, any cultural change is hard. I think that ecologists and other scientists are starting to catch on, though. And it certainly helps if leading statisticians are on the same wavelength (that you shouldn’t need p-values to show your point).

    By Aaron Berdanier on Jul 6, 2008
