“If you use P = 0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time,” says David Colquhoun of University College London. The well-known British pharmacologist argues that unreflective over-reliance on p-values produces too many false claims in peer-reviewed scientific journals.
He believes this issue contributes heavily to the deepening crisis of contemporary science: a large proportion of published studies cannot be replicated, and such systematic failures to repeat observations suggest that something is wrong with how inferential procedures are used.
“You make a fool of yourself if you declare that you have discovered something, when all you are observing is random chance. From this point of view, what matters is the probability that, when you find that a result is ‘statistically significant’, there is actually a real effect. If you find a ‘significant’ result when there is nothing but chance at play, your result is a false positive, and the chance of getting a false positive is often alarmingly high,” Colquhoun explains.
He calculated that if a real effect is present in 100 out of 1000 studies tested, with the statistical significance threshold set at 0.05 and the power of each study at 0.8, then 36% of the “significant” results will be false positives. This disturbing figure is obtained even when all the assumptions of the statistical tests, such as normality of the distributions, are met. As he puts it, “any real experiment can only be less perfect than the simulations discussed here, and the possibility of making a fool of yourself by claiming falsely to have made a discovery can only be even greater than we find in this paper.”
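The arithmetic behind the 36% figure can be reproduced directly. A minimal sketch (the 10% prevalence of real effects is the assumption behind this scenario, not a universal constant):

```python
# Hypothetical worked example of the false-discovery-rate arithmetic.
n_tests = 1000        # studies performed
prevalence = 0.10     # assumed fraction with a real effect (100 of 1000)
alpha = 0.05          # significance threshold
power = 0.80          # probability of detecting a real effect

real = n_tests * prevalence            # 100 studies with a real effect
null = n_tests - real                  # 900 studies with no effect
true_positives = power * real          # 80 effects correctly detected
false_positives = alpha * null         # 45 "significant" flukes

# Fraction of all "significant" results that are actually false
fdr = false_positives / (false_positives + true_positives)
print(round(fdr, 2))  # 0.36
```

The key point is that the false discovery rate depends not only on the 0.05 threshold but also on how often real effects occur among the hypotheses tested.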
However, the author admits that such mathematical calculations can be debated. “An easy way to test these problems is to simulate a series of tests to see what happens in the long run. This is easy to do and a script is supplied, in the R language. This makes it quick to simulate 100 000 t-tests (that takes about 3.5 min on my laptop). It is convincing because it mimics real life,” he says.
Interestingly, the simulations confirmed that the proportion of false positives is about 36%. In other words, the likelihood of reporting an effect when there is none is very high. How can this problem be solved? “If you want to avoid making a fool of yourself very often, do not regard anything greater than p < 0.001 as a demonstration that you have discovered something,” the scholar suggests.
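A comparable simulation can be sketched in Python rather than the paper's R script. All parameters here are illustrative assumptions: a two-sided z-test with known variance stands in for the paper's t-tests, the sample and effect sizes are chosen to give power of roughly 0.8 at α = 0.05, and 10% of the simulated tests carry a real effect:

```python
import math
import random

def simulate_fdr(n_tests=100_000, prevalence=0.10, alpha=0.05,
                 n=16, seed=42):
    """Monte Carlo estimate of the false discovery rate.

    Each 'study' is a two-sided z-test on the mean of n draws with
    known SD = 1. The effect size is set so that power is about 0.8
    at alpha = 0.05 (delta = z_0.975 + z_0.8 on the z scale).
    """
    rng = random.Random(seed)
    mu = (1.9600 + 0.8416) / math.sqrt(n)  # effect giving power ~0.8
    significant = false_positives = 0
    for _ in range(n_tests):
        real_effect = rng.random() < prevalence
        mean = sum(rng.gauss(mu if real_effect else 0.0, 1.0)
                   for _ in range(n)) / n
        z = mean * math.sqrt(n)
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
        if p < alpha:
            significant += 1
            if not real_effect:
                false_positives += 1
    return false_positives / significant

print(round(simulate_fdr(), 2))  # close to 0.36
```

With 100 000 simulated tests, the estimated rate lands near the 36% obtained analytically, illustrating why a single “p < 0.05” result is weak evidence of a discovery.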
Article: Colquhoun D. 2014. An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1: 140216.