Why Most Published Research Findings Are False

"Why Most Published Research Findings Are False" is a 2005 essay written by John Ioannidis, a professor at the Stanford School of Medicine, and published in PLOS Medicine.^[1] It is considered foundational to the field of metascience.

In the paper, Ioannidis argued that a large number, if not the majority, of published medical research papers contain results that cannot be replicated. In simple terms, the essay states that scientists use hypothesis testing to determine whether scientific discoveries are significant. Statistical significance is formalized in terms of probability, with its p-value measure being reported in the scientific literature as a screening mechanism. Ioannidis posited assumptions about the way people perform and report these tests; then he constructed a statistical model which indicates that most published findings are likely false positive results.

Argument edit

Suppose that in a given scientific field there is a known baseline probability that a result is true, denoted by $\mathbb {P} ({\text{True}})$ . When a study is conducted, the probability that a positive result is obtained is $\mathbb {P} (+)$ . Given these two factors, we want to compute the conditional probability $\mathbb {P} ({\text{True}}\mid +)$ , which is known as the positive predictive value (PPV). Bayes' theorem allows us to compute the PPV as:

\mathbb {P} ({\text{True}}\mid +)={(1-\beta )\mathbb {P} ({\text{True}}) \over {(1-\beta )\mathbb {P} ({\text{True}})+\alpha \left[1-\mathbb {P} ({\text{True}})\right]}}

where

\alpha

is the type I error rate (false positives) and

\beta

is the type II error rate (false negatives); the statistical power is

1-\beta

. It is customary in most scientific research to desire

\alpha =0.05

and

\beta =0.2

. If we assume

\mathbb {P} ({\text{True}})=0.1

for a given scientific field, then we may compute the PPV for different values of

\alpha

and

\beta

:

$\alpha$	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
	$\beta$
0.01	0.91	0.90	0.89	0.87	0.85	0.82	0.77	0.69	0.53
0.02	0.83	0.82	0.80	0.77	0.74	0.69	0.63	0.53	0.36
0.03	0.77	0.75	0.72	0.69	0.65	0.60	0.53	0.43	0.27
0.04	0.71	0.69	0.66	0.63	0.58	0.53	0.45	0.36	0.22
0.05	0.67	0.64	0.61	0.57	0.53	0.47	0.40	0.31	0.18

However, the simple formula for PPV derived from Bayes' theorem does not account for bias in study design or reporting. Some published findings would not have been presented as research findings if not for researcher bias. Let $u\in [0,1]$ be the probability that an analysis was only published due to researcher bias. Then the PPV is given by the more general expression:

\mathbb {P} ({\text{True}}|+)={\left[1-(1-u)\beta \right]\mathbb {P} ({\text{True}}) \over {\left[1-(1-u)\beta \right]\mathbb {P} ({\text{True}})+\left[(1-u)\alpha +u\right]\left[1-\mathbb {P} ({\text{True}})\right]}}

The introduction of bias will tend to depress the PPV; in the extreme case when the bias of a study is maximized,

\mathbb {P} ({\text{True}}|+)=\mathbb {P} ({\text{True}})

. Even if a study meets the benchmark requirements for

\alpha

and

\beta

, and is free of bias, there is still a 36% probability that a paper reporting a positive result will be incorrect; if the base probability of a true result is lower, then this will push the PPV lower too. Furthermore, there is strong evidence that the average statistical power of a study in many scientific fields is well below the benchmark level of 0.8.^[2]^[3]^[4]

Given the realities of bias, low statistical power, and a small number of true hypotheses, Ioannidis concludes that the majority of studies in a variety of scientific fields are likely to report results that are false.

Corollaries edit

In addition to the main result, Ioannidis lists six corollaries for factors that can influence the reliability of published research.

Research findings in a scientific field are less likely to be true,

the smaller the studies conducted.
the smaller the effect sizes.
the greater the number and the lesser the selection of tested relationships.
the greater the flexibility in designs, definitions, outcomes, and analytical modes.
the greater the financial and other interests and prejudices.
the hotter the scientific field (with more scientific teams involved).

Ioannidis has added to this work by contributing to a meta-epidemiological study which found that only 1 in 20 interventions tested in Cochrane Reviews have benefits that are supported by high-quality evidence.^[5] He also contributed to research suggesting that the quality of this evidence does not seem to improve over time.^[6]

Reception edit

Despite skepticism about extreme statements made in the paper, Ioannidis's broader argument and warnings have been accepted by a large number of researchers.^[7] The growth of metascience and the recognition of a scientific replication crisis have bolstered the paper's credibility, and led to calls for methodological reforms in scientific research.^[8]^[9]

In commentaries and technical responses, statisticians Goodman and Greenland identified several weaknesses in Ioannidis' model.^[10]^[11] Ioannidis's use of dramatic and exaggerated language that he "proved" that most research findings' claims are false and that "most research findings are false for most research designs and for most fields" [italics added] was rejected, and yet they agreed with his paper's conclusions and recommendations.

Biostatisticians Jager and Leek criticized the model as being based on justifiable but arbitrary assumptions rather than empirical data, and did an investigation of their own which calculated that the false positive rate in biomedical studies was estimated to be around 14%, not over 50% as Ioannidis asserted.^[12] Their paper was published in a 2014 special edition of the journal Biostatistics along with extended, supporting critiques from other statisticians. Leek summarized the key points of agreement as: when talking about the science-wise false discovery rate one has to bring data; there are different frameworks for estimating the science-wise false discovery rate; and "it is pretty unlikely that most published research is false", but that probably varies by one's definition of "most" and "false".^[13]

Statistician Ulrich Schimmack reinforced the importance of the empirical basis for models by noting the reported false discovery rate in some scientific fields is not the actual discovery rate because non-significant results are rarely reported. Ioannidis's theoretical model fails to account for that, but when a statistical method ("z-curve") to estimate the number of unpublished non-significant results is applied to two examples, the false positive rate is between 8% and 17%, not greater than 50%.^[14]

Causes of high false positive rate edit

Despite these weaknesses there is nonetheless general agreement with the problem and recommendations Ioannidis discusses, yet his tone has been described as "dramatic" and "alarmingly misleading", which runs the risk of making people unnecessarily skeptical or cynical about science.^[10]^[15]

A lasting impact of this work has been awareness of the underlying drivers of the high false positive rate in clinical medicine and biomedical research, and efforts by journals and scientists to mitigate them. Ioannidis restated these drivers in 2016 as being:^[16]

Solo, siloed investigator limited to small sample sizes
No preregistration of hypotheses being tested
Post-hoc cherry picking of hypotheses with best P values
Only requiring P < .05
No replication
No data sharing

References edit

^ Ioannidis, John P. A. (2005). "Why Most Published Research Findings Are False". PLOS Medicine. 2 (8): e124. doi:10.1371/journal.pmed.0020124. ISSN 1549-1277. PMC 1182327. PMID 16060722.
^ Button, Katherine S.; Ioannidis, John P. A.; Mokrysz, Claire; Nosek, Brian A.; Flint, Jonathan; Robinson, Emma S. J.; Munafò, Marcus R. (2013). "Power failure: why small sample size undermines the reliability of neuroscience". Nature Reviews Neuroscience. 14 (5): 365–376. doi:10.1038/nrn3475. ISSN 1471-0048. PMID 23571845.
^ Szucs, Denes; Ioannidis, John P. A. (2017-03-02). "Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature". PLOS Biology. 15 (3): e2000797. doi:10.1371/journal.pbio.2000797. ISSN 1545-7885. PMC 5333800. PMID 28253258.
^ Ioannidis, John P. A.; Stanley, T. D.; Doucouliagos, Hristos (2017). "The Power of Bias in Economics Research". The Economic Journal. 127 (605): F236–F265. doi:10.1111/ecoj.12461. ISSN 1468-0297. S2CID 158829482.
^ Howick, Jeremy; Koletsi, Despina; Ioannidis, John P. A.; Madigan, Claire; Pandis, Nikolaos; Loef, Martin; Walach, Harald; Sauer, Sebastian; Kleijnen, Jos; Seehra, Jadbinder; Johnson, Tess; Schmidt, Stefan (August 1, 2022). "Most healthcare interventions tested in Cochrane Reviews are not effective according to high quality evidence: a systematic review and meta-analysis". Journal of Clinical Epidemiology. 148: 160–169. doi:10.1016/j.jclinepi.2022.04.017. PMID 35447356. S2CID 248250137 – via www.jclinepi.com.
^ Howick, Jeremy; Koletsi, Despina; Pandis, Nikolaos; Fleming, Padhraig S.; Loef, Martin; Walach, Harald; Schmidt, Stefan; Ioannidis, John P. A. (October 1, 2020). "The quality of evidence for medical interventions does not improve or worsen: a metaepidemiological study of Cochrane reviews". Journal of Clinical Epidemiology. 126: 154–159. doi:10.1016/j.jclinepi.2020.08.005. PMID 32890636. S2CID 221512241 – via www.jclinepi.com.
^ Belluz, Julia (2015-02-16). "John Ioannidis has dedicated his life to quantifying how science is broken". Vox. Retrieved 2020-03-28.
^ "Low power and the replication crisis: What have we learned since 2004 (or 1984, or 1964)? « Statistical Modeling, Causal Inference, and Social Science". statmodeling.stat.columbia.edu. Retrieved 2020-03-28.
^ Wasserstein, Ronald L.; Lazar, Nicole A. (2016-04-02). "The ASA Statement on p-Values: Context, Process, and Purpose". The American Statistician. 70 (2): 129–133. doi:10.1080/00031305.2016.1154108. ISSN 0003-1305.
^ ^a ^b Goodman, Steven; Greenland, Sander (24 April 2007). "Why Most Published Research Findings Are False: Problems in the Analysis". PLOS Medicine. 4 (4): e168. doi:10.1371/journal.pmed.0040168. PMC 1855693. PMID 17456002.
^ Goodman, Steven; Greenland, Sander. "ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY MOST PUBLISHED RESEARCH FINDINGS ARE FALSE"". Collection of Biostatistics Research Archive. Working Paper 135: Johns Hopkins University, Dept. of Biostatistics Working Papers. Archived from the original on 2 November 2018.{{cite web}}: CS1 maint: location (link)
^ Jager, Leah R.; Leek, Jeffrey T. (1 January 2014). "An estimate of the science-wise false discovery rate and application to the top medical literature". Biostatistics. 15 (1). Oxford Academic: 1–12. doi:10.1093/biostatistics/kxt007. PMID 24068246. Archived from the original on 11 June 2020.
^ Leek, Jeff. "Is most science false? The titans weigh in". simplystatistics.org. Archived from the original on 31 January 2017.
^ Schimmack, Ulrich (16 January 2019). "Ioannidis (2005) was wrong: Most published research findings are not false". Replicability-Index. Archived from the original on 19 September 2020.
^ Ingraham, Paul (15 September 2016). "Ioannidis: Making Science Look Bad Since 2005". www.PainScience.com. Archived from the original on 21 June 2020.
^ Minikel, Eric V. (17 March 2016). "John Ioannidis: The state of research on research". www.cureffi.org. Archived from the original on 17 January 2020.

External links edit

YouTube video(s) from the Berkeley Initiative for Transparency in the Social Sciences, 2016, "Why Most Published Research Findings are False" (Part I, Part II, Part III)
YouTube video of John Ioannidis at Talks at Google, 2014 "Reproducible Research: True or False?"

[1] Ioannidis, John P. A. (2005). "Why Most Published Research Findings Are False". PLOS Medicine. 2 (8): e124. doi:10.1371/journal.pmed.0020124. ISSN 1549-1277. PMC 1182327. PMID 16060722.

[2] Button, Katherine S.; Ioannidis, John P. A.; Mokrysz, Claire; Nosek, Brian A.; Flint, Jonathan; Robinson, Emma S. J.; Munafò, Marcus R. (2013). "Power failure: why small sample size undermines the reliability of neuroscience". Nature Reviews Neuroscience. 14 (5): 365–376. doi:10.1038/nrn3475. ISSN 1471-0048. PMID 23571845.

[3] Szucs, Denes; Ioannidis, John P. A. (2017-03-02). "Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature". PLOS Biology. 15 (3): e2000797. doi:10.1371/journal.pbio.2000797. ISSN 1545-7885. PMC 5333800. PMID 28253258.

[4] Ioannidis, John P. A.; Stanley, T. D.; Doucouliagos, Hristos (2017). "The Power of Bias in Economics Research". The Economic Journal. 127 (605): F236–F265. doi:10.1111/ecoj.12461. ISSN 1468-0297. S2CID 158829482.

[5] Howick, Jeremy; Koletsi, Despina; Ioannidis, John P. A.; Madigan, Claire; Pandis, Nikolaos; Loef, Martin; Walach, Harald; Sauer, Sebastian; Kleijnen, Jos; Seehra, Jadbinder; Johnson, Tess; Schmidt, Stefan (August 1, 2022). "Most healthcare interventions tested in Cochrane Reviews are not effective according to high quality evidence: a systematic review and meta-analysis". Journal of Clinical Epidemiology. 148: 160–169. doi:10.1016/j.jclinepi.2022.04.017. PMID 35447356. S2CID 248250137 – via www.jclinepi.com.

[6] Howick, Jeremy; Koletsi, Despina; Pandis, Nikolaos; Fleming, Padhraig S.; Loef, Martin; Walach, Harald; Schmidt, Stefan; Ioannidis, John P. A. (October 1, 2020). "The quality of evidence for medical interventions does not improve or worsen: a metaepidemiological study of Cochrane reviews". Journal of Clinical Epidemiology. 126: 154–159. doi:10.1016/j.jclinepi.2020.08.005. PMID 32890636. S2CID 221512241 – via www.jclinepi.com.

[7] Belluz, Julia (2015-02-16). "John Ioannidis has dedicated his life to quantifying how science is broken". Vox. Retrieved 2020-03-28.

[8] "Low power and the replication crisis: What have we learned since 2004 (or 1984, or 1964)? « Statistical Modeling, Causal Inference, and Social Science". statmodeling.stat.columbia.edu. Retrieved 2020-03-28.

[9] Wasserstein, Ronald L.; Lazar, Nicole A. (2016-04-02). "The ASA Statement on p-Values: Context, Process, and Purpose". The American Statistician. 70 (2): 129–133. doi:10.1080/00031305.2016.1154108. ISSN 0003-1305.

[Goodman-1-10] Goodman, Steven; Greenland, Sander (24 April 2007). "Why Most Published Research Findings Are False: Problems in the Analysis". PLOS Medicine. 4 (4): e168. doi:10.1371/journal.pmed.0040168. PMC 1855693. PMID 17456002.

[Goodman-2-11] Goodman, Steven; Greenland, Sander. "ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY MOST PUBLISHED RESEARCH FINDINGS ARE FALSE"". Collection of Biostatistics Research Archive. Working Paper 135: Johns Hopkins University, Dept. of Biostatistics Working Papers. Archived from the original on 2 November 2018.{{cite web}}: CS1 maint: location (link)

[Leek-1-12] Jager, Leah R.; Leek, Jeffrey T. (1 January 2014). "An estimate of the science-wise false discovery rate and application to the top medical literature". Biostatistics. 15 (1). Oxford Academic: 1–12. doi:10.1093/biostatistics/kxt007. PMID 24068246. Archived from the original on 11 June 2020.

[Leek-2-13] Leek, Jeff. "Is most science false? The titans weigh in". simplystatistics.org. Archived from the original on 31 January 2017.

[14] Schimmack, Ulrich (16 January 2019). "Ioannidis (2005) was wrong: Most published research findings are not false". Replicability-Index. Archived from the original on 19 September 2020.

[15] Ingraham, Paul (15 September 2016). "Ioannidis: Making Science Look Bad Since 2005". www.PainScience.com. Archived from the original on 21 June 2020.

[Minikel-16] Minikel, Eric V. (17 March 2016). "John Ioannidis: The state of research on research". www.cureffi.org. Archived from the original on 17 January 2020.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]