In many fields of research today, scientists collect data until they see a pattern that appears statistically significant, and then publish a paper built on that selectively stopped data. Critics have come to call this p-hacking, and the practice draws on a quiver of small methodological tricks that can inflate the statistical significance of a finding. As enumerated by one research group, the tricks can include:
- “conducting analyses midway through experiments to decide whether to continue collecting data,”
- “recording many response variables and deciding which to report postanalysis,”
- “deciding whether to include or drop outliers postanalyses,”
- “excluding, combining, or splitting treatment groups postanalysis,”
- “including or excluding covariates postanalysis,”
- “and stopping data exploration if an analysis yields a significant p-value.”
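The first and last tricks on that list, peeking at the data and stopping as soon as a test comes back significant, are easy to demonstrate. Below is a minimal sketch (not from the source, and using an assumed setup: a two-sided one-sample z-test with known variance, batches of 10 observations, and up to 10 peeks) showing that even when there is truly no effect, optional stopping pushes the false-positive rate well above the nominal 5%.

```python
import math
import random

def p_value(xs):
    # Two-sided one-sample z-test of mean 0, assuming known sigma = 1.
    n = len(xs)
    z = sum(xs) / math.sqrt(n)
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

def run_experiment(rng, batch=10, max_n=100, alpha=0.05):
    # Data are pure noise (null is true), but we "peek" after every batch
    # and stop the moment the test looks significant.
    xs = []
    while len(xs) < max_n:
        xs.extend(rng.gauss(0, 1) for _ in range(batch))
        if p_value(xs) < alpha:
            return True  # declared "significant" and published
    return False

rng = random.Random(0)
sims = 2000
rate = sum(run_experiment(rng) for _ in range(sims)) / sims
print(f"False-positive rate with optional stopping: {rate:.3f}")
# Nominal alpha is 0.05; repeated peeking drives the realized rate far higher.
```

Each individual test is valid on its own; it is the stop-when-significant rule that multiplies the chances of a spurious finding, which is why preregistered sample sizes or sequential-analysis corrections exist.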
Add it all up, and you have a significant problem in the way our society produces knowledge.