A particularly dangerous type of bias in scientific research is p-hacking, also known as data dredging: researchers keep collecting data, or keep trying different subsets and statistical analyses, until non-significant results become significant.
The results of a study can be analyzed in many different ways, and p-hacking refers to the practice of selecting whichever analysis produces a satisfactory result. The "p" refers to the p-value, a statistical measure of how surprising the results of a study would be if the effect being looked for were not there: roughly, the probability of obtaining a result at least as extreme as the one observed purely by chance, assuming there is no real effect.
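To make the definition concrete, here is a minimal sketch (my own illustration, not from the article) that computes an exact two-sided p-value for a classic toy case: a coin flipped 100 times that comes up heads 60 times. The p-value is the total probability, under a fair coin, of every outcome at least as unlikely as the one observed.

```python
from math import comb

def binom_p_two_sided(k, n, p=0.5):
    """Exact two-sided p-value for observing k successes in n trials
    under a null success probability p (here: a fair coin)."""
    pmf = lambda i: comb(n, i) * p**i * (1 - p)**(n - i)
    observed = pmf(k)
    # Sum the probability of every outcome at least as unlikely as k.
    return sum(pmf(i) for i in range(n + 1) if pmf(i) <= observed)

# 60 heads out of 100 flips: surprising, but not wildly so.
print(binom_p_two_sided(60, 100))
```

For 60 heads the p-value comes out just above the conventional 0.05 threshold, which shows how arbitrary that cut-off is: one more head and the same coin would be declared "significantly" biased.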
The effects of p-hacking
The first to describe this manipulation were the psychologists Uri Simonsohn, Joseph Simmons and Leif Nelson, who coined the term "p-hacking" and showed that, by selecting which data to include and adapting the sample size, it was possible to alter the p-value of a hypothesis test.
When enough hypotheses are tested, it is almost certain that some will appear statistically significant, even though this is misleading: almost any data set with some degree of randomness is likely to contain, for example, spurious correlations. Researchers using data-mining techniques who are not cautious can easily be misled by such results.
Steven Pinker, in his latest book Rationality, gives an example of this:
Imagine a scientist who conducts a painstaking experiment and obtains data that are the opposite of "eureka." Before abandoning the experiment, she may be tempted to ask whether the effect actually occurs, but only for men, or only for women; or whether it appears once anomalous data from distracted participants are discarded, or the crazy Trump years are excluded, or she switches to a statistical test that looks at the ranking of the data rather than their values to the last decimal place.
Because of p-hacking, along with other research biases such as the Texas sharpshooter fallacy (committed when differences in the data are ignored while similarities are overemphasized), many studies end up being wrong or simply cannot be replicated.
This replicability crisis has shaken fields such as epidemiology, social psychology and human genetics, and it should give us pause when, for example, alarms are raised to demonize screens: