Imagine you want to know if it's sunny today, June 24, 2025, but you decide not to look out the window. Instead, you look up some data online that says, "On average, it's sunny 60% of the time in June at this location." So you conclude it's probably sunny, without checking the actual weather or taking into account a cloud that might be passing by right now. This is a bit like how statistics are sometimes used in medicine: we rely on average figures or complicated tools without looking closely at reality, and this can lead to absurd decisions.

This article, inspired by the articles of Professor Sander Greenland as illustrated in a series of tweets by Fred Stalder, explains these errors with simple examples and shows why they pose a problem, especially when decisions affect the health of millions of people.

Why we can be deceived by statistics

In medicine, researchers use tools like p-values, a measure used to assess the strength of evidence in statistical tests, and "confidence intervals" to tell us whether a treatment works or a disease is dangerous. But these tools aren't magic. They rely on assumptions, like the idea that the data are perfect and no one cheats. Professor Sander Greenland, an epidemiologist and statistician at the University of California, Los Angeles, says these assumptions are often wrong.

As Greenland writes:

"Statistical methods are full of implicit assumptions whose violation can have deadly consequences when used to establish policy." These assumptions include the idea that data are generated, managed, analyzed, and reported with absolute impartiality, competence, and integrity, conditions rarely found in reality. For example, he tells the story of 21 drug studies that were never actually carried out, yet were published and relied on for years (Rubinstein, 2009).

This shows that numbers can lie if the data are poorly collected or manipulated.

Fred Stalder, in his tweets, also criticizes scientists who ask the public to "trust" science, like the physicist Étienne Klein in a video. But if the tools they use are biased, this trust can be dangerous. For example, during Covid-19, some studies concluded that treatments like hydroxychloroquine did not work simply because their p-values were not significant, yet these studies ignored flaws in the data, such as inappropriate or even toxic dosages in the Recovery trial, poorly chosen patients, or administration times contrary to recommendations.

Let's illustrate with concrete examples

The story of the sun and averages: Let's take the example of the sun again. If you rely on the average of 60% sun in June, you could go out without an umbrella even though it's raining today. In medicine, it's the same: a study may say that a vaccine reduces the risk of disease by 50% on average, but if the data come from very different groups (young and old, healthy and sick), this average hides the truth. Maybe the vaccine works well for the old, but not at all for the young. Or a study reports relative risk reductions, whose high figures (90-95%) create a strong impression of effectiveness, while the absolute risk reduction is small because the disease was rare to begin with. Without looking at the details, we make a decision that suits no one.
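The gap between relative and absolute risk can be made concrete with a short calculation. The sketch below is only an illustration: the trial numbers are invented for the example and do not come from any study cited here, and the function name `risk_reduction` is hypothetical.

```python
def risk_reduction(events_control, n_control, events_treated, n_treated):
    """Return (relative risk reduction, absolute risk reduction, number needed to treat)."""
    risk_control = events_control / n_control   # share of untreated people who fall ill
    risk_treated = events_treated / n_treated   # share of treated people who fall ill
    arr = risk_control - risk_treated           # absolute risk reduction
    rrr = arr / risk_control                    # relative risk reduction
    nnt = 1 / arr                               # people to treat to avoid one case
    return rrr, arr, nnt


# Hypothetical trial (assumed numbers): 100 cases among 10,000 untreated people,
# 5 cases among 10,000 treated people.
rrr, arr, nnt = risk_reduction(100, 10_000, 5, 10_000)
print(f"Relative risk reduction: {rrr:.0%}")
print(f"Absolute risk reduction: {arr:.2%}")
print(f"Number needed to treat:  {nnt:.0f}")
```

With these assumed numbers the same result can be described as a 95% relative reduction or as a drop of less than one percentage point in absolute risk. Both are arithmetically correct, which is why a report should give both.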

Drowning out the good students: Imagine a class with some brilliant students, but also a lot of average students. If we calculate the average grades, the excellent results of the best students are "drowned out" and go unnoticed. In medicine, this is how we can miss treatments that work for a small group of patients. For example, a study on a drug might show that it has "no effect" on average, because it only works for 10% of people. If we stop at the average, we throw away a useful treatment and fail to see the "excellent students" who benefit from it.
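A few lines of arithmetic show how an average can drown out a subgroup. This is a minimal sketch under assumed numbers: a hypothetical cohort of 1,000 patients in which 10% improve by 20 points on some outcome scale and the rest do not improve at all.

```python
from statistics import mean

# Hypothetical cohort (assumed numbers, not from any real study).
n_patients = 1000
n_responders = 100  # the 10% of patients for whom the drug works

# Improvement per patient: 20 points for responders, 0 for everyone else.
effects = [20.0] * n_responders + [0.0] * (n_patients - n_responders)

average_effect = mean(effects)                       # what an overall average reports
responder_effect = mean(effects[:n_responders])      # what responders experience
non_responder_effect = mean(effects[n_responders:])  # what the other 90% experience

print(f"Average effect over everyone: {average_effect:.1f} points")
print(f"Effect among responders:      {responder_effect:.1f} points")
print(f"Effect among non-responders:  {non_responder_effect:.1f} points")
```

The overall average is ten times smaller than the effect in the patients who actually respond. Once ordinary measurement noise is added, an average that small can easily be reported as "no effect".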

Deciding without looking outside: During Covid-19, decisions like lockdowns were based on statistical models that predicted mass deaths. But these models ignored simple facts, like the fact that many people recovered naturally or that hospital capacity varied from place to place. It's like deciding that it's raining everywhere in France because it's raining in Paris, without checking Bordeaux or Marseille. The result: measures that are too harsh for some, useless for others. The same flaw could be seen in the fraudulent Pradelle-Lega study, now retracted, which claimed that 17,000 people had died from taking hydroxychloroquine without being able to show that these deaths had actually occurred in the countries considered. It even applied a risk factor taken from the United Kingdom without checking whether that risk factor held in each country!

The aberrations of decisions based on bad statistics

These mistakes lead to absurd choices that can cost lives or money:

Forced treatment that doesn't work: If a study says a drug is "effective" because its p-value is low, but the data are rigged (for example, by hiding side effects), thousands of patients may take it without benefit, or even at some risk. During Covid-19, some treatments like vaccines were pushed while early treatment trials were poorly done or based on non-inferiority tests rather than on absolute risk reduction.

Ignoring exceptions: By relying on averages, we can ignore cases where a treatment saves lives. For example, a cancer study might say that a drug "has no effect" on average, but it might cure 5% of the most seriously ill patients. Without digging deeper, we're depriving these patients of an opportunity.

Unnecessary lockdowns: Statistical models predicted mass deaths without accounting for local differences. As a result, cities with few cases were locked down, disrupting lives for no reason, while other regions lacked support.

Why we make these mistakes: a problem of education

The real problem is that many researchers don't really understand statistics. They learn to push buttons in software like Excel or R, which spits out p-values or confidence intervals, without knowing what they mean. Greenland says that even with good training, people fall into traps like thinking a small p-value proves something with certainty. But if the data are bad or the assumptions are wrong, those numbers are worthless.

As Greenland writes, "Misunderstandings about p-values, alpha levels, and confidence intervals stem from innate human cognitive biases, which even the most advanced mathematical training seems unable to correct." This observation points to a systemic problem: statistics education is often superficial, focused on mechanical application rather than critical understanding.

Inadequate training - In many university programs, statistics courses for non-statisticians focus on the application of standard tests (t-tests, ANOVA, regressions) without emphasizing the underlying assumptions, such as the independence of observations or the absence of bias in data collection. For example, a p-value < 0.05 is often taught as proof of "significance," without explaining that it only measures how far the data diverge from what the tested model predicts, and only if all the model's assumptions hold. This approach produces users who apply methods without understanding their limitations.

The "off-the-shelf" culture - The advent of modern statistical software has amplified this problem. These tools, while useful, promote a "black box" approach where researchers generate p-values and confidence intervals without thinking about the underlying models. For example, a researcher may select covariates in a regression based on statistical results (a practice called data dredging) without documenting this process, thus skewing interpretations. This mechanization of statistical analysis, encouraged by a lack of critical training, contributes to the reproducibility crisis.

A lack of continuing education - Even experienced researchers often lack opportunities to update their statistical skills. More nuanced approaches, such as sensitivity analyses or Bayesian methods, are rarely taught in continuing education courses. This leaves researchers reliant on outdated paradigms, such as the p < 0.05 threshold, whose overuse, as Greenland notes, Fisher himself lamented.

Consequences for research and policy - This limited understanding of statistics is not confined to researchers. Policymakers, journalists, and the public, often lacking statistical training, rely on misinterpreted conclusions. In 2020, the article "Covid-19: Medicine biased by statistics" illustrated this problem in the context of Covid-19, where studies with low p-values were used to justify public policies without considering contextual uncertainties, such as bias in participant selection or conflicts of interest.
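The data dredging problem described above can be quantified with a simple formula. The sketch below assumes an idealized setting that real analyses rarely match: every test is independent and none of the tested effects is real. The function name `chance_of_false_positive` is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests
    gives p < alpha when no tested effect actually exists."""
    return 1 - (1 - alpha) ** n_tests


# How the risk grows as an analyst tries more covariates, subgroups, or outcomes.
for n_tests in (1, 5, 20, 100):
    chance = chance_of_false_positive(n_tests)
    print(f"{n_tests:>3} tests: {chance:.0%} chance of at least one 'significant' result")
```

Under these assumptions, trying 20 different analyses gives roughly a 64% chance of finding at least one p < 0.05 by luck alone. This is why undocumented selection of results makes a reported p-value impossible to interpret.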

Statistics can be like optical illusions if they are not calibrated to reality - Pixabay

How to avoid these pitfalls

To stop making mistakes with statistics, here are some simple ideas:

Look at the reality, not just the numbers: As with the sun, check the details. A study must show who was tested and under what conditions, not just an average.

Don't ignore the exceptions: If a treatment works for a few patients, it should be noted, even if the average is low. For example, test groups (young, old) separately to see who really benefits.

Learn to doubt: Researchers must be trained to question their tools. Simple lessons could teach them that p < 0.05 doesn't mean "it's true," but "it deserves a second look."

Work with experts: Add statisticians or scientists from other fields to medical teams to avoid errors.

Share data: Publish all raw data for others to verify, like a weather forecast open to all.

Reform statistical education: University curricula should include courses on the epistemology of statistics, emphasizing the limitations of p-values and confidence intervals, as well as the importance of causal context. For example, teach how implicit assumptions (such as the absence of selection bias) condition results.
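Why p < 0.05 means "it deserves a second look" rather than "it's true" can be illustrated with a standard back-of-the-envelope calculation. The sketch below rests on stated assumptions: the share of tested hypotheses that are really true (`prior`), the statistical power, and the 0.05 threshold are illustrative values, and the calculation ignores bias and data problems, which would make matters worse. The function name is hypothetical.

```python
def share_of_true_findings(prior, power=0.8, alpha=0.05):
    """Among results with p < alpha, the fraction that reflect a real effect.

    prior: assumed fraction of tested hypotheses that are actually true
    power: assumed probability that a real effect yields p < alpha
    alpha: significance threshold
    """
    true_positives = power * prior          # real effects that reach significance
    false_positives = alpha * (1 - prior)   # non-effects that reach it by chance
    return true_positives / (true_positives + false_positives)


# Assumed scenarios: from long-shot hypotheses (10% true) to even odds (50% true).
for prior in (0.1, 0.25, 0.5):
    share = share_of_true_findings(prior)
    print(f"If {prior:.0%} of tested ideas are true, "
          f"{share:.0%} of 'significant' results are real")
```

Under these assumptions, when only one tested idea in ten is true, about a third of the "significant" findings are false alarms, even with flawless data and honest analysis.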

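To counter cognitive biases, curricula should include modules on dichotomania (forcing continuous results into yes/no categories), nullism (treating "no effect" as the default truth), and statistical reification (treating the outputs of a hypothetical model as if they were established facts), so that students learn to recognize the pitfalls of statistical interpretation.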

An interdisciplinary approach is key and should be encouraged by promoting collaboration between statisticians and medical researchers, so that analyses are well designed and correctly interpreted.

Case studies should be built into teaching, using real examples such as biased studies during the Covid-19 crisis, to show the difference between modeling, analysis, and observation of reality and to illustrate the consequences of misusing statistics.

It is also essential to offer workshops for experienced researchers, focused on modern methods such as sensitivity analyses or Bayesian approaches, to fill the gaps in their initial training.

*** Conclusion ***

Sander Greenland's criticisms and the concerns raised in the 2020 FranceSoir article converge on an alarming conclusion: statistical tools, while indispensable, are often misunderstood and misused, biasing medical research and its applications. This problem is compounded by inadequate statistics education, which produces mechanical users who hide behind terms like "significance" or "confidence" without understanding their limitations. By adopting more cautious terminology, emphasizing causal context, recognizing the limitations of statistics, and reforming their teaching, the scientific community can strengthen the reliability of its conclusions.

As Greenland points out, it is time to "deglamorize" statistics and return them to their true role: imperfect summaries of the evidence, which only make sense in light of a deep understanding of the real context and rigorous training. Forgetting this has led to absurd decisions, especially during crises like Covid-19.