Evidence or Coincidence? Insights Into Statistics and the Scientific Method

We are constantly asking questions like:

  • Is this vaccine effective in preventing the flu?
  • Is this medication effective in treating COVID-19?
  • Does this substance cause cancer?

 

Clinical practice is grounded in scientific data and research findings. For medicine to evolve, the scientific community continuously asks relevant questions and conducts studies to answer them.1

 

Every study begins with a question! Researchers then formulate a hypothesis to be tested—for example, whether a specific treatment is more effective than existing ones. From there, they establish objectives, define the characteristics of participants (study population), determine how outcomes will be evaluated (e.g., which tests or assessments will be conducted), and finally use statistics to assess the probability that the hypothesis reflects a true effect.1,2

 

This article will explore some statistical concepts and scientific methods used to answer questions, distinguishing real data from mere coincidences.

 

Correlation and Causation

 

The term “correlation” is often used in everyday language to describe some form of association, such as the arrival of cold weather and the increase in respiratory issues.

 

Statistically, correlation refers to an association between two quantitative variables (characteristics that can be measured, such as weight and height). It also assumes a linear association, meaning one variable increases or decreases by a roughly fixed amount for each unit change in the other (Figure 1).3

Figure 1. Illustration showing a linear association (A) and uncorrelated data (B). Adapted from The BMJ. Correlation and regression.3
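This notion of linear association is usually quantified with the Pearson correlation coefficient, which ranges from −1 to +1. A minimal sketch in Python, using made-up height and weight values purely for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient: strength of *linear* association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical height (cm) and weight (kg) values, invented for illustration.
height = [150, 160, 165, 170, 175, 180, 185]
weight = [52, 58, 63, 68, 74, 80, 86]
print(pearson_r(height, weight))  # close to +1: strong linear association
```

Values near +1 (or −1) describe a strong positive (or negative) linear relationship, like panel A of Figure 1; values near 0 correspond to the uncorrelated pattern in panel B.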

 

Correlation only measures association; it cannot determine cause and effect.4

 

Causality, on the other hand, remains a topic of ongoing debate in the scientific community. Simply put, the cause of an event can be defined as a condition or characteristic that existed before the event and was necessary for it to occur. In other words, to infer causation, the condition must precede the outcome, and the outcome must not occur without that condition.5

 

It’s possible to find associations that lack causality. Some associations may occur purely by chance—these are false associations.6,7 Below is a humorous example to illustrate this point:

 

Nicolas Cage and U.S. Drowning Cases

When comparing random data sets, it’s possible to find high associations purely by chance.7 

A fascinating example emerged a few years ago: a correlation between “the number of people who drowned in swimming pools in the U.S.” and “the number of films starring Nicolas Cage released from 1999 to 2009” (Figure 2). Interestingly, drowning cases seemed to follow the number of Nicolas Cage movies.7 However, despite the correlation, we cannot conclude that the actor is responsible for the drownings!

Figure 2. Correlation between the number of drownings in the US and Nicolas Cage’s movies. Adapted from Vigen T. Spurious correlations.8
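The point can be demonstrated with a short simulation: generate many pairs of completely independent random series of 11 yearly values (mirroring the 1999–2009 span) and keep the strongest correlation found. This is a sketch under invented data, not a reconstruction of the actual drowning/film figures:

```python
import random
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(0)  # fixed seed so the sketch is reproducible

def random_series(n=11):
    # 11 independent yearly values, like the 1999-2009 span in Figure 2
    return [random.gauss(0, 1) for _ in range(n)]

# Compare 1,000 pairs of unrelated series and record the strongest correlation.
best = max(abs(pearson_r(random_series(), random_series())) for _ in range(1000))
print(best)  # typically well above 0.7, despite zero real relationship
```

With short series and enough comparisons, chance alone delivers impressively high correlations, which is exactly how spurious pairings like the one in Figure 2 are found.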

These coincidences can also occur due to unaccounted-for variables that explain the observed association. Such variables are called confounding or “hidden” variables.7

For instance, suppose a study finds that people who drink a lot of coffee have a lower risk of skin cancer. This doesn’t necessarily mean coffee has protective properties against cancer! An alternative explanation might be that heavy coffee drinkers spend long hours indoors and, therefore, have less sun exposure—a known risk factor for skin cancer. In this case, reduced outdoor exposure is a confounding variable, common to both coffee consumption and skin cancer risk (Figure 3).6

In short, observing an association suggests a hypothesis but does not provide evidence that one variable causes the other.6

Figure 3. Examples of associations between data and true/false causalities. Adapted from Altman N and Krzywinski M. Nat Methods 2015;12(10):899–900,6 and Chang M. Educ Psychol Meas 2017;77(3):475–88.9
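The coffee example can also be sketched as a simulation, with all values and effect sizes invented for illustration: a hidden "hours spent indoors" variable drives coffee consumption up and sun exposure (and hence skin-cancer risk) down, producing a clear association between two variables that have no causal link:

```python
import random
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(42)
n = 2000
indoor = [random.gauss(0, 1) for _ in range(n)]    # hidden confounder
coffee = [h + random.gauss(0, 1) for h in indoor]  # more time indoors -> more coffee
risk = [-h + random.gauss(0, 1) for h in indoor]   # more indoors -> less sun -> lower risk

# Coffee never enters the risk equation, yet the two are clearly correlated.
print(pearson_r(coffee, risk))  # around -0.5
```

A naive reading of the output would credit coffee with a protective effect; the simulation's own construction shows the confounder is doing all the work.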

Inferring that one variable causes another requires appropriate scientific methods and statistical tests.4

Conditions for Causality

All events result from previous events, which were caused by other events, and so on. Causal relationships between events require three conditions:9

  1. The “cause event” precedes the “effect event” in time.
  2. The factor isolation law is true: if X, then Y; if not X, then not Y.
  3. The relationship is verifiable, meaning condition 2 persists over time and the events in condition 1 are repeatable. Repetition allows causality to inform predictions.

For example, we know that SARS-CoV-2 is the causative agent of COVID-19.10 So:

  1. Viral infection precedes the disease (COVID-19) in time.
  2. If a person is infected (and susceptible), they will develop the disease; if not infected, they won’t get COVID-19.
  3. Numerous individuals infected with SARS-CoV-2 developed the disease.

 

Statistical Significance

When researchers face a scientific question, they use existing data to hypothesize and test it statistically.

Statistical significance assesses whether study findings are likely due to chance or represent real patterns in the data.2

When results are statistically significant, the null hypothesis (e.g., no difference between two groups) is rejected, suggesting an actual difference. For instance, if experimental and control group responses are identical, we cannot reject the null hypothesis. Thus, within the study’s criteria, there seems to be no difference.11

 

The P-Value

Most statistical tests conclude with a P-value calculation.12,13

 

The P-value represents the probability of observing an effect as extreme as the one found, assuming the null hypothesis is true. Lower P-values indicate results are less consistent with the null hypothesis.12

But how can we understand this in practice?

Imagine a randomized clinical trial comparing a new antidepressant drug with a placebo. At the end of the study, 60% of patients treated with the new antidepressant and 40% of those treated with placebo had a good response; the calculated P-value is 0.03.13

 

What, then, is the appropriate interpretation of this result? Imagine that the null hypothesis is true; that is, the new antidepressant is no different from placebo. Now, if you were to conduct a hundred randomized controlled trials comparing the drug to placebo, you would certainly not get an identical response rate for the drug and placebo in each RCT. Instead, in some RCTs, the drug would outperform the placebo, and in others, the placebo would outperform the drug. Furthermore, the magnitude by which the drug and placebo outperformed each other would vary from trial to trial.13 

In this context, what P = 0.03 (i.e., 3%) means is that if the null hypothesis is true, and if you ran the study a large number of times in exactly the same way, then on 3% of occasions you would get the same or a larger difference between the groups than you got on this occasion.13
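This "run the study many times" reading of the P-value can be sketched as a Monte Carlo simulation. The sample size (60 patients per arm) and the common response rate under H0 (50%) are assumptions made here for illustration, since the article does not specify them, so the simulated fraction will only be in the rough vicinity of 0.03:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

n_per_arm = 60               # assumed sample size per group (not given in the article)
p_null = 0.50                # assumed common response rate if H0 is true
observed_diff = 0.60 - 0.40  # the 20-point difference seen in the trial

def simulated_response_rate():
    # One arm of one hypothetical trial run under the null hypothesis
    responders = sum(random.random() < p_null for _ in range(n_per_arm))
    return responders / n_per_arm

n_trials = 20_000
extreme = sum(
    abs(simulated_response_rate() - simulated_response_rate()) >= observed_diff
    for _ in range(n_trials)
)
print(extreme / n_trials)  # fraction of null trials at least as extreme as the observed one
```

The printed fraction is the Monte Carlo analogue of the P-value: the share of hypothetical null studies that produce a drug-placebo gap at least as large as the one actually observed.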

However, you need a criterion for defining statistical significance, and α is that criterion. It is the probability we accept of considering H0 false when, in fact, it is true; in other words, the maximum error we are willing to tolerate. Its value is set arbitrarily, for example, α = 5%.

If α = 0.05 and P = 0.03 (i.e., P is less than α), statistical significance is reached. If α = 0.01 and P = 0.03, it is not. Intuitively, if the P-value is less than the pre-specified α, the data suggest that the study result is so rare that it does not appear to be consistent with H0, leading to the rejection of H0.

For example, a P-value of 0.001 indicates that, if the null hypothesis is indeed true, there would be only a 1 in 1,000 chance of observing data this extreme. Therefore, either very unusual data were observed or the assumption that H0 is true is incorrect. Thus, small P-values (smaller than α) lead to rejection of H0 in favor of an alternative hypothesis (H1) of some effect, such as the effect of a treatment.11,12

 

Why 5%?

 

Virtually all healthcare professionals are familiar with the expression “P<0.05” as a cutoff that indicates “statistical significance.”13

 

For decades, 0.05 (5%, or 1 chance in 20) has been conventionally accepted as the threshold for discriminating significant from non-significant results, and it is often inadequately translated as separating differences or phenomena that exist from those that do not. In practice, in a normal distribution curve, 95% of the area under the curve falls within approximately two standard deviations of the mean, leaving 5% in the tails (Figure 4).14 This means that approximately 5% of the normal distribution comprises outlying or “significantly different” values, that is, values more than two standard deviations away from the mean.11

Thus, a P value of less than 0.05 (i.e., 5%) means that if the null hypothesis (H0) is true, and if you perform the study a large number of times in exactly the same way, then on 5% of the occasions you would obtain an equal or greater difference between the groups than you obtained on this occasion. This is something so rare that we can consider that H0 has a high chance of being incorrect!13
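The 95%/5% split can be checked numerically. Using the standard normal cumulative distribution, available in Python through the error function, the share of the curve within ±1.96 standard deviations (the "approximately two" of the text) is:

```python
from math import erf, sqrt

def share_within(z):
    """Probability that a standard normal value lies within ±z SD of the mean."""
    return erf(z / sqrt(2))

print(share_within(1.96))      # ~0.95: the conventional "non-significant" region
print(1 - share_within(1.96))  # ~0.05: the two tails combined, hence P < 0.05
```

Exactly two standard deviations gives about 95.4%, which is why 1.96 (rounded to "about 2") is the figure behind the familiar 5% threshold.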

 

Figure 4. Example of a normal distribution curve. Adapted from Di Leo G et al. Eur Radiol Exp 2020;4(1):18.14

Conclusions

 

A correlation between two variables is a measure of association, but it does not indicate a causal relationship. This requires randomized controlled trials or other statistical methods.4

When researchers want to know the answer to a scientific question, they create a hypothesis to be tested in a study.2

To this end, the P value and hypothesis testing theory are useful tools that help to plan an experiment, interpret the observed results, and report the findings to colleagues. However, it is essential that these tools are understood! In this way, interpretations and conclusions about the results are made based on plausible scientific premises and not just on the isolated evaluation of statistical analyses.12 And so, evidence brings us closer to scientific truths!

 

How to cite this article:

KACHI. Evidence or Coincidence? Insights Into Statistics and the Scientific Method. São Paulo: KACHI Comunicação Científica, 24 Jun 2025. Available at: https://kachi.com.br/en/evidence-or-coincidence-insights.

 

Bibliographic References:

  1. Farrugia P, Petrisor BA, Farrokhyar F, Bhandari M. Practical tips for surgical research: Research questions, hypotheses and objectives. Can J Surg 2010;53(4):278–81.
  2. Miller J. Hypothesis Testing in the Real World. Educ Psychol Meas 2017;77(4):663–72.
  3. The BMJ. Correlation and regression [Internet]. [cited 2020 Aug 31]. Available from: https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/11-correlation-and-regression
  4. Hung M, Bounsanga J, Voss MW. Interpretation of correlations in clinical research. Postgrad Med 2017;129(8):902–6.
  5. Rothman KJ, Greenland S. Causation and Causal Inference in Epidemiology. Am J Public Health 2005;95(S1):S144–50.
  6. Altman N, Krzywinski M. Association, correlation and causation. Nat Methods 2015;12(10):899–900.
  7. Keogh B, Monks T. The impact of delayed transfers of care on emergency departments: common sense arguments, evidence and confounding. Emerg Med J 2020;37(2):95–101.
  8. Vigen T. Spurious correlations [Internet]. [cited 2020 Aug 31]. Available from: https://www.tylervigen.com/spurious-correlations
  9. Chang M. What Constitutes Science and Scientific Evidence: Roles of Null Hypothesis Testing. Educ Psychol Meas 2017;77(3):475–88.
  10. Tay MZ, Poh CM, Rénia L, MacAry PA, Ng LFP. The trinity of COVID-19: immunity, inflammation and intervention. Nat Rev Immunol 2020;20(6):363–74.
  11. Palesch YY. Some common misperceptions about P values. Stroke 2014;45(12):e244–6.
  12. Biau DJ, Jolles BM, Porcher R. P value and the theory of hypothesis testing: an explanation for new researchers. Clin Orthop Relat Res 2010;468(3):885–92.
  13. Andrade C. The P Value and Statistical Significance: Misunderstandings, Explanations, Challenges, and Alternatives. Indian J Psychol Med 2019;41(3):210–5.
  14. Di Leo G, Sardanelli F. Statistical significance: p value, 0.05 threshold, and applications to radiomics-reasons for a conservative approach. Eur Radiol Exp 2020;4(1):18.