The p-value is the most widely used (and most misunderstood) concept in statistics. It appears in virtually every scientific paper, A/B test report, and clinical trial. Yet surveys consistently show that even researchers frequently misinterpret what it means. Getting this right is essential for anyone making data-driven decisions.
What a P-Value Actually Measures
P-value = the probability of observing results at least as extreme as those you actually got, assuming the null hypothesis is true. It does NOT tell you the probability that your hypothesis is true. It tells you how surprising your data would be if nothing interesting were actually happening.
Example: You test whether a new drug lowers blood pressure. The null hypothesis is that the drug has no effect. You run a trial and find that patients on the drug had 8 mmHg lower blood pressure than the control group, with p = 0.02. This means: if the drug truly had no effect, there is only a 2% chance of observing an 8 mmHg (or greater) difference by random chance alone.
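As an illustration, the two-tailed p-value in this example can be recovered from a normal approximation. The 8 mmHg difference comes from the example above; the standard error (about 3.44 mmHg) is an assumed value chosen so the arithmetic works out to p ≈ 0.02:

```python
from math import erfc, sqrt

def two_tailed_p(diff, se):
    """Two-tailed p-value for an observed difference under a normal
    approximation: P(|Z| >= |diff/se|) assuming the null of no effect."""
    z = diff / se
    return erfc(abs(z) / sqrt(2))

# Hypothetical numbers matching the example: an 8 mmHg difference with
# an assumed standard error of about 3.44 mmHg gives p close to 0.02.
p = two_tailed_p(8.0, 3.44)
print(round(p, 3))  # ≈ 0.02
```

The same calculation underlies standard z- and t-tests; real analyses would estimate the standard error from the sample data rather than assume it.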
The 0.05 Threshold
By convention, p < 0.05 is considered "statistically significant." This threshold, popularized by Ronald Fisher in the 1920s, means you are willing to accept a 5% chance of rejecting the null hypothesis when it is actually true (a Type I error). It is not a law of nature — it is a convention that balances rigor with practicality.
| P-Value | Interpretation | Action |
|---|---|---|
| p < 0.001 | Very strong evidence against null | Reject null with high confidence |
| p < 0.01 | Strong evidence | Reject null |
| p < 0.05 | Moderate evidence (conventional threshold) | Reject null |
| p = 0.05–0.10 | Weak evidence ("marginally significant") | Inconclusive; more data needed |
| p > 0.10 | Insufficient evidence | Fail to reject null |
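The table's cutoffs can be written as a small helper function. This is just a sketch that mirrors the rows above; the boundaries are conventions, not laws:

```python
def interpret_p(p):
    """Map a p-value to the conventional interpretation from the table.
    The cutoffs are customary thresholds, not mathematical facts."""
    if p < 0.001:
        return "very strong evidence against null"
    if p < 0.01:
        return "strong evidence"
    if p < 0.05:
        return "moderate evidence (conventional threshold)"
    if p <= 0.10:
        return "weak evidence (marginally significant)"
    return "insufficient evidence; fail to reject null"

print(interpret_p(0.02))  # moderate evidence (conventional threshold)
```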
Common Misinterpretations
- Wrong: "P = 0.03 means there is a 3% chance the null hypothesis is true." Right: It means there is a 3% chance of seeing this data (or more extreme) if the null hypothesis were true. The distinction is subtle but critical.
- Wrong: "P = 0.03 means there is a 97% chance the drug works." Right: The p-value says nothing about the probability of your hypothesis being true. That requires Bayesian analysis and prior probabilities.
- Wrong: "P = 0.06 means the result is not significant, so there is no effect." Right: P = 0.06 means the evidence is not strong enough to reject the null at the 0.05 level. Absence of evidence is not evidence of absence. More data might produce a definitive result.
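The second misinterpretation can be made concrete with a two-hypothesis Bayes calculation. All of the numbers below (the 50% prior, the 40% likelihood under a real effect) are illustrative assumptions, not outputs of any actual trial:

```python
# A minimal Bayesian sketch showing why p = 0.03 does NOT mean a 3% chance
# the null is true. Every number here is an illustrative assumption.
prior_null = 0.5            # assumed prior: 50% chance the drug does nothing
p_data_given_null = 0.03    # roughly the role the p-value plays
p_data_given_effect = 0.40  # assumed likelihood of this data if the drug works

posterior_null = (p_data_given_null * prior_null) / (
    p_data_given_null * prior_null + p_data_given_effect * (1 - prior_null)
)
print(round(posterior_null, 3))  # ≈ 0.07 — not 0.03
```

Even under these assumptions, the probability that the null is true after seeing the data (about 7%) differs from the p-value (3%), and it changes entirely if the prior changes.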
Key Takeaways
- P-value measures surprise — how unlikely results at least as extreme as yours would be if the null hypothesis were true.
- P < 0.05 is the conventional significance threshold, but it is a guideline, not a rule.
- Statistical significance ≠ practical significance. A huge sample can produce p < 0.001 for a trivially small effect.
- Always report effect size alongside p-values to convey whether the result matters in practice.
- "Fail to reject the null" is not the same as proving the null is true.
Frequently Asked Questions
What does p < 0.05 mean in simple terms?
It means that if there were truly no effect or difference, there is less than a 5% chance of getting results as extreme as what you observed. Since 5% is conventionally considered unlikely enough, you treat the data as evidence against the null hypothesis (no effect) and reject it. Think of it as: your data would be surprisingly rare if nothing interesting were happening.
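A quick simulation makes this concrete: draw two groups from the same distribution (so the null is true by construction) and count how often the group difference clears a "significance" cutoff. The group size and cutoff below are arbitrary choices for illustration:

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

def mean_diff(n=30):
    """Difference in sample means between two groups drawn from the
    SAME distribution — any difference is pure chance."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    return sum(a) / n - sum(b) / n

# With n = 30 per group, a cutoff of 0.51 corresponds roughly to the
# conventional 5% two-tailed threshold (an assumed round number here).
trials = 10_000
extreme = sum(abs(mean_diff()) >= 0.51 for _ in range(trials))
print(extreme / trials)  # roughly 0.05: the false-positive rate at this cutoff
```

About 5% of null experiments cross the cutoff — exactly the Type I error rate the threshold is designed to cap.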
Can a result be statistically significant but not practically important?
Yes, this happens frequently with large sample sizes. A study of 100,000 people might find that a supplement raises IQ by 0.3 points with p < 0.001. The result is highly statistically significant but completely meaningless in practice. Always look at the effect size (how big the difference is) alongside the p-value.
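A back-of-the-envelope calculation with these example numbers (plus a simplifying equal-variance z-test assumption and the conventional IQ standard deviation of 15) shows both facts at once — the p-value is tiny, and so is the effect size:

```python
from math import erfc, sqrt

# Example numbers from above: n = 100,000 per group, a 0.3-point IQ
# difference, SD 15. The equal-variance z-test is a simplifying assumption.
n, diff, sd = 100_000, 0.3, 15.0

se = sd * sqrt(2 / n)        # standard error of the difference in means
z = diff / se                # z-statistic
p = erfc(abs(z) / sqrt(2))   # two-tailed p-value
d = diff / sd                # Cohen's d, a standard effect-size measure

print(p < 0.001, round(d, 2))  # True 0.02 — significant, but a tiny effect
```

Cohen's d of 0.02 is far below even the customary "small effect" benchmark of 0.2, which is why the headline p-value is misleading on its own.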
What is the difference between one-tailed and two-tailed p-values?
A two-tailed test checks for any difference (higher or lower), while a one-tailed test only checks in one direction. Two-tailed tests are more conservative and are the default in most research. One-tailed tests are appropriate only when you have a strong theoretical reason to predict the direction of the effect before collecting data.
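For a symmetric test statistic such as z, the relationship is simple: the one-tailed p-value in the predicted direction is exactly half the two-tailed one. A quick check with an assumed z = 2:

```python
from math import erfc, sqrt

z = 2.0  # an assumed observed z-statistic, for illustration

# Under a symmetric null distribution, the one-tailed p-value (in the
# predicted direction) is half the two-tailed p-value.
p_two = erfc(abs(z) / sqrt(2))   # P(|Z| >= 2), both tails
p_one = 0.5 * erfc(z / sqrt(2))  # P(Z >= 2), one tail only

print(round(p_two, 4), round(p_one, 4))  # ≈ 0.0455 and ≈ 0.0228
```

This is why the same data can be "significant" one-tailed but not two-tailed — and why the direction must be specified before seeing the data, not after.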