Descriptive statistics summarize the shape, center, and spread of a dataset before any inferential testing begins. The mean and median tell you where the data sits, but the variance, skewness, kurtosis, confidence interval, and outlier diagnostics are what let you decide whether the summary is trustworthy and which subsequent analyses are appropriate. The sections below cover the four decisions every analyst makes when reaching for descriptive stats: choosing between sample and population formulas, interpreting confidence intervals correctly, reading skewness and kurtosis as distribution diagnostics, and picking the right outlier rule for your data.
When to use sample vs. population formulas
The single most common mistake in introductory statistics is mixing up the sample and population formulas for variance and standard deviation. The sample formula divides by n − 1 (Bessel's correction); the population formula divides by n. Use the population version when your data covers the entire group you care about — every employee on a payroll, every transaction in a fiscal year, every student on a roster. Use the sample version whenever your data is a subset and you want to draw conclusions about a larger group.
Why n − 1? Because the sample mean is itself an estimate, the sum of squared deviations from the sample mean is slightly smaller than the sum of squared deviations from the unknown true mean. Dividing by n − 1 compensates for this bias. The correction is barely noticeable when n is large (dividing by 99 instead of 100 changes the answer by about 1%), but with small samples it matters: a sample of 5 values gives 25% higher variance under the sample formula than the population formula. When unsure, default to the sample formula — it is the conservative choice whenever your data can be viewed as a draw from some larger process or population.
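Python's standard library exposes both formulas directly, which makes the n = 5 gap easy to verify. A minimal sketch with a made-up five-value sample:

```python
from statistics import pvariance, variance

data = [4.0, 8.0, 6.0, 5.0, 7.0]  # toy sample of n = 5 measurements

pop_var = pvariance(data)   # divides by n:     use when data is the whole group
samp_var = variance(data)   # divides by n - 1: use when data is a subset

# With n = 5 the sample formula yields n/(n-1) = 5/4 = 1.25x the
# population figure: the 25% gap described above.
print(pop_var, samp_var, samp_var / pop_var)  # → 2.0 2.5 1.25
```

The same pairing exists for standard deviation (`pstdev` vs `stdev`); the ratio there is the square root, so the gap is smaller but still present.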
How confidence intervals actually work
A 95% confidence interval for the mean is a range that — over many repeated samples drawn from the same population — would contain the true mean about 95% of the time. It is computed as x̄ ± t × (s / √n), where t comes from the Student t-distribution at the chosen confidence level and df = n − 1. For large samples (n ≥ 30), the t-value is essentially the z-value: 1.645 for 90%, 1.960 for 95%, 2.576 for 99%. For small samples, t is larger to compensate for the extra uncertainty in estimating the population SD from a small sample.
A common misinterpretation is to say the true mean has a 95% chance of being inside any one CI you compute. That is not what the procedure guarantees — once you have computed a specific interval, the true mean is either in it or not. What the procedure guarantees is that the long-run rate of intervals containing the true mean is 95%. For practical purposes the two interpretations lead to similar decisions, but Bayesian credible intervals (which do support the 'probability the mean is in this range' interpretation) require a different framework.
Raising the confidence level widens the interval. 99% CIs are roughly 30% wider than 95% CIs at the same sample size — the price of more confidence is less precision. The other lever is sample size: doubling n shrinks the CI by a factor of √2 ≈ 1.41.
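The formula x̄ ± t × (s / √n) can be sketched in a few lines. This example uses a fabricated sample with n = 36, large enough that the z critical value stands in for t; the data values are illustrative, not from any real dataset:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

data = [10 + 0.5 * i for i in range(36)]  # illustrative sample, n = 36

n = len(data)
xbar = mean(data)
s = stdev(data)                  # sample SD (n - 1 in the denominator)

# n >= 30, so the z critical value is a good stand-in for t.
# For small n, use the t-distribution instead, e.g.
# scipy.stats.t.ppf(0.975, df=n - 1).
z = NormalDist().inv_cdf(0.975)  # ≈ 1.960 for a 95% interval

half_width = z * s / sqrt(n)
ci = (xbar - half_width, xbar + half_width)
print(f"95% CI for the mean: {ci[0]:.2f} to {ci[1]:.2f}")
```

Swapping 0.975 for 0.995 reproduces the widening described above: the critical value jumps from 1.960 to 2.576, roughly 30% wider at the same n.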
Skewness and kurtosis as distribution diagnostics
Skewness measures asymmetry. A skewness near zero indicates the distribution is roughly symmetric around the mean — a good sign that the mean and median agree. Positive skewness signals a long right tail (the mean is pulled above the median by extreme high values); negative skewness signals a long left tail. As a rule of thumb, |skew| < 0.5 is symmetric enough that the mean is a reasonable measure of center; |skew| > 1 suggests the median is a better summary.
Excess kurtosis measures tail heaviness compared to a normal distribution. Positive excess kurtosis (leptokurtic) means the distribution produces extreme values more often than a bell curve — common in financial returns, network latency, and reaction times. Negative excess kurtosis (platykurtic) means the distribution has lighter tails than normal — uniform distributions are an extreme example. Excess kurtosis near zero is consistent with normality but does not prove it; for a formal check, look at a Q-Q plot or run a Shapiro-Wilk test.
Both statistics are sensitive to outliers, especially with small samples. A single extreme value can flip a near-symmetric distribution into one with skewness > 1. Always inspect the histogram alongside these summaries — the numbers are diagnostic, not definitive.
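Both diagnostics are simple functions of the centered moments, so they are easy to compute by hand. A sketch using the population-moment definitions (g1 and g2; note that tools such as scipy and spreadsheet software apply small-sample bias corrections, so their numbers differ slightly), on two made-up samples:

```python
from statistics import mean

def moment_diagnostics(xs):
    """Population-moment skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    mu = mean(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / n  # variance (2nd moment)
    m3 = sum((x - mu) ** 3 for x in xs) / n  # asymmetry (3rd moment)
    m4 = sum((x - mu) ** 4 for x in xs) / n  # tail weight (4th moment)
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0         # normal distribution scores 0
    return skew, excess_kurt

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_tailed = [1, 1, 2, 2, 3, 3, 4, 50]     # one extreme high value

print(moment_diagnostics(symmetric))     # skew 0.0; negative excess kurtosis
                                         # (uniform-like, light tails)
print(moment_diagnostics(right_tailed))  # skew well above 1: a single value
                                         # drags the whole summary
```

The second sample illustrates the outlier sensitivity described above: seven unremarkable values plus one extreme one produce a skewness far past the |skew| > 1 threshold.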
Which outlier rule to use
The 1.5×IQR rule (Tukey's fences) flags any value below Q1 − 1.5×IQR or above Q3 + 1.5×IQR. It is the right default for two reasons. First, it is built on quartiles, which are resistant to extreme values — so the rule does not let outliers mask other outliers. Second, it tolerates skewed data better than mean-based rules because the fences are anchored to the quartiles rather than to a symmetric band around the mean.
The z-score rule (|z| ≥ 2 or 3) is more familiar but has two limitations. It assumes the data is approximately normal: under that assumption, only about 5% of values should have |z| ≥ 2 and 0.3% should have |z| ≥ 3, so anything above those thresholds is suspicious. On heavily skewed data, however, the rule misfires — on a log-normal distribution it flags perfectly legitimate values in the long right tail while the short left tail can never produce a large |z| at all. The second limitation is more subtle: a few extreme outliers inflate the mean and SD, making moderate outliers harder to detect (called the masking problem).
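A minimal sketch of both rules side by side, on a fabricated sample chosen so that they disagree (note that quartile conventions vary slightly between tools, so Tukey's fences can shift a little depending on the implementation):

```python
from statistics import mean, quantiles, stdev

def tukey_outliers(xs, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(xs, n=4)  # quartile method varies across tools
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lo or x > hi]

def zscore_outliers(xs, threshold=3.0):
    """Values with |z| >= threshold; assumes roughly normal data."""
    mu, s = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - mu) / s >= threshold]

data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 95]  # one extreme value

print(tukey_outliers(data))   # → [95]
print(zscore_outliers(data))  # → [] : 95 inflates the SD so much that its
                              # own z is only ~2.8 — masking in action
```

This is the masking problem in miniature: the IQR fences catch the extreme value, while the z-score rule misses it because the value corrupts the very statistics the rule depends on.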
This calculator reports both rules so you can see when they agree and when they diverge. When they agree, you have a strong case to investigate or remove the flagged values. When they disagree — typically the IQR rule flags more values on skewed data — prefer the IQR rule and consider whether your data needs a log transform before further analysis. Whatever you do, always document outlier handling: removing values is a research decision that affects every downstream statistic.