Descriptive statistics summarize the shape, center, and spread of a dataset before any inferential testing begins. The mean and median tell you where the data sits, but the variance, skewness, kurtosis, confidence interval, and outlier diagnostics are what let you decide whether the summary is trustworthy and which subsequent analyses are appropriate. The sections below cover the four decisions every analyst makes when reaching for descriptive stats: choosing between sample and population formulas, interpreting confidence intervals correctly, reading skewness and kurtosis as distribution diagnostics, and picking the right outlier rule for your data.
When to use sample vs. population formulas
The single most common mistake in introductory statistics is mixing up the sample and population formulas for variance and standard deviation. The sample formula divides by n − 1 (Bessel's correction); the population formula divides by n. Use the population version when your data covers the entire group you care about — every employee on a payroll, every transaction in a fiscal year, every student on a roster. Use the sample version whenever your data is a subset and you want to draw conclusions about a larger group.
Why n − 1? Because the sample mean is itself an estimate, the sum of squared deviations from the sample mean is slightly smaller than the sum of squared deviations from the unknown true mean. Dividing by n − 1 compensates for this bias. The correction is barely noticeable when n is large (dividing by 99 instead of 100 changes the answer by about 1%), but with small samples it matters: a sample of 5 values gives 25% higher variance under the sample formula than the population formula. When unsure, default to the sample formula — it is the conservative choice whenever your data can be viewed as a draw from some larger process or population.
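Python's standard library exposes both formulas directly, which makes the n = 5 gap easy to verify. A minimal sketch with a made-up five-value sample:

```python
from statistics import pvariance, variance

data = [4.0, 8.0, 6.0, 5.0, 7.0]  # toy sample of n = 5 measurements

pop_var = pvariance(data)   # divides by n:     use when data is the whole group
samp_var = variance(data)   # divides by n - 1: use when data is a subset

# With n = 5 the sample formula yields n/(n-1) = 5/4 = 1.25x the
# population figure: the 25% gap described above.
print(pop_var, samp_var, samp_var / pop_var)  # → 2.0 2.5 1.25
```

The same pairing exists for standard deviation (`pstdev` vs `stdev`); the ratio there is the square root, so the gap is smaller but still present.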
How confidence intervals actually work
A 95% confidence interval for the mean is a range that — over many repeated samples drawn from the same population — would contain the true mean about 95% of the time. It is computed as x̄ ± t × (s / √n), where t comes from the Student t-distribution at the chosen confidence level and df = n − 1. For large samples (n ≥ 30), the t-value is essentially the z-value: 1.645 for 90%, 1.960 for 95%, 2.576 for 99%. For small samples, t is larger to compensate for the extra uncertainty in estimating the population SD from a small sample.
A common misinterpretation is to say the true mean has a 95% chance of being inside any one CI you compute. That is not what the procedure guarantees — once you have computed a specific interval, the true mean is either in it or not. What the procedure guarantees is that the long-run rate of intervals containing the true mean is 95%. For practical purposes the two interpretations lead to similar decisions, but Bayesian credible intervals (which do support the 'probability the mean is in this range' interpretation) require a different framework.
Raising the confidence level widens the interval. 99% CIs are roughly 30% wider than 95% CIs at the same sample size — the price of more confidence is less precision. The other lever is sample size: doubling n shrinks the CI by a factor of √2 ≈ 1.41.
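The formula x̄ ± t × (s / √n) can be sketched in a few lines. This example uses a fabricated sample with n = 36, large enough that the z critical value stands in for t; the data values are illustrative, not from any real dataset:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

data = [10 + 0.5 * i for i in range(36)]  # illustrative sample, n = 36

n = len(data)
xbar = mean(data)
s = stdev(data)                  # sample SD (n - 1 in the denominator)

# n >= 30, so the z critical value is a good stand-in for t.
# For small n, use the t-distribution instead, e.g.
# scipy.stats.t.ppf(0.975, df=n - 1).
z = NormalDist().inv_cdf(0.975)  # ≈ 1.960 for a 95% interval

half_width = z * s / sqrt(n)
ci = (xbar - half_width, xbar + half_width)
print(f"95% CI for the mean: {ci[0]:.2f} to {ci[1]:.2f}")
```

Swapping 0.975 for 0.995 reproduces the widening described above: the critical value jumps from 1.960 to 2.576, roughly 30% wider at the same n.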
Skewness and kurtosis as distribution diagnostics
Skewness measures asymmetry. A skewness near zero indicates the distribution is roughly symmetric around the mean — a good sign that the mean and median agree. Positive skewness signals a long right tail (the mean is pulled above the median by extreme high values); negative skewness signals a long left tail. As a rule of thumb, |skew| < 0.5 is symmetric enough that the mean is a reasonable measure of center; |skew| > 1 suggests the median is a better summary.
Excess kurtosis measures tail heaviness compared to a normal distribution. Positive excess kurtosis (leptokurtic) means the distribution produces extreme values more often than a bell curve — common in financial returns, network latency, and reaction times. Negative excess kurtosis (platykurtic) means the distribution has lighter tails than normal — uniform distributions are an extreme example. Excess kurtosis near zero is consistent with normality but does not prove it; for a formal check, look at a Q-Q plot or run a Shapiro-Wilk test.
Both statistics are sensitive to outliers, especially with small samples. A single extreme value can flip a near-symmetric distribution into one with skewness > 1. Always inspect the histogram alongside these summaries — the numbers are diagnostic, not definitive.
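Both diagnostics are simple functions of the centered moments, so they are easy to compute by hand. A sketch using the population-moment definitions (g1 and g2; note that tools such as scipy and spreadsheet software apply small-sample bias corrections, so their numbers differ slightly), on two made-up samples:

```python
from statistics import mean

def moment_diagnostics(xs):
    """Population-moment skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    mu = mean(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / n  # variance (2nd moment)
    m3 = sum((x - mu) ** 3 for x in xs) / n  # asymmetry (3rd moment)
    m4 = sum((x - mu) ** 4 for x in xs) / n  # tail weight (4th moment)
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0         # normal distribution scores 0
    return skew, excess_kurt

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_tailed = [1, 1, 2, 2, 3, 3, 4, 50]     # one extreme high value

print(moment_diagnostics(symmetric))     # skew 0.0; negative excess kurtosis
                                         # (uniform-like, light tails)
print(moment_diagnostics(right_tailed))  # skew well above 1: a single value
                                         # drags the whole summary
```

The second sample illustrates the outlier sensitivity described above: seven unremarkable values plus one extreme one produce a skewness far past the |skew| > 1 threshold.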
Which outlier rule to use
The 1.5×IQR rule (Tukey's fences) flags any value below Q1 − 1.5×IQR or above Q3 + 1.5×IQR. It is the right default for two reasons. First, it is built on quartiles, which are resistant to extreme values — so the rule does not let outliers mask other outliers. Second, it tolerates skewed data better than mean-based rules because the fences are anchored to the quartiles rather than to a symmetric band around the mean.
The z-score rule (|z| ≥ 2 or 3) is more familiar but has two limitations. It assumes the data is approximately normal: under that assumption, only about 5% of values should have |z| ≥ 2 and 0.3% should have |z| ≥ 3, so anything above those thresholds is suspicious. On heavily skewed data, however, the rule misfires — on a log-normal distribution it flags perfectly legitimate values in the long right tail while the short left tail can never produce a large |z| at all. The second limitation is more subtle: a few extreme outliers inflate the mean and SD, making moderate outliers harder to detect (called the masking problem).
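A minimal sketch of both rules side by side, on a fabricated sample chosen so that they disagree (note that quartile conventions vary slightly between tools, so Tukey's fences can shift a little depending on the implementation):

```python
from statistics import mean, quantiles, stdev

def tukey_outliers(xs, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(xs, n=4)  # quartile method varies across tools
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lo or x > hi]

def zscore_outliers(xs, threshold=3.0):
    """Values with |z| >= threshold; assumes roughly normal data."""
    mu, s = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - mu) / s >= threshold]

data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 95]  # one extreme value

print(tukey_outliers(data))   # → [95]
print(zscore_outliers(data))  # → [] : 95 inflates the SD so much that its
                              # own z is only ~2.8 — masking in action
```

This is the masking problem in miniature: the IQR fences catch the extreme value, while the z-score rule misses it because the value corrupts the very statistics the rule depends on.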
This calculator reports both rules so you can see when they agree and when they diverge. When they agree, you have a strong case to investigate or remove the flagged values. When they disagree — typically the IQR rule flags more values on skewed data — prefer the IQR rule and consider whether your data needs a log transform before further analysis. Whatever you do, always document outlier handling: removing values is a research decision that affects every downstream statistic.