Mean vs. Median: Which Center Matters?
The mean and median both describe the "center" of a dataset, but they do so differently. The mean weights every data point equally, which makes it mathematically convenient and well suited to symmetric distributions such as the normal. The median is the middle value of the sorted data (or the average of the two middle values when the count is even), which makes it robust to extreme values.
This distinction has real-world consequences. U.S. household income, for example, is usually reported as the median rather than the mean, because the distribution is heavily right-skewed. A small number of very high earners pull the mean far above the income of a typical household. In 2022, the median household income was about $74,600 while the mean was over $100,000. For policy questions about the typical household, the median is far more informative.
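The gap between the two statistics is easy to reproduce on a small scale. The sketch below uses ten made-up household incomes (illustrative numbers, not real survey data) in which a single very high earner creates the right skew described above.

```python
from statistics import mean, median

# Hypothetical annual household incomes in dollars (invented for illustration).
# Nine households are fairly close together; one very high earner skews the data.
incomes = [32_000, 41_000, 48_000, 55_000, 62_000,
           68_000, 75_000, 83_000, 96_000, 1_200_000]

mean_income = mean(incomes)      # pulled upward by the single high earner
median_income = median(incomes)  # average of the two middle values

print(f"mean:   {mean_income:,.0f}")    # mean:   176,000
print(f"median: {median_income:,.0f}")  # median: 65,000
```

In this toy sample the mean is higher than nine of the ten incomes, while the median sits in the middle of the group most households belong to.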
House Prices and Real Estate
The same pattern appears in real estate. A single luxury sale in a neighborhood can inflate the mean sale price dramatically, making the market look more expensive than it is for most buyers. Real estate professionals typically quote median home prices for this reason. The mode is used less often here: exact sale prices rarely repeat, so it usually has to be computed over price bands, but it can then show which price point has the most market activity.
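As a rough illustration of the luxury-sale effect, the sketch below compares the mean and median of a small set of invented sale prices before and after one very expensive sale is added. The prices and the helper name summarize are assumptions made for this example.

```python
from statistics import mean, median

# Hypothetical sale prices in dollars for one neighborhood (invented numbers).
typical_sales = [310_000, 325_000, 340_000, 355_000, 370_000]
with_luxury = typical_sales + [2_500_000]  # the same market plus one luxury sale


def summarize(prices):
    """Return (mean, median) for a list of sale prices."""
    return mean(prices), median(prices)


before_mean, before_median = summarize(typical_sales)
after_mean, after_median = summarize(with_luxury)

print(f"before: mean={before_mean:,.0f}  median={before_median:,.0f}")
print(f"after:  mean={after_mean:,.0f}  median={after_median:,.0f}")
```

With these numbers the mean roughly doubles, from 340,000 to 700,000, while the median moves only from 340,000 to 347,500.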
When Mode Matters Most
Mode is essential for categorical data — "what color do customers choose most?" or "what shoe size should we stock most of?" — where a numerical average is meaningless. For numerical data, mode becomes interesting when a distribution is multimodal. A bimodal dataset (two clear peaks) often signals two distinct subgroups: for example, a survey of ages at a university event might show peaks at 20 (students) and 50 (faculty). Spotting multimodality in a histogram is often more informative than any single summary statistic.
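Both uses of the mode can be sketched with Python's standard statistics module. The color choices and ages below are invented for illustration, and multimode requires Python 3.8 or later.

```python
from statistics import mean, median, mode, multimode

# Categorical data: an average is meaningless, but the mode is well defined.
colors = ["blue", "red", "blue", "green", "blue", "red"]
most_common_color = mode(colors)

# Hypothetical ages at a university event, with two clusters
# (students around 20, faculty around 50).
ages = [19, 20, 20, 20, 21, 22, 48, 49, 50, 50, 50, 51]
peaks = multimode(ages)  # every value tied for the highest count

print(most_common_color)          # blue
print(peaks)                      # [20, 50]
print(mean(ages), median(ages))   # both equal 35 for this sample
```

Here the mean and median both land at 35, an age that no attendee in the sample actually has, while the two modes point directly at the two subgroups. On real data, exact ties like this are unlikely, so a histogram over age bands is usually the more practical way to spot the peaks.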
The Outlier Problem
Outliers, values far from the bulk of the data, can shift the mean substantially but barely move the median. In the same way, they inflate the range (max − min) but barely affect the interquartile range (IQR), the spread of the middle 50% of the data. When outliers are genuine data points (not errors), the right response is to report both mean and median, note the outliers explicitly, and consider whether they represent a separate phenomenon worth investigating on its own.
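The contrast between range and IQR can be checked with a short sketch. The data values are invented, and the quartiles come from statistics.quantiles with its default "exclusive" method; other quartile conventions would give slightly different IQR values.

```python
from statistics import quantiles

# Hypothetical measurements (invented), sorted for readability.
clean = [12, 14, 15, 16, 17, 18, 19, 21, 22]
with_outlier = clean + [95]  # the same data plus one extreme value


def spread(data):
    """Return (range, IQR) using the default 'exclusive' quartile method."""
    q1, _, q3 = quantiles(data, n=4)
    return max(data) - min(data), q3 - q1


clean_range, clean_iqr = spread(clean)
outlier_range, outlier_iqr = spread(with_outlier)

print(f"range: {clean_range} -> {outlier_range}")
print(f"IQR:   {clean_iqr} -> {outlier_iqr}")
```

For this sample the single outlier stretches the range from 10 to 83, while the IQR changes only from 5.5 to 6.5.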