Linear regression is the workhorse statistical model for two-variable relationships — used in fields from finance to medicine to sports analytics. This calculator fits an ordinary least-squares line to your data, reports every quantity teachers, analysts, and researchers actually need, and lets you predict new values with proper uncertainty intervals.
How the linear regression calculator works
Given paired X and Y values, the calculator finds the slope m and intercept b that minimise the sum of squared vertical residuals Σ(y − ŷ)². This ordinary least-squares approach has a clean closed-form solution: m = Sxy / Sxx and b = ȳ − m·x̄, where Sxx = Σ(x − x̄)² and Sxy = Σ(x − x̄)(y − ȳ).
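The closed-form fit above can be sketched in a few lines. This is an illustrative stdlib-only version (the function name `fit_line` is ours, not the calculator's):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit: returns (slope m, intercept b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx = Σ(x − x̄)²,  Sxy = Σ(x − x̄)(y − ȳ)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx          # slope
    b = y_bar - m * x_bar  # intercept
    return m, b
```

For points lying exactly on y = 2x, `fit_line([1, 2, 3], [2, 4, 6])` recovers slope 2 and intercept 0.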
From there, the calculator derives the Pearson correlation r, the coefficient of determination R² = r², the residual standard error SE = √(SSE / (n − 2)), and the two-tailed p-value for the slope using a Student's t-test with n − 2 degrees of freedom. Confidence and prediction intervals use the user-selected level (90%, 95%, or 99%).
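A sketch of those derived statistics, using the same formulas. The p-value itself requires the Student t CDF (e.g. via `scipy.stats.t.sf`), so this stdlib-only version stops at the t statistic; the function name and return shape are illustrative:

```python
import math

def fit_stats(xs, ys):
    """OLS fit plus the derived statistics: r, R², SE, and the slope's t statistic."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    se = math.sqrt(sse / (n - 2))      # residual standard error
    t = m / (se / math.sqrt(sxx))      # t statistic with n − 2 df
    return {"m": m, "b": b, "r": r, "r2": r * r, "se": se, "t": t}
```

The two-tailed p-value is then 2 · P(T ≥ |t|) for a t distribution with n − 2 degrees of freedom.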
Reading the residuals
The fitted line is only useful if its assumptions hold. Linear regression assumes (1) the relationship is approximately linear, (2) residuals have roughly constant variance, and (3) residuals are independent. The Residual Analysis tab plots residuals against predicted Y — a healthy fit looks like a structureless cloud around zero.
Patterns to watch for: a curved sweep suggests a non-linear relationship (try a polynomial or log transform); a fan that widens with predicted Y indicates heteroscedasticity (consider weighted least squares); a single point with a standardized residual beyond ±3 is a likely outlier or data-entry error worth investigating.
Predicting new values — and knowing the limits
The Prediction tab returns the model's best estimate for Y at a new X, along with two intervals. The confidence interval for the mean brackets the expected average Y at that X — useful when you care about the population mean. The prediction interval for an individual observation is wider because it includes the typical scatter of single data points around the line.
Both intervals widen as X moves away from x̄, and they become unreliable for predictions far outside the observed X range — never extrapolate beyond your data. A common pitfall: a strong R² does not prove causation. The temperature/ice-cream example shows two variables can move together because a third factor (summer) drives both.
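The widening described above comes from the (x₀ − x̄)²/Sxx term in both interval formulas, while the extra 1 under the prediction interval's square root is what makes it wider. A minimal sketch, assuming the caller supplies the two-tailed t critical value for n − 2 df (e.g. ≈ 3.182 for 95% with 3 df, from a t table):

```python
import math

def intervals(xs, ys, x0, t_crit):
    """Point estimate plus confidence and prediction intervals at x0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    se = math.sqrt(sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = m * x0 + b
    d2 = (x0 - x_bar) ** 2 / sxx                 # grows as x0 leaves the data
    ci = t_crit * se * math.sqrt(1 / n + d2)     # half-width: mean response
    pi = t_crit * se * math.sqrt(1 + 1 / n + d2) # half-width: single observation
    return y_hat, (y_hat - ci, y_hat + ci), (y_hat - pi, y_hat + pi)
```

Note the prediction interval is always strictly wider than the confidence interval, and both are narrowest at x₀ = x̄.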