Linear regression is the workhorse statistical model for two-variable relationships — used in fields from finance to medicine to sports analytics. This calculator fits an ordinary least-squares line to your data, reports every quantity teachers, analysts, and researchers actually need, and lets you predict new values with proper uncertainty intervals.
How the linear regression calculator works
Given paired X and Y values, the calculator finds the slope m and intercept b that minimise the sum of squared vertical residuals Σ(y − ŷ)². This ordinary least-squares approach has a clean closed-form solution: m = Sxy / Sxx and b = ȳ − m·x̄, where Sxx = Σ(x − x̄)² and Sxy = Σ(x − x̄)(y − ȳ).
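The closed-form fit above can be sketched in a few lines. This is an illustrative stdlib-only version (the function name `fit_line` is ours, not the calculator's):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit: returns (slope m, intercept b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx = Σ(x − x̄)²,  Sxy = Σ(x − x̄)(y − ȳ)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx          # slope
    b = y_bar - m * x_bar  # intercept
    return m, b
```

For points lying exactly on y = 2x, `fit_line([1, 2, 3], [2, 4, 6])` recovers slope 2 and intercept 0.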
From there, the calculator derives the Pearson correlation r, the coefficient of determination R² = r², the residual standard error SE = √(SSE / (n − 2)), and the two-tailed p-value for the slope using a Student's t-test with n − 2 degrees of freedom. Confidence and prediction intervals use the user-selected level (90%, 95%, or 99%).
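A sketch of those derived statistics, using the same formulas. The p-value itself requires the Student t CDF (e.g. via `scipy.stats.t.sf`), so this stdlib-only version stops at the t statistic; the function name and return shape are illustrative:

```python
import math

def fit_stats(xs, ys):
    """OLS fit plus the derived statistics: r, R², SE, and the slope's t statistic."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    se = math.sqrt(sse / (n - 2))      # residual standard error
    t = m / (se / math.sqrt(sxx))      # t statistic with n − 2 df
    return {"m": m, "b": b, "r": r, "r2": r * r, "se": se, "t": t}
```

The two-tailed p-value is then 2 · P(T ≥ |t|) for a t distribution with n − 2 degrees of freedom.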
Reading the residuals
The fitted line is only useful if its assumptions hold. Linear regression assumes (1) the relationship is approximately linear, (2) residuals have roughly constant variance, and (3) residuals are independent. The Residual Analysis tab plots residuals against predicted Y — a healthy fit looks like a structureless cloud around zero.
Patterns to watch for: a curved sweep suggests a non-linear relationship (try a polynomial or log transform); a fan that widens with predicted Y indicates heteroscedasticity (consider weighted least squares); a single point with a standardized residual beyond ±3 is a likely outlier or data-entry error worth investigating.
Predicting new values — and knowing the limits
The Prediction tab returns the model's best estimate for Y at a new X, along with two intervals. The confidence interval for the mean brackets the expected average Y at that X — useful when you care about the population mean. The prediction interval for an individual observation is wider because it includes the typical scatter of single data points around the line.
Both intervals widen as X moves away from x̄, and they become unreliable for predictions far outside the observed X range — never extrapolate beyond your data. A common pitfall: a strong R² does not prove causation. The temperature/ice-cream example shows two variables can move together because a third factor (summer) drives both.
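The widening described above comes from the (x₀ − x̄)²/Sxx term in both interval formulas, while the extra 1 under the prediction interval's square root is what makes it wider. A minimal sketch, assuming the caller supplies the two-tailed t critical value for n − 2 df (e.g. ≈ 3.182 for 95% with 3 df, from a t table):

```python
import math

def intervals(xs, ys, x0, t_crit):
    """Point estimate plus confidence and prediction intervals at x0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    se = math.sqrt(sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = m * x0 + b
    d2 = (x0 - x_bar) ** 2 / sxx                 # grows as x0 leaves the data
    ci = t_crit * se * math.sqrt(1 / n + d2)     # half-width: mean response
    pi = t_crit * se * math.sqrt(1 + 1 / n + d2) # half-width: single observation
    return y_hat, (y_hat - ci, y_hat + ci), (y_hat - pi, y_hat + pi)
```

Note the prediction interval is always strictly wider than the confidence interval, and both are narrowest at x₀ = x̄.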