The least squares line (regression line) is the straight line that best fits a set of bivariate data by minimising the sum of squared residuals (vertical distances from each point to the line).
The equation of the least squares line is \(\hat{y} = a + bx\), where:
- \(\hat{y}\) = predicted value of the response variable
- \(x\) = value of the explanatory variable
- \(b\) = slope (gradient)
- \(a\) = y-intercept
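For \(n\) data points \((x_i, y_i)\), the residual of each point and the quantity the line minimises can be written out as follows. The subscript notation is introduced here for illustration.

\[
e_i = y_i - \hat{y}_i = y_i - (a + b x_i), \qquad \text{SSR} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2
\]

The least squares values of \(a\) and \(b\) are the pair that makes SSR as small as possible.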
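The slope and intercept can be calculated from summary statistics, \(b = r\dfrac{s_y}{s_x}\) and \(a = \bar{y} - b\bar{x}\), where: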
- \(r\) = correlation coefficient
- \(s_y\) = standard deviation of \(y\)
- \(s_x\) = standard deviation of \(x\)
- \(\bar{x}, \bar{y}\) = means of \(x\) and \(y\)
In practice, use a CAS calculator (LinReg) to obtain \(a\) and \(b\) directly.
Interpreting the slope \(b\): “For each one unit increase in [x variable], the predicted [y variable] increases/decreases by \(b\) [units].”
Example: If \(b = 8.3\) and \(x\) = hours studied, \(y\) = exam score:
“For each additional hour of study, the predicted exam score increases by 8.3 marks.”
If \(b < 0\):
“For each additional [unit of x], the predicted [y] decreases by \(|b|\) [units].”
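To see why the slope is read as "change per one unit of \(x\)", the sketch below evaluates \(\hat{y} = a + bx\) at \(x\) and \(x + 1\) and takes the difference. The slope \(b = 8.3\) comes from the example above. The intercept \(a = 30.0\) and the negative slope \(-2.5\) are hypothetical values; the intercept cancels out of the difference.

```python
def predict(a: float, b: float, x: float) -> float:
    """Predicted response: y-hat = a + b * x."""
    return a + b * x

# b = 8.3 from the text; a = 30.0 is a hypothetical placeholder
a, b = 30.0, 8.3
for x in [2, 5, 10]:
    change = predict(a, b, x + 1) - predict(a, b, x)
    print(f"x: {x} -> {x + 1}, change in predicted y = {change:.1f}")

# Hypothetical negative slope: the prediction falls by |b| = 2.5 per unit of x
print(f"{predict(a, -2.5, 4) - predict(a, -2.5, 3):.1f}")
```

The difference is the same wherever you start, because the line has a constant gradient.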
Interpreting the intercept \(a\): “When [x variable] = 0, the predicted [y variable] is \(a\) [units].”
Note: The intercept is not always meaningful in context. If \(x = 0\) lies outside the range of the data, the intercept is an extrapolation and should be interpreted with caution.
Example: If \(a = 32.4\):
“When a student studies 0 hours, the predicted exam score is 32.4 marks.”
(Whether this is sensible depends on the context and on whether \(x = 0\) lies within the range of the data.)
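The extrapolation check in the note above can be made mechanical. The function `interpret_intercept` below is a hypothetical helper, not a CAS feature. It writes the intercept sentence and adds a caution when \(x = 0\) falls outside the observed range of \(x\). The data ranges in the usage lines are made up.

```python
def interpret_intercept(a: float, xs: list[float]) -> str:
    """Hypothetical helper: interpret the intercept, flagging extrapolation."""
    lo, hi = min(xs), max(xs)
    sentence = f"When x = 0, the predicted y is {a}."
    if lo <= 0 <= hi:
        return sentence  # x = 0 is inside the data range
    # x = 0 lies outside the observed range, so the intercept is an extrapolation
    return (f"{sentence} However, x = 0 is outside the data range [{lo}, {hi}], "
            "so this is an extrapolation; interpret with caution.")

print(interpret_intercept(32.4, [0, 1, 2, 3, 4]))
print(interpret_intercept(32.4, [2, 3, 5, 8]))
```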
Worked example. Data: hours studied (\(x\)) and exam score (\(y\))
CAS output: \(a = 31.5\), \(b = 9.2\), \(r = 0.93\)
Equation: \(\hat{y} = 31.5 + 9.2x\)
Slope interpretation: For each additional hour studied, the predicted exam score increases by 9.2 marks.
Intercept interpretation: A student who studied 0 hours is predicted to score 31.5 marks.
Prediction: If \(x = 4\) hours: \(\hat{y} = 31.5 + 9.2(4) = 31.5 + 36.8 = 68.3\) marks.
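The prediction step is a single substitution into the fitted equation. This sketch reproduces it using the CAS values \(a = 31.5\) and \(b = 9.2\) from the worked example. The function name `predicted_score` is illustrative only.

```python
# CAS output from the worked example
a, b = 31.5, 9.2

def predicted_score(hours: float) -> float:
    """Predicted exam score: score-hat = 31.5 + 9.2 * hours."""
    return a + b * hours

# Round to one decimal place to hide floating-point noise
print(round(predicted_score(4), 1))
```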
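The least squares line always passes through the point of means \((\bar{x}, \bar{y})\), because \(a = \bar{y} - b\bar{x}\). This gives a useful check: substituting \(x = \bar{x}\) into the equation should return \(\hat{y} = \bar{y}\).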
KEY TAKEAWAY: The least squares line minimises the sum of squared residuals. Interpret the slope in context (change in y per unit change in x) and the intercept as the predicted y when x = 0.
EXAM TIP: Always write the equation with the actual variable names, not just \(x\) and \(y\). E.g., \(\widehat{\text{score}} = 31.5 + 9.2 \times \text{hours}\).
COMMON MISTAKE: Confusing slope and intercept interpretations. The slope is the rate of change; the intercept is the starting value when x = 0.