7.6.3 Regression lines
In experimental work, one often wants to determine relationships which are not exactly determined. For
example, you might want to find whether a vaccine is effective. In terms of linear equations, this amounts
to a system of equations which has no solution, yet about which a question must still be answered.
Example 7.6.11 The least squares regression line is the line $y = mx + b$ which approximates data
points $(x_i, y_i)$, $i = 1, \dots, n$, which typically come from some sort of experiment. It is desired to choose $m, b$ in
such a way that the sum of the squares of the errors between the value predicted by the line and the
observed values is as small as possible. In other words, you want to minimize
$$\sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2 .$$
Ideally, the sum would be zero and this would correspond to the data points being on a straight line.
This will never occur in any realistic situation in which the data points come from experiments.
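The quantity being minimized can be sketched in a few lines of Python. This is only an illustration: the function name `sum_squared_errors` and the sample data are assumptions made here, not part of the text.

```python
def sum_squared_errors(points, m, b):
    """Return the sum over all data points of (y_i - (m*x_i + b))**2."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


# Hypothetical data, chosen only for illustration.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 4.0)]

print(sum_squared_errors(points, 2.0, 1.0))   # 1.0
print(sum_squared_errors(points, 1.5, 1.0))   # 0.25
```

The second line fits these particular points better than the first, since its sum of squared errors is smaller. The least squares regression line is the choice of m and b for which this sum is smallest.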
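Suppose you are given points in the xy plane
$$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$$
and you would like to find constants m and b such that the line y = mx + b goes through all these points.
Of course this will be impossible in general. Therefore, try to find m,b to get as close as possible. The
desired system is
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix}$$
which is of the form $\mathbf{y} = A\mathbf{x}$ with $\mathbf{x} = (m, b)^T$, and it is desired to choose m and b to make
$$\left| A \begin{pmatrix} m \\ b \end{pmatrix} - \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \right|^2$$
as small as possible. According to Theorem 7.6.8, the best values for m and b occur as the solution to
$$A^T A \begin{pmatrix} m \\ b \end{pmatrix} = A^T \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad A = \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}.$$
Thus, after computing $A^T A$ and $A^T \mathbf{y}$,
$$\begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}.$$
Solving this system of equations for m and b,
$$m = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}, \qquad b = \frac{\left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i \right) - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} x_i y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}.$$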
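As a rough sketch of this computation, the following Python function forms the entries of $A^T A$ and $A^T \mathbf{y}$ as sums over the data and solves the resulting 2 × 2 system by Cramer's rule. The name `regression_line` and the test data are assumptions made for illustration only.

```python
def regression_line(points):
    """Return (m, b) for the least squares line y = m*x + b.

    Solves the normal equations
        [sum x^2  sum x] [m]   [sum x*y]
        [sum x    n    ] [b] = [sum y  ]
    by Cramer's rule.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)

    det = n * sxx - sx * sx  # determinant of A^T A
    if det == 0:
        # Happens when all x values coincide (or there are fewer than two points).
        raise ValueError("the line is not uniquely determined by these points")

    m = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return m, b


# Hypothetical data lying exactly on y = 2x + 1, so the fit recovers that line.
print(regression_line([(0, 1), (1, 3), (2, 5)]))   # (2.0, 1.0)
```

The determinant check reflects the fact that $A^T A$ is invertible exactly when the columns of $A$ are independent, which here means that the $x_i$ are not all equal.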
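One could clearly do a least squares fit for curves of the form $y = ax^2 + bx + c$ in the same way. In this
case you want to solve as well as possible for a, b, and c the system
$$\begin{pmatrix} x_1^2 & x_1 & 1 \\ \vdots & \vdots & \vdots \\ x_n^2 & x_n & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
and one would use the same technique as above. Many other similar problems are important, including
many in higher dimensions, and they are all solved the same way.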
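The quadratic case can be sketched the same way: build the matrix $A$ with rows $(x_i^2, x_i, 1)$, form $A^T A$ and $A^T \mathbf{y}$, and solve the 3 × 3 system. The helper names `solve_linear_system` and `quadratic_fit` below, the choice of Gaussian elimination, and the sample data are all assumptions made for illustration, not a method prescribed by the text.

```python
def solve_linear_system(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Augmented matrix [M | v] with float entries.
    aug = [[float(entry) for entry in M[i]] + [float(v[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def quadratic_fit(points):
    """Return [a, b, c] for the least squares curve y = a*x^2 + b*x + c."""
    A = [[x * x, x, 1.0] for x, _ in points]
    y = [yi for _, yi in points]
    # Normal equations: (A^T A) z = A^T y.
    ata = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(3)]
    return solve_linear_system(ata, aty)


# Hypothetical data lying exactly on y = x^2 + 1.
a, b, c = quadratic_fit([(-1, 2), (0, 1), (1, 2), (2, 5)])
print(round(a, 6), round(b, 6), round(c, 6))
```

Only the rows of $A$ change from one problem to the next; the step of forming and solving the normal equations is the same in every case.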
Example 7.6.12 Find the least squares regression line for the data
You would ideally want to solve the following system of equations
Of course there is no solution, so you look for a least squares solution. You have $A^T A$ equals
and $A^T b$ is
and so you need to solve
The solution is:
Thus the least squares line is
If you graph these data points and the line, you will see how the line tries to do the impossible by picking a
route through the data points which minimizes the error which results.