Session 7, Part D:
Fitting Lines to Data


Another way to see how close an individual's data point is to a line is to square the error. This is similar to how you calculated the variance in Session 5, where you squared the distances from the mean. Like the absolute value, a squared error is never negative. Again, for each individual point, the smaller the squared error, the closer the actual data point is to the line. Here are the squared errors for Persons 1 through 12:

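| Person # | Arm Span (X) | Height (Y) | YL = X | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
|---|---|---|---|---|---|
| 1 | 156 | 162 | 156 | 6 | 36 |
| 2 | 157 | 160 | 157 | 3 | 9 |
| 3 | 159 | 162 | 159 | 3 | 9 |
| 4 | 160 | 155 | 160 | -5 | 25 |
| 5 | 161 | 160 | 161 | -1 | 1 |
| 6 | 161 | 162 | 161 | 1 | 1 |
| 7 | 162 | 170 | 162 | 8 | 64 |
| 8 | 165 | 166 | 165 | 1 | 1 |
| 9 | 170 | 170 | 170 | 0 | 0 |
| 10 | 170 | 167 | 170 | -3 | 9 |
| 11 | 173 | 185 | 173 | 12 | 144 |
| 12 | 173 | 176 | 173 | 3 | 9 |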
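The column arithmetic for Persons 1 through 12 can be checked with a short script. This is only a sketch: the function name `squared_errors` and the variable names are illustrative choices, not part of the course materials, and the measurements are copied from the table in the same units.

```python
# Arm span (X) and height (Y) for Persons 1 through 12, copied from the table.
arm_span = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173]
height = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176]


def squared_errors(xs, ys, predict):
    """Return a list of (error, squared error) pairs, where error = Y - YL."""
    rows = []
    for x, y in zip(xs, ys):
        error = y - predict(x)  # observed height minus predicted height
        rows.append((error, error ** 2))
    return rows


# The line Height = Arm Span, i.e. YL = X.
rows = squared_errors(arm_span, height, lambda x: x)

for person, (error, squared) in enumerate(rows, start=1):
    print(person, error, squared)
```

Each printed row gives the person number, the error, and the squared error, so the output can be compared against the last two columns of the table.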

Problem D6

Complete the table to find the squared error for the remaining 12 people.

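| Person # | Arm Span (X) | Height (Y) | YL = X | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
|---|---|---|---|---|---|
| 13 | 177 | 173 | | | |
| 14 | 177 | 176 | | | |
| 15 | 178 | 178 | | | |
| 16 | 184 | 180 | | | |
| 17 | 188 | 188 | | | |
| 18 | 188 | 187 | | | |
| 19 | 188 | 182 | | | |
| 20 | 188 | 181 | | | |
| 21 | 188 | 192 | | | |
| 22 | 194 | 193 | | | |
| 23 | 196 | 184 | | | |
| 24 | 200 | 186 | | | |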

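Solution to Problem D6:

| Person # | Arm Span (X) | Height (Y) | YL = X | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
|---|---|---|---|---|---|
| 13 | 177 | 173 | 177 | -4 | 16 |
| 14 | 177 | 176 | 177 | -1 | 1 |
| 15 | 178 | 178 | 178 | 0 | 0 |
| 16 | 184 | 180 | 184 | -4 | 16 |
| 17 | 188 | 188 | 188 | 0 | 0 |
| 18 | 188 | 187 | 188 | -1 | 1 |
| 19 | 188 | 182 | 188 | -6 | 36 |
| 20 | 188 | 181 | 188 | -7 | 49 |
| 21 | 188 | 192 | 188 | 4 | 16 |
| 22 | 194 | 193 | 194 | -1 | 1 |
| 23 | 196 | 184 | 196 | -12 | 144 |
| 24 | 200 | 186 | 200 | -14 | 196 |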

Another measure of how well a particular line describes the relationship in bivariate data is the total of the squared errors. When comparing two lines, the line with the smaller total of the squared errors is the "better" line in terms of how well it describes the linear relationship between the two variables. For the line Height = Arm Span, this is the sum of the sixth column across the two tables above (all 24 people), which is 784.
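As a check on that total, the squared errors for all 24 people can be summed in a few lines. This is a minimal sketch; the names `data` and `sse_height_equals_arm_span` are illustrative, and the pairs are copied from the tables.

```python
# (arm span X, height Y) for all 24 people, copied from the tables.
data = [
    (156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162),
    (162, 170), (165, 166), (170, 170), (170, 167), (173, 185), (173, 176),
    (177, 173), (177, 176), (178, 178), (184, 180), (188, 188), (188, 187),
    (188, 182), (188, 181), (188, 192), (194, 193), (196, 184), (200, 186),
]

# For the line Height = Arm Span, the predicted height YL is simply X,
# so each error is Y - X and each squared error is (Y - X)^2.
squared = [(y - x) ** 2 for x, y in data]

sse_height_equals_arm_span = sum(squared)
print(sse_height_equals_arm_span)  # 784
```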

This quantity, the sum of squared errors (SSE), is what statisticians prefer to use when comparing different lines for potential fit. If you could consider all possible lines, then the one with the smallest SSE is called the least squares line; it may also be referred to as the line of best fit.
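The text does not say how the least squares line is found. One standard approach, assumed here rather than taken from this session, uses the closed-form formulas slope = Sxy / Sxx and intercept = mean(Y) - slope * mean(X). The sketch below applies them to the 24 data points and reports the resulting SSE; the helper name `sse` is illustrative.

```python
# (arm span X, height Y) for all 24 people, copied from the tables.
data = [
    (156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162),
    (162, 170), (165, 166), (170, 170), (170, 167), (173, 185), (173, 176),
    (177, 173), (177, 176), (178, 178), (184, 180), (188, 188), (188, 187),
    (188, 182), (188, 181), (188, 192), (194, 193), (196, 184), (200, 186),
]


def sse(points, slope, intercept):
    """Sum of squared errors for the line YL = slope * X + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Closed-form least squares formulas (an assumption; not stated in the text).
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
s_xx = sum((x - mean_x) ** 2 for x, _ in data)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"least squares line: YL = {slope:.3f} * X + {intercept:.3f}")
print(f"SSE = {sse(data, slope, intercept):.1f}")
```

Because the least squares line minimizes the SSE over all lines, its SSE cannot exceed the SSE of any other candidate line evaluated with the same `sse` function.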

Before we determine the SSE for the line Height = Arm Span - 1 (i.e., YL = X - 1), let's take a look at Person 1 and the line YL = X - 1:

| Person # | Arm Span (X) | Height (Y) | YL = X - 1 | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
|---|---|---|---|---|---|
| 1 | 156 | 162 | 155 | 7 | 49 |

Person 1's squared error can be represented on the graph as a square with a side whose length is |Y - YL|:

The following is the scatter plot for the data and a graph of the line YL = X - 1.

Note once again that a point above the line is indicated by a positive error; a point below the line is indicated by a negative error; and a point is on the line when the error is 0.
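The sign rule can be illustrated with Persons 1 through 6 and the line YL = X - 1. This is a small sketch; the function name `position` and its return labels are illustrative only.

```python
# (arm span X, height Y) for Persons 1 through 6, copied from the tables.
data = [(156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162)]


def position(x, y):
    """Describe where the point (x, y) sits relative to the line YL = X - 1."""
    error = y - (x - 1)  # Error = Y - YL
    if error > 0:
        return "above"
    if error < 0:
        return "below"
    return "on"


positions = [position(x, y) for x, y in data]

for person, where in enumerate(positions, start=1):
    print(f"Person {person}: {where} the line")
```

Person 5, with an arm span of 161 and a height of 160, has an error of 0 and so lies exactly on the line.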

The following table shows the arm span (X), the observed height (Y), the predicted height based on the line Height = Arm Span - 1 (i.e., YL = X - 1), the error, and the squared error for Persons 1 through 6 in our study:

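| Person # | Arm Span (X) | Height (Y) | YL = X - 1 | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
|---|---|---|---|---|---|
| 1 | 156 | 162 | 155 | 7 | 49 |
| 2 | 157 | 160 | 156 | 4 | 16 |
| 3 | 159 | 162 | 158 | 4 | 16 |
| 4 | 160 | 155 | 159 | -4 | 16 |
| 5 | 161 | 160 | 160 | 0 | 0 |
| 6 | 161 | 162 | 160 | 2 | 4 |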

Problem D7

Complete the table below for the remaining 18 people. Then compute the sum of the squared errors for the line Height = Arm Span - 1, and compare the result to the sum of squared errors for the line Height = Arm Span. Based on your calculations, which line provides the better fit?

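| Person # | Arm Span (X) | Height (Y) | YL = X - 1 | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
|---|---|---|---|---|---|
| 7 | 162 | 170 | | | |
| 8 | 165 | 166 | | | |
| 9 | 170 | 170 | | | |
| 10 | 170 | 167 | | | |
| 11 | 173 | 185 | | | |
| 12 | 173 | 176 | | | |
| 13 | 177 | 173 | | | |
| 14 | 177 | 176 | | | |
| 15 | 178 | 178 | | | |
| 16 | 184 | 180 | | | |
| 17 | 188 | 188 | | | |
| 18 | 188 | 187 | | | |
| 19 | 188 | 182 | | | |
| 20 | 188 | 181 | | | |
| 21 | 188 | 192 | | | |
| 22 | 194 | 193 | | | |
| 23 | 196 | 184 | | | |
| 24 | 200 | 186 | | | |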

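Solution to Problem D7:

| Person # | Arm Span (X) | Height (Y) | YL = X - 1 | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
|---|---|---|---|---|---|
| 7 | 162 | 170 | 161 | 9 | 81 |
| 8 | 165 | 166 | 164 | 2 | 4 |
| 9 | 170 | 170 | 169 | 1 | 1 |
| 10 | 170 | 167 | 169 | -2 | 4 |
| 11 | 173 | 185 | 172 | 13 | 169 |
| 12 | 173 | 176 | 172 | 4 | 16 |
| 13 | 177 | 173 | 176 | -3 | 9 |
| 14 | 177 | 176 | 176 | 0 | 0 |
| 15 | 178 | 178 | 177 | 1 | 1 |
| 16 | 184 | 180 | 183 | -3 | 9 |
| 17 | 188 | 188 | 187 | 1 | 1 |
| 18 | 188 | 187 | 187 | 0 | 0 |
| 19 | 188 | 182 | 187 | -5 | 25 |
| 20 | 188 | 181 | 187 | -6 | 36 |
| 21 | 188 | 192 | 187 | 5 | 25 |
| 22 | 194 | 193 | 193 | 0 | 0 |
| 23 | 196 | 184 | 195 | -11 | 121 |
| 24 | 200 | 186 | 199 | -13 | 169 |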

The sum of squared errors (SSE) is 49 + 16 + ... + 169 = 772. Since this is less than the sum of squared errors for the line Height = Arm Span (which was 784), the line Height = Arm Span - 1 is a slightly better fit.
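The comparison between the two lines can be reproduced with one helper applied to both. This is a sketch with illustrative names (`sse`, `candidates`, `better`); the data pairs are copied from the tables.

```python
# (arm span X, height Y) for all 24 people, copied from the tables.
data = [
    (156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162),
    (162, 170), (165, 166), (170, 170), (170, 167), (173, 185), (173, 176),
    (177, 173), (177, 176), (178, 178), (184, 180), (188, 188), (188, 187),
    (188, 182), (188, 181), (188, 192), (194, 193), (196, 184), (200, 186),
]


def sse(points, slope, intercept):
    """Sum of squared errors for the line YL = slope * X + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


# The two candidate lines from this part, stored as (slope, intercept).
candidates = {
    "Height = Arm Span": (1, 0),
    "Height = Arm Span - 1": (1, -1),
}

results = {name: sse(data, m, b) for name, (m, b) in candidates.items()}
for name, total in results.items():
    print(f"{name}: SSE = {total}")

# The line with the smaller SSE is the better fit of the two.
better = min(results, key=results.get)
print("Better fit:", better)
```

Running this gives an SSE of 784 for Height = Arm Span and 772 for Height = Arm Span - 1, matching the totals above.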