Teacher resources and professional development across the curriculum

Session 7, Part D:
Fitting Lines to Data

Can we do better? Recall that for the 24 people in this study, the mean arm span is 175.5 cm and the mean height is 174.8 cm. Note that the mean arm span is .7 cm longer than the mean height. This suggests that we might try the line Height = Arm Span - .7 to describe the trend in our bivariate data. Let's see how this line compares with the previous models.

Here is the scatter plot of the data and a graph of the line YL = X - .7:

Here is the table for the line YL = X - .7:

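| Person # | Arm Span (X) | Height (Y) | YL = X - .7 | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
| --- | --- | --- | --- | --- | --- |
| 1 | 156 | 162 | 155.3 | 6.7 | 44.89 |
| 2 | 157 | 160 | 156.3 | 3.7 | 13.69 |
| 3 | 159 | 162 | 158.3 | 3.7 | 13.69 |
| 4 | 160 | 155 | 159.3 | -4.3 | 18.49 |
| 5 | 161 | 160 | 160.3 | -0.3 | 0.09 |
| 6 | 161 | 162 | 160.3 | 1.7 | 2.89 |
| 7 | 162 | 170 | 161.3 | 8.7 | 75.69 |
| 8 | 165 | 166 | 164.3 | 1.7 | 2.89 |
| 9 | 170 | 170 | 169.3 | 0.7 | 0.49 |
| 10 | 170 | 167 | 169.3 | -2.3 | 5.29 |
| 11 | 173 | 185 | 172.3 | 12.7 | 161.29 |
| 12 | 173 | 176 | 172.3 | 3.7 | 13.69 |
| 13 | 177 | 173 | 176.3 | -3.3 | 10.89 |
| 14 | 177 | 176 | 176.3 | -0.3 | 0.09 |
| 15 | 178 | 178 | 177.3 | 0.7 | 0.49 |
| 16 | 184 | 180 | 183.3 | -3.3 | 10.89 |
| 17 | 188 | 188 | 187.3 | 0.7 | 0.49 |
| 18 | 188 | 187 | 187.3 | -0.3 | 0.09 |
| 19 | 188 | 182 | 187.3 | -5.3 | 28.09 |
| 20 | 188 | 181 | 187.3 | -6.3 | 39.69 |
| 21 | 188 | 192 | 187.3 | 4.7 | 22.09 |
| 22 | 194 | 193 | 193.3 | -0.3 | 0.09 |
| 23 | 196 | 184 | 195.3 | -11.3 | 127.69 |
| 24 | 200 | 186 | 199.3 | -13.3 | 176.89 |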

For this line, the sum of squared errors is 770.56, which makes it a slightly better model than the line YL = X - 1 (whose SSE was 772).
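The SSE figures quoted here can be checked with a short script. The following is a minimal sketch, not part of the original session materials: the names ARM_SPAN, HEIGHT, and sse_shifted_line are illustrative assumptions, and the data are copied from the table above.

```python
# Arm span (X) and height (Y) in cm for the 24 people, copied from the table.
ARM_SPAN = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
            177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
HEIGHT = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
          173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]


def sse_shifted_line(shift):
    """Sum of squared errors for the line YL = X + shift (slope fixed at 1)."""
    total = 0.0
    for x, y in zip(ARM_SPAN, HEIGHT):
        predicted = x + shift      # YL for this person
        error = y - predicted      # Error = Y - YL
        total += error ** 2        # accumulate (Error)^2
    return total


# Compare the two lines discussed in the text: YL = X - 1 and YL = X - .7.
for shift in (-1, -0.7):
    print(f"shift = {shift}: SSE = {sse_shifted_line(shift):.2f}")
# shift = -1: SSE = 772.00
# shift = -0.7: SSE = 770.56
```

Passing other shift values to the same function is one way to explore how the SSE changes as a slope-1 line is moved up or down.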

Problem D8

Here are the three lines we've considered, plus two new ones:

| Line | SSE |
| --- | --- |
| YL = X | 784 |
| YL = X + 1 | 844 |
| YL = X - 1 | 772 |
| YL = X - 2 | 808 |
| YL = X - .7 | 770.56 |

a. Judging on the basis of the SSE, which is the best line? Which is the worst?
b. What other ways could we change the line equation in an attempt to further reduce the SSE?
c. Is it possible to reduce the SSE to 0? Why or why not?

We have examined several lines that have yielded different SSEs. The lines, however, had one thing in common: they all had a slope of 1, so they were all parallel. Keep in mind that the slope of a line is often described as the ratio of rise to run. The formula for slope is: slope = (change in Y) / (change in X). Now, let's investigate a line with a different slope to describe the trend in the data.

One such line, with slope 0.75, passes through the points (164, 164) and (188, 182) and near many of the data points; its equation is YL = 0.75X + 41. Let's compare this line to the line YL = X - .7, which is the best fit we have found so far.

Note that these two lines are not parallel since they have different slopes.
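As a check on the equation YL = 0.75X + 41, the slope formula can be applied to the two points the line passes through. This is only an illustrative sketch; the helper name line_through is an assumption and does not come from the session materials.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points with different X values."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)    # (change in Y) / (change in X)
    intercept = y1 - slope * x1      # solve y1 = slope * x1 + intercept
    return slope, intercept


slope, intercept = line_through((164, 164), (188, 182))
print(slope, intercept)  # 0.75 41.0
```

The rise is 182 - 164 = 18 and the run is 188 - 164 = 24, so the slope is 18/24 = 0.75. Substituting (164, 164) gives the intercept 164 - 0.75 × 164 = 41.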

Here is the scatter plot of the 24 people and the graph of the lines YL = .75X + 41 and YL = X - .7:

Here is the table to find the SSE for the line YL = .75X + 41:

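| Person # | Arm Span (X) | Height (Y) | YL = .75X + 41 | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
| --- | --- | --- | --- | --- | --- |
| 1 | 156 | 162 | 158 | 4 | 16 |
| 2 | 157 | 160 | 158.75 | 1.25 | 1.5625 |
| 3 | 159 | 162 | 160.25 | 1.75 | 3.0625 |
| 4 | 160 | 155 | 161 | -6 | 36 |
| 5 | 161 | 160 | 161.75 | -1.75 | 3.0625 |
| 6 | 161 | 162 | 161.75 | 0.25 | 0.0625 |
| 7 | 162 | 170 | 162.5 | 7.5 | 56.25 |
| 8 | 165 | 166 | 164.75 | 1.25 | 1.5625 |
| 9 | 170 | 170 | 168.5 | 1.5 | 2.25 |
| 10 | 170 | 167 | 168.5 | -1.5 | 2.25 |
| 11 | 173 | 185 | 170.75 | 14.25 | 203.0625 |
| 12 | 173 | 176 | 170.75 | 5.25 | 27.5625 |
| 13 | 177 | 173 | 173.75 | -0.75 | 0.5625 |
| 14 | 177 | 176 | 173.75 | 2.25 | 5.0625 |
| 15 | 178 | 178 | 174.5 | 3.5 | 12.25 |
| 16 | 184 | 180 | 179 | 1 | 1 |
| 17 | 188 | 188 | 182 | 6 | 36 |
| 18 | 188 | 187 | 182 | 5 | 25 |
| 19 | 188 | 182 | 182 | 0 | 0 |
| 20 | 188 | 181 | 182 | -1 | 1 |
| 21 | 188 | 192 | 182 | 10 | 100 |
| 22 | 194 | 193 | 186.5 | 6.5 | 42.25 |
| 23 | 196 | 184 | 188 | -4 | 16 |
| 24 | 200 | 186 | 191 | -5 | 25 |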

The SSE for the line YL = .75X + 41 is about 616.8, compared with 770.56 for the line YL = X - .7. So this new line, with its different slope, turns out to be a better fit for the data set. Note 3
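The comparison between the two lines can be reproduced by letting both the slope and the intercept vary. The sketch below is an illustration under stated assumptions: the names ARM_SPAN, HEIGHT, and sse are hypothetical, and the data are copied from the tables in this part.

```python
# Arm span (X) and height (Y) in cm for the 24 people, copied from the tables.
ARM_SPAN = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
            177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
HEIGHT = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
          173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]


def sse(slope, intercept):
    """Sum of squared errors for the line YL = slope * X + intercept."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(ARM_SPAN, HEIGHT))


print(f"YL = .75X + 41: SSE = {sse(0.75, 41):.4f}")  # 616.8125
print(f"YL = X - .7:    SSE = {sse(1, -0.7):.2f}")   # 770.56
```

The unrounded SSE for YL = .75X + 41 is 616.8125, which rounds to the 616.8 quoted in the text. A smaller SSE means the squared vertical distances from the data points to the line are smaller in total.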