Teacher resources and professional development across the curriculum

Session 7, Part D:
Fitting Lines to Data

 You should have decided in Problem D3 that two of the three lines are better candidates for describing the trend in the data points. The line Height = Arm Span has nine points that are above the line, three that are on the line, and 12 that are below the line. The line Height = Arm Span - 1 has 12 points that are above the line, four that are on the line, and eight that are below the line. So which of these lines is "better" at describing the relationship? While personal judgement is useful, statisticians prefer to use more objective methods. To develop criteria for identifying the "better" line, we'll use a concept developed in Part C: the vertical distance from a point to a line.

Person 11, whose arm span is 173 cm and whose height is 185 cm, is represented by the point (173, 185) in the scatter plot. If you were to use the line Height = Arm Span to predict Person 11's height from his or her arm span, the predicted value would be represented by the point (173, 173), which lies on the line.

[Figure: scatter plot of the data with the line Height = Arm Span, with the points (173, 185) and (173, 173) marked.]

The difference between the actual observed height (Y) and the corresponding hypothetical, predicted height (on the line) is called the error. If we use YL (Y on the line) to designate the Y coordinate that represents the predicted height, then we can calculate the error as follows:

Error = Y - YL

In other words, Error = Actual Observed Height - Predicted Height (on the line).

Finally, the vertical distance between an observed height and a predicted height can be expressed as:

Distance = |Y - YL| = |Error|
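
As a minimal sketch of these two definitions, the following Python snippet computes the error and the vertical distance for Person 11. The function names `error` and `vertical_distance` are illustrative choices, not part of the course materials.

```python
def error(observed, predicted):
    """Signed error: observed height (Y) minus predicted height (YL)."""
    return observed - predicted


def vertical_distance(observed, predicted):
    """Vertical distance from the point to the line: |Y - YL|."""
    return abs(error(observed, predicted))


# Person 11 from the text: arm span 173 cm, height 185 cm.
arm_span, height = 173, 185
predicted_height = arm_span  # the line Height = Arm Span, i.e., YL = X

print(error(height, predicted_height))              # 12
print(vertical_distance(height, predicted_height))  # 12
```

The error keeps its sign (positive here, because the point is above the line), while the distance does not.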

Let's see how this works for the line Height = Arm Span (i.e., YL = X).

The following table shows the arm span (X), the actual observed height (Y), the predicted height based on the line Height = Arm Span (i.e., YL = X), the error, and the vertical distance between the person's observed height (Y) and predicted height (YL) for Persons 1 through 6 in our study:

Person #   Arm Span (X)   Height (Y)   YL = X   Error = Y - YL   Distance = |Y - YL|
1          156            162          156      6                6
2          157            160          157      3                3
3          159            162          159      3                3
4          160            155          160      -5               5
5          161            160          161      -1               1
6          161            162          161      1                1

Problem D4

Complete this table for the remaining 18 people. The filled-in table follows the blank one.

Person #   Arm Span (X)   Height (Y)   YL = X   Error = Y - YL   Distance = |Y - YL|
7          162            170
8          165            166
9          170            170
10         170            167
11         173            185
12         173            176
13         177            173
14         177            176
15         178            178
16         184            180
17         188            188
18         188            187
19         188            182
20         188            181
21         188            192
22         194            193
23         196            184
24         200            186

Person #   Arm Span (X)   Height (Y)   YL = X   Error = Y - YL   Distance = |Y - YL|
7          162            170          162      8                8
8          165            166          165      1                1
9          170            170          170      0                0
10         170            167          170      -3               3
11         173            185          173      12               12
12         173            176          173      3                3
13         177            173          177      -4               4
14         177            176          177      -1               1
15         178            178          178      0                0
16         184            180          184      -4               4
17         188            188          188      0                0
18         188            187          188      -1               1
19         188            182          188      -6               6
20         188            181          188      -7               7
21         188            192          188      4                4
22         194            193          194      -1               1
23         196            184          196      -12              12
24         200            186          200      -14              14

• A point above the line is indicated by a positive value of (Y - YL); this is called a positive error.
• A point below the line is indicated by a negative value of (Y - YL); this is called a negative error.
• A point is on the line when (Y - YL) equals 0, and there is no error.
• The vertical distance from a point to the line YL = X is the absolute value of the error. The smaller this distance is, the closer the actual data point is, vertically, to the line.
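
The sign rules above can be checked against the data. This is a minimal sketch, assuming the 24 (arm span, height) pairs copied from the tables in this part; the names `data`, `classify`, and `counts` are illustrative only.

```python
# (arm span, height) in cm for Persons 1-24, copied from the tables in this part.
data = [
    (156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162),
    (162, 170), (165, 166), (170, 170), (170, 167), (173, 185), (173, 176),
    (177, 173), (177, 176), (178, 178), (184, 180), (188, 188), (188, 187),
    (188, 182), (188, 181), (188, 192), (194, 193), (196, 184), (200, 186),
]


def classify(error):
    """Say where a point sits relative to the line, based on the error's sign."""
    if error > 0:
        return "above"
    if error < 0:
        return "below"
    return "on"


counts = {"above": 0, "on": 0, "below": 0}
for x, y in data:
    predicted = x  # the line Height = Arm Span (YL = X)
    counts[classify(y - predicted)] += 1

print(counts)  # {'above': 9, 'on': 3, 'below': 12}
```

These counts agree with the nine points above, three on, and 12 below the line Height = Arm Span noted at the start of this part.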

One measure of how well a particular line describes the trend in bivariate data is the total of the vertical distances. When comparing two lines, the line with the smaller total of the vertical distances is the "better" line in terms of how well it describes the linear relationship between the two variables. For the line Height = Arm Span (i.e., YL = X), this total is the sum of the Distance column for all 24 people in the tables above, which is 100.
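
The total can be reproduced with a short computation. This is a sketch, assuming the same 24 (arm span, height) pairs from the tables; the variable names are illustrative.

```python
# (arm span, height) in cm for Persons 1-24, copied from the tables in this part.
data = [
    (156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162),
    (162, 170), (165, 166), (170, 170), (170, 167), (173, 185), (173, 176),
    (177, 173), (177, 176), (178, 178), (184, 180), (188, 188), (188, 187),
    (188, 182), (188, 181), (188, 192), (194, 193), (196, 184), (200, 186),
]

# For the line Height = Arm Span, the predicted height YL equals X,
# so each vertical distance is |Y - X|.
distances = [abs(y - x) for x, y in data]
total = sum(distances)

print(distances[:6])  # [6, 3, 3, 5, 1, 1]
print(total)          # 100
```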

But perhaps people aren't really "square." Might a better prediction be that height is one centimeter shorter than arm span? Let's see how well the line Height = Arm Span - 1 (i.e., YL = X - 1) describes the trend.

The following table shows the arm span (X), the actual observed height (Y), the predicted height based on the line YL = X - 1, the error, and the vertical distance between the person's observed height (Y) and predicted height (YL) for Persons 1 through 6 in our study:

Person #   Arm Span (X)   Height (Y)   YL = X - 1   Error = Y - YL   Distance = |Y - YL|
1          156            162          155          7                7
2          157            160          156          4                4
3          159            162          158          4                4
4          160            155          159          -4               4
5          161            160          160          0                0
6          161            162          160          2                2

Problem D5

Complete the table for the remaining 18 people. Then compute the total vertical distance for the line Height = Arm Span - 1, and compare the result to the total vertical distance for the line Height = Arm Span. Based on your calculations, which line provides the better fit? The filled-in table follows the blank one.

Person #   Arm Span (X)   Height (Y)   YL = X - 1   Error = Y - YL   Distance = |Y - YL|
7          162            170
8          165            166
9          170            170
10         170            167
11         173            185
12         173            176
13         177            173
14         177            176
15         178            178
16         184            180
17         188            188
18         188            187
19         188            182
20         188            181
21         188            192
22         194            193
23         196            184
24         200            186

Person #   Arm Span (X)   Height (Y)   YL = X - 1   Error = Y - YL   Distance = |Y - YL|
7          162            170          161          9                9
8          165            166          164          2                2
9          170            170          169          1                1
10         170            167          169          -2               2
11         173            185          172          13               13
12         173            176          172          4                4
13         177            173          176          -3               3
14         177            176          176          0                0
15         178            178          177          1                1
16         184            180          183          -3               3
17         188            188          187          1                1
18         188            187          187          0                0
19         188            182          187          -5               5
20         188            181          187          -6               6
21         188            192          187          5                5
22         194            193          193          0                0
23         196            184          195          -11              11
24         200            186          199          -13              13

For the model YL = X - 1, the total vertical distance is 7 + 4 + ... + 13 = 100. Surprisingly, according to this measure of fit, the two lines are equally good. This suggests that another measure of best fit may be useful.
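
The tie between the two lines can be confirmed side by side. The sketch below assumes the same 24 data pairs; `total_vertical_distance` and its `predict` argument are hypothetical names introduced for illustration.

```python
# (arm span, height) in cm for Persons 1-24, copied from the tables in this part.
data = [
    (156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162),
    (162, 170), (165, 166), (170, 170), (170, 167), (173, 185), (173, 176),
    (177, 173), (177, 176), (178, 178), (184, 180), (188, 188), (188, 187),
    (188, 182), (188, 181), (188, 192), (194, 193), (196, 184), (200, 186),
]


def total_vertical_distance(points, predict):
    """Sum |Y - YL| over all points, where YL = predict(X)."""
    return sum(abs(y - predict(x)) for x, y in points)


total_square = total_vertical_distance(data, lambda x: x)       # YL = X
total_shifted = total_vertical_distance(data, lambda x: x - 1)  # YL = X - 1

print(total_square, total_shifted)  # 100 100
```

Passing the line as a function makes it easy to try other candidate lines against the same data.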