Teacher resources and professional development across the curriculum


Session 7, Part D:
Fitting Lines to Data


You should have decided in Problem D3 that two of the three lines are better candidates for describing the trend in the data points. The line Height = Arm Span has nine points that are above the line, three that are on the line, and 12 that are below the line. The line Height = Arm Span - 1 has 12 points that are above the line, four that are on the line, and eight that are below the line.

So which of these lines is "better" at describing the relationship? While personal judgement is useful, statisticians prefer to use more objective methods. To develop criteria for identifying the "better" line, we'll use a concept developed in Part C: the vertical distance from a point to a line.

 

Person 11, whose arm span is 173 cm and whose height is 185 cm, is represented by the point (173, 185) in the scatter plot. If you were to use the line Height = Arm Span to predict person 11's height from his or her arm span, the predicted height would be 173 cm, represented by the point (173, 173), which lies on the line. The scatter plot thus far looks like this:

[Figure: scatter plot of the data with the line Height = Arm Span]

The difference between the actual observed height (Y) and the corresponding hypothetical, predicted height (on the line) is called the error. If we use YL (Y on the line) to designate the Y coordinate that represents the predicted height, then we can calculate the error as follows:

Error = Y - YL

In other words, Error = Actual Observed Height - Predicted Height (on the line).

Finally, the vertical distance between an observed height and a predicted height can be expressed as:

Distance = |Y - YL| = |Error|
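As a quick check of these two formulas, here is a minimal Python sketch applied to Person 11 and the line Height = Arm Span. The function names `error` and `distance` are illustrative choices, not part of the course materials.

```python
def error(observed_height, predicted_height):
    """Error = Y - YL: observed height minus the height predicted by the line."""
    return observed_height - predicted_height


def distance(observed_height, predicted_height):
    """Vertical distance = |Y - YL|, the absolute value of the error."""
    return abs(error(observed_height, predicted_height))


# Person 11: arm span 173 cm, height 185 cm.
arm_span, height = 173, 185
predicted = arm_span  # the line Height = Arm Span predicts YL = X

person_11_error = error(height, predicted)
person_11_distance = distance(height, predicted)
print(person_11_error, person_11_distance)  # 12 12
```

The error is positive because the observed point (173, 185) lies above the predicted point (173, 173).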

Let's see how this works for the line Height = Arm Span (i.e., YL = X).

The following table shows the arm span (X), the actual observed height (Y), the predicted height based on the line Height = Arm Span (i.e., YL = X), the error, and the vertical distance between the person's observed height (Y) and predicted height (YL) for Persons 1 through 6 in our study:

Person #   Arm Span (X)   Height (Y)   YL = X   Error = Y - YL   Distance = |Y - YL|
1          156            162          156      6                6
2          157            160          157      3                3
3          159            162          159      3                3
4          160            155          160      -5               5
5          161            160          161      -1               1
6          161            162          161      1                1

Problem D4

Complete this table for the remaining 18 people. The filled-in table appears below the blank one.

Person #   Arm Span (X)   Height (Y)   YL = X   Error = Y - YL   Distance = |Y - YL|
7          162            170
8          165            166
9          170            170
10         170            167
11         173            185
12         173            176
13         177            173
14         177            176
15         178            178
16         184            180
17         188            188
18         188            187
19         188            182
20         188            181
21         188            192
22         194            193
23         196            184
24         200            186

Person #   Arm Span (X)   Height (Y)   YL = X   Error = Y - YL   Distance = |Y - YL|
7          162            170          162      8                8
8          165            166          165      1                1
9          170            170          170      0                0
10         170            167          170      -3               3
11         173            185          173      12               12
12         173            176          173      3                3
13         177            173          177      -4               4
14         177            176          177      -1               1
15         178            178          178      0                0
16         184            180          184      -4               4
17         188            188          188      0                0
18         188            187          188      -1               1
19         188            182          188      -6               6
20         188            181          188      -7               7
21         188            192          188      4                4
22         194            193          194      -1               1
23         196            184          196      -12              12
24         200            186          200      -14              14


Here are some observations about this table:

 

A point above the line is indicated by a positive value of (Y - YL); this is called a positive error.

 

A point below the line is indicated by a negative value of (Y - YL); this is called a negative error.

 

A point is on the line when (Y - YL) equals 0, and there is no error.

 

The vertical distance from a point to the line YL = X is the absolute value of the error. The smaller this distance is, the closer the actual data point is, vertically, to the line.
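These observations can be checked against the full data set. The sketch below, in Python, classifies each of the 24 people as above, on, or below the line YL = X by the sign of the error. The data values are copied from the tables; the names `ARM_SPAN`, `HEIGHT`, and `position` are illustrative assumptions.

```python
# Arm span (X) and height (Y) in cm for Persons 1-24, copied from the tables.
ARM_SPAN = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
            177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
HEIGHT = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
          173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]


def position(error):
    """Return where a point sits relative to the line, given Error = Y - YL."""
    if error > 0:
        return "above"  # positive error
    if error < 0:
        return "below"  # negative error
    return "on"         # no error


counts = {"above": 0, "on": 0, "below": 0}
for x, y in zip(ARM_SPAN, HEIGHT):
    predicted = x  # the line YL = X
    counts[position(y - predicted)] += 1

print(counts)  # {'above': 9, 'on': 3, 'below': 12}
```

The counts agree with the nine points above, three on, and 12 below the line Height = Arm Span stated at the start of this part.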

One measure of how well a particular line describes the trend in bivariate data is the total of the vertical distances. When comparing two lines, the line with the smaller total of the vertical distances is the "better" line in terms of how well it describes the linear relationship between the two variables. For the line Height = Arm Span (i.e., YL = X), this total is the sum of the Distance column across the two tables above (Persons 1 through 24), which is 100.
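The total can be reproduced with a short Python sketch. The helper `total_vertical_distance` and its `predict` parameter are hypothetical names introduced here for illustration; the data are copied from the tables.

```python
# Arm span (X) and height (Y) in cm for Persons 1-24, copied from the tables.
ARM_SPAN = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
            177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
HEIGHT = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
          173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]


def total_vertical_distance(xs, ys, predict):
    """Sum of |Y - YL| over all points, where YL = predict(X)."""
    return sum(abs(y - predict(x)) for x, y in zip(xs, ys))


# The line Height = Arm Span, i.e., YL = X.
total = total_vertical_distance(ARM_SPAN, HEIGHT, lambda x: x)
print(total)  # 100
```

Passing the line as a function makes it easy to try other candidate lines with the same data.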

But perhaps people aren't really "square." Might a better prediction be that height is one centimeter shorter than arm span? Let's see how well the line Height = Arm Span - 1 (i.e., YL = X - 1) describes the trend.

The following table shows the arm span (X), the actual observed height (Y), the predicted height based on the line YL = X - 1, the error, and the vertical distance between the person's observed height (Y) and predicted height (YL) for Persons 1 through 6 in our study:

Person #   Arm Span (X)   Height (Y)   YL = X - 1   Error = Y - YL   Distance = |Y - YL|
1          156            162          155          7                7
2          157            160          156          4                4
3          159            162          158          4                4
4          160            155          159          -4               4
5          161            160          160          0                0
6          161            162          160          2                2


 

Problem D5

Complete the table for the remaining 18 people. Then compute the total vertical distance for the line Height = Arm Span - 1, and compare the result to the total vertical distance for the line Height = Arm Span. Based on your calculations, which line provides the better fit? The filled-in table appears below the blank one.

Person #   Arm Span (X)   Height (Y)   YL = X - 1   Error = Y - YL   Distance = |Y - YL|
7          162            170
8          165            166
9          170            170
10         170            167
11         173            185
12         173            176
13         177            173
14         177            176
15         178            178
16         184            180
17         188            188
18         188            187
19         188            182
20         188            181
21         188            192
22         194            193
23         196            184
24         200            186

 

Person #   Arm Span (X)   Height (Y)   YL = X - 1   Error = Y - YL   Distance = |Y - YL|
7          162            170          161          9                9
8          165            166          164          2                2
9          170            170          169          1                1
10         170            167          169          -2               2
11         173            185          172          13               13
12         173            176          172          4                4
13         177            173          176          -3               3
14         177            176          176          0                0
15         178            178          177          1                1
16         184            180          183          -3               3
17         188            188          187          1                1
18         188            187          187          0                0
19         188            182          187          -5               5
20         188            181          187          -6               6
21         188            192          187          5                5
22         194            193          193          0                0
23         196            184          195          -11              11
24         200            186          199          -13              13

For the model YL = X - 1, the total vertical distance is 7 + 4 + ... + 13 = 100. Surprisingly, according to this measure of fit, the two lines are equally good. This suggests that another measure of best fit may be useful.
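The tie can be confirmed, and partly explained, with a Python sketch. The explanation is an observation added here, not a claim from the course text, and it relies on the data being recorded in whole centimeters: lowering the line by 1 cm adds 1 to every error, so each point on or above the line YL = X moves 1 cm farther from the new line, while each point below it moves 1 cm closer. All variable names are illustrative.

```python
# Arm span (X) and height (Y) in cm for Persons 1-24, copied from the tables.
ARM_SPAN = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
            177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
HEIGHT = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
          173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]

# Errors (Y - YL) for each candidate line.
errors_x = [y - x for x, y in zip(ARM_SPAN, HEIGHT)]                # YL = X
errors_x_minus_1 = [y - (x - 1) for x, y in zip(ARM_SPAN, HEIGHT)]  # YL = X - 1

total_x = sum(abs(e) for e in errors_x)
total_x_minus_1 = sum(abs(e) for e in errors_x_minus_1)
print(total_x, total_x_minus_1)  # 100 100

# Points whose distance grows by 1 (on or above YL = X) versus
# points whose distance shrinks by 1 (below YL = X).
grew = sum(1 for e in errors_x if e >= 0)
shrank = sum(1 for e in errors_x if e < 0)
print(grew, shrank)  # 12 12
```

Because 12 distances grow by 1 and 12 shrink by 1, the totals are identical, which illustrates why total vertical distance alone cannot separate these two lines.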


© Annenberg Foundation 2014. All rights reserved. Legal Policy