Teacher resources and professional development across the curriculum

Data Session 7, Part D: Fitting Lines to Data
 

Can we do better? Recall that for the 24 people in this study, the mean arm span is 175.5 cm and the mean height is 174.8 cm. Note that the mean arm span is .7 cm longer than the mean height. This suggests that we might try the line Height = Arm Span - .7 to describe the trend in our bivariate data. Let's see how this line compares with the previous models.

[Figure: scatter plot of the data with a graph of the line YL = X - .7]

Here is the table for the line YL = X - .7:

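| Person # | Arm Span (X) | Height (Y) | YL = X - .7 | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
|---|---|---|---|---|---|
| 1 | 156 | 162 | 155.3 | 6.7 | 44.89 |
| 2 | 157 | 160 | 156.3 | 3.7 | 13.69 |
| 3 | 159 | 162 | 158.3 | 3.7 | 13.69 |
| 4 | 160 | 155 | 159.3 | -4.3 | 18.49 |
| 5 | 161 | 160 | 160.3 | -0.3 | 0.09 |
| 6 | 161 | 162 | 160.3 | 1.7 | 2.89 |
| 7 | 162 | 170 | 161.3 | 8.7 | 75.69 |
| 8 | 165 | 166 | 164.3 | 1.7 | 2.89 |
| 9 | 170 | 170 | 169.3 | 0.7 | 0.49 |
| 10 | 170 | 167 | 169.3 | -2.3 | 5.29 |
| 11 | 173 | 185 | 172.3 | 12.7 | 161.29 |
| 12 | 173 | 176 | 172.3 | 3.7 | 13.69 |
| 13 | 177 | 173 | 176.3 | -3.3 | 10.89 |
| 14 | 177 | 176 | 176.3 | -0.3 | 0.09 |
| 15 | 178 | 178 | 177.3 | 0.7 | 0.49 |
| 16 | 184 | 180 | 183.3 | -3.3 | 10.89 |
| 17 | 188 | 188 | 187.3 | 0.7 | 0.49 |
| 18 | 188 | 187 | 187.3 | -0.3 | 0.09 |
| 19 | 188 | 182 | 187.3 | -5.3 | 28.09 |
| 20 | 188 | 181 | 187.3 | -6.3 | 39.69 |
| 21 | 188 | 192 | 187.3 | 4.7 | 22.09 |
| 22 | 194 | 193 | 193.3 | -0.3 | 0.09 |
| 23 | 196 | 184 | 195.3 | -11.3 | 127.69 |
| 24 | 200 | 186 | 199.3 | -13.3 | 176.89 |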

For this line, the sum of squared errors is 770.56, which makes it a slightly better model than the line YL = X - 1 (whose SSE was 772).
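The row-by-row procedure behind this table can be sketched in a few lines of Python. This is only an illustration, not part of the original course materials: the variable names (`arm_span`, `height`, `rows`, `sse`) are made up for the example, and the data values are copied from the table for the line YL = X - .7.

```python
# Arm span (X) and height (Y) in cm for the 24 people, copied from the table.
arm_span = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
            177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
height = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
          173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]

# Build one row per person: (X, Y, YL, Error, Error squared).
rows = []
for x, y in zip(arm_span, height):
    predicted = x - 0.7       # YL = X - .7
    error = y - predicted     # Error = Y - YL
    rows.append((x, y, predicted, error, error ** 2))

# The SSE is the total of the last column.
sse = sum(row[4] for row in rows)
print(f"SSE = {sse:.2f}")  # prints "SSE = 770.56"
```

Changing the single line that computes `predicted` is enough to repeat the calculation for any other candidate line.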

Problem D8

Here are the three lines we've considered, plus two new ones:

YL = X, SSE = 784
YL = X + 1, SSE = 844
YL = X - 1, SSE = 772
YL = X - 2, SSE = 808
YL = X - .7, SSE = 770.56

a. Judging on the basis of the SSE, which is the best line? Which is the worst?

b. What other ways could we change the line equation in an attempt to further reduce the SSE?

c. Is it possible to reduce the SSE to 0? Why or why not?


We have examined several lines that have yielded different SSEs. The lines, however, had one thing in common: they all had a slope of 1, so they were all parallel. Keep in mind that the slope of a line is often described as the ratio of rise to run. The formula for slope is: slope = (change in Y) / (change in X). Now, let's investigate a line with a different slope to describe the trend in the data.

One such line, with slope 0.75, passes through (164, 164) and (188, 182) and near many of the other data points; its equation is YL = 0.75X + 41. Let's compare this line to the line YL = X - .7, which is the best fit we have found so far.

Note that these two lines are not parallel since they have different slopes.
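As a quick check of that equation, the slope and intercept can be recovered from the two points using the rise-over-run formula. The helper name `line_through` below is hypothetical and is used only for this sketch.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # (change in Y) / (change in X)
    intercept = y1 - slope * x1     # solve y1 = slope * x1 + intercept
    return slope, intercept

slope, intercept = line_through((164, 164), (188, 182))
print(slope, intercept)  # prints "0.75 41.0"
```

The rise is 182 - 164 = 18 and the run is 188 - 164 = 24, so the slope is 18/24 = 0.75, and substituting (164, 164) gives the intercept 164 - 0.75(164) = 41.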

[Figure: scatter plot of the 24 people with graphs of the lines YL = .75X + 41 and YL = X - .7]

Here is the table to find the SSE for the line YL = .75X + 41:

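| Person # | Arm Span (X) | Height (Y) | YL = .75X + 41 | Error = Y - YL | (Error)^2 = (Y - YL)^2 |
|---|---|---|---|---|---|
| 1 | 156 | 162 | 158 | 4 | 16 |
| 2 | 157 | 160 | 158.75 | 1.25 | 1.5625 |
| 3 | 159 | 162 | 160.25 | 1.75 | 3.0625 |
| 4 | 160 | 155 | 161 | -6 | 36 |
| 5 | 161 | 160 | 161.75 | -1.75 | 3.0625 |
| 6 | 161 | 162 | 161.75 | 0.25 | 0.0625 |
| 7 | 162 | 170 | 162.5 | 7.5 | 56.25 |
| 8 | 165 | 166 | 164.75 | 1.25 | 1.5625 |
| 9 | 170 | 170 | 168.5 | 1.5 | 2.25 |
| 10 | 170 | 167 | 168.5 | -1.5 | 2.25 |
| 11 | 173 | 185 | 170.75 | 14.25 | 203.0625 |
| 12 | 173 | 176 | 170.75 | 5.25 | 27.5625 |
| 13 | 177 | 173 | 173.75 | -0.75 | 0.5625 |
| 14 | 177 | 176 | 173.75 | 2.25 | 5.0625 |
| 15 | 178 | 178 | 174.5 | 3.5 | 12.25 |
| 16 | 184 | 180 | 179 | 1 | 1 |
| 17 | 188 | 188 | 182 | 6 | 36 |
| 18 | 188 | 187 | 182 | 5 | 25 |
| 19 | 188 | 182 | 182 | 0 | 0 |
| 20 | 188 | 181 | 182 | -1 | 1 |
| 21 | 188 | 192 | 182 | 10 | 100 |
| 22 | 194 | 193 | 186.5 | 6.5 | 42.25 |
| 23 | 196 | 184 | 188 | -4 | 16 |
| 24 | 200 | 186 | 191 | -5 | 25 |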

The SSE for the line YL = .75X + 41 is 616.8 (as compared to 770.56 for the line YL = X - .7). So this new line, with its different slope, turns out to be a better fit for the data set. (See Note 3.)
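To compare candidate lines without filling in a table by hand each time, the calculation can be wrapped in a small reusable function. This is a minimal sketch rather than anything from the original course: the function name `sse` and the `candidates` dictionary are assumptions made for illustration, and the data are copied from the tables in this part.

```python
# Arm span (X) and height (Y) in cm for the 24 people, copied from the tables.
arm_span = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
            177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
height = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
          173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]

def sse(slope, intercept, xs, ys):
    """Sum of squared errors for the line YL = slope * X + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Each candidate line is stored as (slope, intercept).
candidates = {
    "YL = X - .7": (1.0, -0.7),
    "YL = .75X + 41": (0.75, 41.0),
}

for label, (m, b) in candidates.items():
    print(label, round(sse(m, b, arm_span, height), 4))
```

For these data the function gives about 770.56 for the first line and 616.8125 for the second, which rounds to the 616.8 quoted above.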



© Annenberg Foundation 2013. All rights reserved.