
Learning Math: Data Analysis, Statistics, and Probability

Min, Max and the Five-Number Summary

Part D: The Box Plot (25 minutes)

In This Part: Five-Number Summary with Measurement Data

Now we’ll look at how you can represent the Five-Number Summary graphically, using a box plot. For this activity, we will work with a set of 12 noodles with the following measurements (in millimeters):

 

 


Problem D1
Why is it necessary to order the data before creating a Five-Number Summary?


Let’s create a Five-Number Summary for this set of ordered data:

[Figure: the 12 noodles in order of length]

Determine Q1, Med, and Q3:
The lines representing Q1, Med, and Q3 each have lengths that are halfway between their adjacent noodles:
Q1 = (33 + 41) / 2 = 37
Med = (74 + 81) / 2 = 77.5
Q3 = (102 + 109) / 2 = 105.5
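
The three midpoint calculations above can be checked with a few lines of Python. This is only a sketch: the helper name `midpoint` is ours, and the pairs are the adjacent noodle lengths quoted in the calculations.

```python
def midpoint(lower, upper):
    """Return the value halfway between two adjacent ordered values."""
    return (lower + upper) / 2

# Adjacent noodle lengths (mm) on either side of each dividing line,
# taken from the calculations above.
q1 = midpoint(33, 41)
med = midpoint(74, 81)
q3 = midpoint(102, 109)

print(q1, med, q3)  # 37.0 77.5 105.5
```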

Here is the Five-Noodle Summary:

[Figure: the Five-Noodle Summary]

Add a vertical number line:

[Figure: the Five-Noodle Summary beside a vertical number line]

Here is the Five-Number Summary:

[Figure: the Five-Number Summary]


In This Part: Drawing a Box Plot
Once we have the Five-Number Summary, we can display it using a kind of graph known as a box plot. Here is the box plot for the noodle data we’ve been using:

[Figure: box plot of the noodle data]

The box plot is also called a box-and-whiskers plot. Though it looks very different from previous graphs, it’s just another way to represent the distribution of the data we’ve been working with all along:

  • The lower whisker extends from Min to Q1. The length of this whisker indicates the range of the lowest (or, in this case, the shortest) fourth of the ordered data.
  • The upper whisker extends from Q3 to Max. The length of this whisker indicates the range of the highest (or, in this case, the longest) fourth of the ordered data.
  • The box (the rectangular portion of the graph) extends from Q1 to Q3, with a horizontal line segment indicating Med.
  • The portion of the rectangle between Q1 and Med indicates the range of the second fourth of the ordered data.
  • The portion of the rectangle between Med and Q3 indicates the range of the third fourth of the ordered data.
  • The entire rectangle indicates the range of the middle half (the interquartile range) of the ordered data.
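
As a rough illustration of the last three items in this list, the ranges covered by the box can be computed from the noodle values of Q1, Med, and Q3 found earlier. The function name `box_segments` and its dictionary keys are our own labels, not terms from the course. The two whiskers would be computed the same way (Q1 minus Min, and Max minus Q3), but Min and Max for the noodles are not listed in the text, so they are left out here.

```python
def box_segments(q1, med, q3):
    """Return the ranges covered by the two parts of the box and by the whole box."""
    return {
        "second_fourth": med - q1,  # portion of the box from Q1 to Med
        "third_fourth": q3 - med,   # portion of the box from Med to Q3
        "iqr": q3 - q1,             # entire rectangle (interquartile range)
    }

# Noodle quartiles (mm) from the calculations earlier in this part.
noodle_box = box_segments(q1=37, med=77.5, q3=105.5)
print(noodle_box)
# {'second_fourth': 40.5, 'third_fourth': 28.0, 'iqr': 68.5}
```

For the noodles, the second fourth of the data is spread over a wider range (40.5 mm) than the third fourth (28 mm), even though each contains the same number of noodles.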

Note that box plots can be drawn vertically or horizontally, depending on whether you display the Five-Number Summary along a vertical or a horizontal axis. See Note 2 below.
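
To see that the orientation changes only the layout, not the information, here is a minimal sketch that draws a rough text box plot either way. Everything in it is an assumption made for illustration: the function name `text_box_plot`, the characters used for the whiskers, box, and median, and the five numbers passed in, which are hypothetical and not data from this session.

```python
def text_box_plot(minimum, q1, med, q3, maximum, lo, hi, width=11, vertical=False):
    """Draw a rough text box plot on a scale running from lo to hi."""

    def col(value):
        # Map a data value to a character position from 0 to width - 1.
        return round((value - lo) * (width - 1) / (hi - lo))

    cells = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        cells[c] = "-"  # whiskers (the box overwrites the middle part next)
    for c in range(col(q1), col(q3) + 1):
        cells[c] = "="  # box from Q1 to Q3
    cells[col(minimum)] = "|"  # Min
    cells[col(maximum)] = "|"  # Max
    cells[col(med)] = "M"      # median line
    if vertical:
        # One character per line, with the largest values at the top.
        return "\n".join(reversed(cells))
    return "".join(cells)

# Hypothetical Five-Number Summary: Min 2, Q1 4, Med 5, Q3 8, Max 10.
horizontal = text_box_plot(2, 4, 5, 8, 10, lo=0, hi=10)
upright = text_box_plot(2, 4, 5, 8, 10, lo=0, hi=10, vertical=True)
print(horizontal)
print(upright)
```

Passing the same `lo` and `hi` for every plot is what "using the same scale" means when two box plots are to be compared.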

Video Segment
In this video segment, Professor Kader introduces the process of building a box plot. Watch this segment to review the process or to help you draw the box plots for the following problem.

Note: The data set used by the onscreen participants is different from the one provided above.

 


Let’s compare our noodle data as represented by the Five-Noodle Summary, the Five-Number Summary, and the box plot.

Review the sequence of illustrations on the previous page and on this page, to follow the progression from noodles through box plot.


Problem D2
Using the same scale for each plot, create a box plot for each of the data sets below, which we first saw in Session 2. Each is an ordered list of the number of raisins in a group of boxes from a particular brand. You may want to save your data for use in Session 6.

 

 

Start by listing the position for each value in the data set. For example, in the set of Brand A raisins, the value 23 is in the first position, 25 is in the second position, the second 25 is in the third position, and so forth.
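
The position-listing step, and the Five-Number Summary that follows from it, can be sketched in Python. The raisin counts below are hypothetical (they are not the Brand A or Brand B data), and the function name `five_number_summary` is ours. One assumption to note: when the number of values is odd, this sketch leaves the middle value out of both halves before finding Q1 and Q3. For an even number of values, as with the 12 noodles, it reproduces the halfway-between-neighbors calculation used earlier.

```python
def five_number_summary(values):
    """Return (Min, Q1, Med, Q3, Max) using the median of each half."""

    def median(ordered):
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2

    ordered = sorted(values)  # the data must be ordered first
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2:]  # skips the middle value when n is odd
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

# Hypothetical raisin counts for ten boxes, deliberately out of order.
counts = [27, 20, 31, 22, 25, 22, 28, 30, 24, 28]

# List each value with its position in the ordered data.
for position, value in enumerate(sorted(counts), start=1):
    print(position, value)

summary = five_number_summary(counts)
print(summary)  # (20, 22, 26.0, 28, 31)
```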


Problem D3
Compare the two box plots from Problem D2 side by side. What conclusions can you draw about Brand A raisins in comparison to Brand B raisins, using only the box plots?

Video Segment
In this video segment, Professor Kader and participants use the box plot to compare different brands of raisins. They then discuss the usefulness of the box plot as a summary of data. Watch this segment after completing Problem D3.
Note: The data sets used by the onscreen participants are different from the ones provided above.
Is the box plot more useful for making comparisons between different distributions than a line plot? Why or why not?

FATHOM Dynamic Statistics™ Software used with permission of Key Curriculum Press.

Notes

Note 2
The Five-Number Summary uses intervals to describe the variation in different segments of your data. The longer the interval, the greater the variation. Some people will misinterpret a box plot. For example, given a box plot with the Q3-Max whisker considerably longer than the Min-Q1 whisker, one could think, “Wow, there are a lot more data in the highest interval than there are in the lowest interval.” We’re used to associating length with “how many” rather than “how far apart,” and we forget that the same number of values falls within each of these intervals.

It is also important to note the difference between a histogram and a box plot, another potential source of confusion. To construct a histogram, you prescribe intervals of uniform length and then count how many data values fall within each interval. To determine the five numbers for the box plot, you do the reverse: prescribe how many data values you want in each interval and then determine the intervals.
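
The contrast between the two constructions can be made concrete with a short sketch. The 12 values below are hypothetical and chosen only so that the upper values are spread out. The histogram fixes the interval width and counts; the Five-Number Summary fixes the count (three values per fourth here) and finds the boundaries.

```python
# Hypothetical ordered data set with 12 values.
data = [12, 15, 18, 21, 22, 24, 25, 27, 31, 38, 46, 59]

# Histogram: prescribe intervals of uniform length, then count the values in each.
width = 10
histogram = {}
for value in data:
    start = value // width * width  # left end of the interval holding this value
    histogram[start] = histogram.get(start, 0) + 1

# Five-Number Summary: prescribe how many values go in each interval
# (a fourth of the data), then determine the boundaries.
q1 = (data[2] + data[3]) / 2   # between the 3rd and 4th values
med = (data[5] + data[6]) / 2  # between the 6th and 7th values
q3 = (data[8] + data[9]) / 2   # between the 9th and 10th values
bounds = [data[0], q1, med, q3, data[-1]]

lengths = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
counts = [sum(lo <= v <= hi for v in data) for lo, hi in zip(bounds, bounds[1:])]

print(histogram)  # {10: 3, 20: 5, 30: 2, 40: 1, 50: 1}
print(lengths)    # [7.5, 5.0, 10.0, 24.5]
print(counts)     # [3, 3, 3, 3]
```

The last interval is more than three times as long as any other, yet it holds exactly as many values as each of the rest. That is the misreading described above: length shows how far apart the values are, not how many there are.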

Fathom Software, used by the onscreen participants, is helpful in creating graphical representations of data. You can use Fathom Software to complete Problems D2-D3. For more information, go to the Key Curriculum Press Web Site at http://www.keypress.com/fathom/.

Solutions

Problem D1
Since the median and quartiles require separating the data into halves that are larger or smaller than a central value, it is necessary to order the data. If the data are unordered, it is much more difficult to find the value that splits the list into two equal groups.

Problem D2
To create a box plot, first create a Five-Number Summary for each data set:

a. For Brand A, here is the Five-Number Summary:
Min = 23
Q1 = 27
Med = 29.5
Q3 = 32
Max = 39

Here is the box plot:

[Figure: box plot for Brand A]

b. For Brand B, here is the Five-Number Summary:
Min = 17
Q1 = 25
Med = 26
Q3 = 29
Max = 30

Here is the box plot:

[Figure: box plot for Brand B]

Problem D3
Placing the box plots side by side clearly shows that Brand A boxes tend to contain more raisins than Brand B boxes: Brand A’s median (29.5) lies above Brand B’s third quartile (29). The interquartile range is a little wider for Brand A (5 versus 4), and the top 25% of Brand A boxes all contain more raisins than Brand B’s maximum (Q3 = 32 for Brand A, Max = 30 for Brand B). This strongly suggests that a typical box of Brand A contains more raisins than a typical box of Brand B.
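
The comparisons behind this conclusion can be checked directly from the two Five-Number Summaries given in the solution to Problem D2. The sketch below uses only those ten numbers; the dictionary layout and variable names are ours.

```python
# Five-Number Summaries from the solution to Problem D2.
brand_a = {"min": 23, "q1": 27, "med": 29.5, "q3": 32, "max": 39}
brand_b = {"min": 17, "q1": 25, "med": 26, "q3": 29, "max": 30}

# Interquartile range (length of the box) for each brand.
iqr_a = brand_a["q3"] - brand_a["q1"]
iqr_b = brand_b["q3"] - brand_b["q1"]

# Is the top fourth of Brand A entirely above Brand B's maximum?
top_fourth_a_above_b = brand_a["q3"] > brand_b["max"]

# Is Brand A's median above Brand B's third quartile?
median_a_above_q3_b = brand_a["med"] > brand_b["q3"]

print(iqr_a, iqr_b)           # 5 4
print(top_fourth_a_above_b)   # True
print(median_a_above_q3_b)    # True
```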

 


Credits

Produced by WGBH Educational Foundation. 2001.
  • Closed Captioning
  • ISBN: 1-57680-481-X
