Glossary
A
 Addition Rule
 If C and D are mutually exclusive events, then P(C or D) = P(C) + P(D). Unit 19
 Adequacy of a Linear Model
 A line is adequate to describe the pattern in a set of data points provided the data have linear form. A residual plot is a good way of checking adequacy. Unit 11
 Alternative Hypothesis or H_{a}
 The claim in a significance test that we are trying to gather evidence for, that is, the researcher's point of view. The alternative hypothesis is contradictory to H_{0} and is judged the more plausible claim when H_{0} is rejected. Unit 25
 ANOVA
 Analysis of variance (ANOVA) is a technique used to analyze variation in data in order to test whether three or more population means are equal. Unit 31
 Assumptions of the Linear Regression Model
 The observed response y for any value of x varies according to a normal distribution. Repeated responses, y-values, are independent of each other.
 The mean response, μ_{y}, has a straight-line relationship with x: μ_{y} = α + βx.
 The standard deviation of y, σ, is the same for all values of x.
B
 Bar Chart
 Graph of a frequency distribution for categorical data. Each category is represented by a bar whose area is proportional to the frequency, relative frequency, or percent of that category. If the categorical variable is ordinal, the logical order of the categories should be preserved in the bar chart. Unit 2
 Between-Groups Variation
 A measure of the spread of the group means about the grand mean, the mean of all the observations. It is measured by the mean square for groups, MSG. Unit 31
 Biased Sample
 A sample in which some individuals or groups from the population are less likely to be selected than others due to some attribute. Unit 17
 Binomial Distribution
 In a binomial setting with n trials and probability of success p, the distribution of x = the number of successes. Shorthand notation for this distribution is b(n, p). The probabilities p(x) for the binomial distribution with parameters n and p can be calculated using the following formula:
\begin{align} p(x) = & {n \choose x} p^x (1-p)^{n-x} \text{ for $x$ = 0, 1, ..., $n$,} \\ \text{where } & {n \choose x} = \frac{n!}{x!(n-x)!} \end{align}
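The binomial formula can be evaluated directly. The following is a minimal Python sketch; the function name `binomial_pmf` and the coin-toss example are illustrative assumptions, not part of the glossary.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    if not 0 <= x <= n:
        return 0.0  # values outside 0..n are impossible
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: number of heads in 4 tosses of a fair coin, b(4, 0.5).
probabilities = [binomial_pmf(x, 4, 0.5) for x in range(5)]
```

The probabilities over x = 0, 1, ..., n should sum to 1, which is a quick sanity check on any implementation.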
Unit 21
 Binomial Random Variable
 The number of successes, x, in a binomial setting with n trials with probability of success p. The mean and standard deviation of a binomial random variable x can be calculated as follows:
\begin{align} \mu & = np \\ \sigma & = \sqrt{np(1-p)} \end{align}
Unit 21
 Binomial Setting
 A setting in which there are a fixed number of n independent trials. Each trial can result in only one of two outcomes, success or failure, and the probability of success, p, is the same for each trial. Unit 21
 Bivariate Data
 Measurements or observations are recorded on two attributes for each individual or subject under study. Unit 10
 Boxplot (or Box-and-Whisker Plot)
 Graphical representation of the five-number summary. The basic boxplot consists of a box that extends from the first quartile to the third quartile with whiskers that extend from each box end to the minimum and maximum data values. The basic boxplot can be modified to include identification of mild and extreme outliers. Unit 5
C
 Categorical Variable
 Variable whose values are classifications or categories. Gender, occupation, and eye color are examples of categorical variables. Unit 13
 Census
 An attempt to gather information about every individual in a population. Unit 16
 Center Line
 The center line on a control chart is generally the target value or mean of the quality characteristic being sampled. Unit 23
 Central Limit Theorem
 If the sample size n is large (say n > 30), then the sampling distribution of the sample mean x̄ of n independent observations from the same population has an approximate normal distribution. If the population mean and standard deviation are μ and σ, respectively, then x̄ has an approximate normal distribution with mean μ and standard deviation σ/√n. Unit 22
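A small simulation can make the theorem concrete. The sketch below is an illustration under assumptions of our own choosing: the population is Uniform(0, 1), which has mean 0.5 and standard deviation √(1/12), the sample size is n = 36, and the function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(n: int, num_samples: int, seed: int = 0) -> list[float]:
    """Draw num_samples samples of size n from Uniform(0, 1)
    and return the mean of each sample."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(num_samples)
    ]

n = 36
means = simulate_sample_means(n, num_samples=2000)

# Values predicted by the theorem for the distribution of the sample mean.
predicted_mean = 0.5
predicted_sd = sqrt(1 / 12) / sqrt(n)

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)
```

The observed mean and standard deviation of the simulated sample means should land close to μ and σ/√n, and a histogram of `means` should look roughly bell-shaped even though the population itself is flat.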
 Chi-Square Test Statistic for Independence
 The chi-square test for independence is used for categorical variables. For testing the null hypothesis H_{0}: no association between the variables or H_{0}: variables are independent, the chi-square test statistic is computed as follows, where the sum is taken over all cells of the two-way table:
\[ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \]
If the null hypothesis is true, \( \chi^2 \) will have a chi-square distribution with degrees of freedom (r - 1)(c - 1), where r and c are the number of rows and columns in the two-way table, respectively. Unit 29
 Common Cause Variation
 Variation due to day-to-day factors that influence the process. Unit 23
 Complement of an Event A
 An event that consists of all the outcomes in the sample space that are not in A. If B is the complement of A, then B = not A. Unit 19
 Complement Rule
 For any event C, P(not C) = 1 – P(C). Unit 19
 Complementary Events
 Two events are complementary if they are mutually exclusive and combining their outcomes into a single set gives the entire sample space. Unit 19
 Conditional Distribution
 There are two sets of conditional distributions for a two-way table:
 distributions of the row variable for each fixed level of the column variable
 distributions of the column variable for each fixed level of the row variable
 Confidence Interval
 An interval estimate computed from sample data that gives a range of plausible values for a population parameter. The interval is constructed so that the value of the parameter will be captured between the endpoints of the interval with a chosen level of confidence. Unit 24
 Confidence Interval for μ (t-interval)
 When σ is unknown, the sample size n is small, and the population distribution is approximately normal, a t-confidence interval for μ is given by the following formula:
\[\bar{x} \pm t^* \biggl( \frac{s}{\sqrt{n}} \biggr) \]
where t* is a t-critical value associated with the confidence level and determined from a t-distribution with df = n - 1 degrees of freedom. Unit 26
 Confidence Interval for μ (z-interval)
 When σ is known and either the sample size n is large or the population distribution is normal, a confidence interval for μ is given by the following formula:
\[\bar{x} \pm z^* \biggl( \frac{\sigma}{\sqrt{n}} \biggr) \]
where z* is a z-critical value (from a standard normal distribution) associated with the confidence level. Unit 24
 Confidence Interval for p
 In situations where the sample size n is large, a confidence interval for the population proportion p is given by the following formula:
\[\hat{p}\pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
where p̂ is the sample proportion and z^{*} is a z-critical value (from a standard normal distribution) associated with the confidence level. Unit 28
 Confidence Interval for Population Slope β
 A confidence interval for the population slope β is given by the following formula:
\[ b \pm t^* s_b \]
where t^{*} is a t-critical value associated with the confidence level and determined from a t-distribution with df = n - 2; b is the least-squares estimate of the population slope calculated from the data, and s_{b} is the standard error of b. Unit 30
 Confidence Level
 A number that provides information on how much confidence we have in the method used to construct a confidence interval estimate of a population parameter. It is the long-run success rate (success means capturing the parameter in the interval) of the method used to construct the confidence interval. Unit 24
 Confounding Factors
 Two (or more) factors (explanatory variables) are confounded when their effects on a response variable are intertwined and cannot be distinguished from each other. Unit 15
 Continuous Random Variable
 A random variable that can take on values that include an interval. The number of possible distinct outcomes is uncountable; there are too many possible values to put them all in a list. Unit 20
 Control Charts
 Charts used to monitor the output of a process. The charts are designed to signal when the process has been disturbed so that it is now out of control or is about to go out of control. Unit 23
 Control Group
 A group in an experiment that does not receive the treatment under study. The control group could receive a placebo to hide the fact that no treatment is being given. In an active control group, the subjects receive what might be considered the existing standard treatment. Unit 15
 Control Limits
 The upper control limit (UCL) and lower control limit (LCL) on a control chart are generally set ±3 σ/√n from the center line. Unit 23
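As a quick illustration of the ±3 σ/√n rule, here is a minimal Python sketch. The function name `control_limits` and the example numbers are assumptions made for illustration only.

```python
from math import sqrt

def control_limits(center: float, sigma: float, n: int) -> tuple[float, float]:
    """Return (LCL, UCL) placed 3 * sigma / sqrt(n) on either side of the center line."""
    half_width = 3 * sigma / sqrt(n)
    return center - half_width, center + half_width

# Hypothetical process: target value 10, process standard deviation 2,
# samples of size 4 plotted on the chart.
lcl, ucl = control_limits(center=10.0, sigma=2.0, n=4)
```

With these inputs the half-width is 3 · 2 / √4 = 3, so the limits sit at 7 and 13.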
 Convenience Sampling
 A sampling design in which the pollster selects a sample that is easy to obtain, such as friends, family, coworkers, and so forth. Unit 17
 Correlation
 Denoted by r, correlation measures the direction and strength of a linear relationship between two quantitative variables. The formula for computing Pearson’s correlation coefficient is:
\[ r = \frac{1}{n-1} \sum \biggl( \frac{x-\bar{x}}{s_x} \biggr) \biggl( \frac{y-\bar{y}}{s_y} \biggr) \]
Unit 12
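The correlation formula translates almost term by term into code: standardize each x and y, multiply the pairs, sum, and divide by n - 1. The sketch below uses Python's standard `statistics` module; the function name `correlation` and the sample data are illustrative assumptions.

```python
from statistics import mean, stdev

def correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson's r computed as the sum of products of standardized values,
    divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Points that fall exactly on an increasing line, then on a decreasing line.
r_up = correlation([1, 2, 3], [2, 4, 6])
r_down = correlation([1, 2, 3], [3, 2, 1])
```

Data lying exactly on a line give r = 1 or r = -1 (up to floating-point rounding), depending on the direction of the line.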
D
 Decision Rules
 A set of rules that identify from a control chart when a process is becoming unstable or going out of control. Unit 23
 Degrees of Freedom for Test for Independence
 (r - 1)(c - 1), where the numbers r and c are the number of rows and columns in the two-way table, respectively. Unit 29
 Dependent Events
 Two events are dependent if the fact that one of the events occurs does affect the probability that the other occurs. Events that are not dependent are independent. Unit 19
 Dependent Variable
 A variable whose outcome we would like to predict based on another variable (independent variable). The dependent variable is always plotted on the vertical axis of a scatterplot. Also called a response variable. Unit 10
 Deviations from the Mean
 The deviations of each data value from the sample mean: x_{1} - x̄, x_{2} - x̄, ..., x_{n} - x̄. Unit 6
 Discrete Random Variable
 A random variable that can take on only a countable number of distinct values – in other words, it is possible to list all possible values. Any random variable that can take on only a finite number of values is a discrete random variable. Unit 20
 Distribution
 Description of the possible values a variable assumes and how often these values occur. Unit 2
 Dotplot
 Graphical display of quantitative data in which each observation (or a group of a specified number of observations) is represented by a dot above a horizontal axis. Unit 2
 Double-Blind Experiment
 An experiment in which neither the subjects nor the individuals measuring the response know which subjects are assigned to which treatment. Unit 15
E
 Empirical Rule (68-95-99.7% Rule)
 Rule that gives the approximate percentage of data that fall within one standard deviation (68%), two standard deviations (95%), and three standard deviations (99.7%) of the mean. This rule should be applied only when the data are approximately normal. Unit 8
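The three percentages can be checked against the standard normal distribution. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_within` is an assumption.

```python
from statistics import NormalDist

def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

within_one = area_within(1)
within_two = area_within(2)
within_three = area_within(3)
```

The computed areas are approximately 0.683, 0.954, and 0.997, which the rule rounds to 68%, 95%, and 99.7%.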
 Estimated Regression Line
 The estimated regression line for the linear regression model is the least-squares line, ŷ = a + bx. Unit 30
 Expected Counts
 The number of observations that would be expected to fall into each cell (or class) of a two-way table if the null hypothesis is true. The expected counts for the chi-square test for independence are computed as follows:
\[ \text{expected count} = \frac{(\text{row total})(\text{column total})}{\text{grand total}} \]
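The expected-count formula, together with the chi-square statistic that uses it, can be sketched as follows. The table is a made-up 2 × 2 example and the function names are assumptions; the statistic sums (observed - expected)² / expected over every cell.

```python
def expected_counts(table: list[list[float]]) -> list[list[float]]:
    """Expected count for each cell: (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table: list[list[float]]) -> float:
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

# Hypothetical observed counts: 2 rows by 2 columns, grand total 100.
observed = [[10, 20], [30, 40]]
expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)
```

For this table the row totals are 30 and 70 and the column totals are 40 and 60, so the expected counts are 12, 18, 28, and 42, and the statistic is 200/252, about 0.794, with (2 - 1)(2 - 1) = 1 degree of freedom.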
Unit 29
 Experimental Study
 A study in which researchers deliberately apply some treatment to the subjects in order to observe their responses. The purpose is to study whether the treatment causes a change in the response. Unit 15
 Explanatory Variable
 Variable that is used to predict the response variable. The explanatory variable is always plotted on the horizontal axis of a scatterplot. Also called Independent Variable. Unit 10
F
 F-Test Statistic
 The test statistic is the ratio of MSG to MSE, \( F = \frac{MSG}{MSE} \), which is used for testing H_{0}: μ_{1} = μ_{2} = ... = μ_{k}. When H_{0} is true, F has an F distribution with numerator df = k - 1 and denominator df = N - k, where k is the number of groups and N is the total number of observations. Unit 31
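The sketch below computes F for a small made-up data set. It assumes the standard one-way ANOVA definitions, which this glossary does not spell out: MSG is the between-groups sum of squares divided by k - 1, and MSE is the within-groups sum of squares divided by N - k. The function name `f_statistic` is hypothetical.

```python
from statistics import mean

def f_statistic(groups: list[list[float]]) -> float:
    """One-way ANOVA F = MSG / MSE for a list of groups of observations."""
    k = len(groups)                       # number of groups
    n_total = sum(len(g) for g in groups)  # total number of observations, N
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: spread of group means about the grand mean.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of observations about their group mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

    msg = ssg / (k - 1)        # numerator df = k - 1
    mse = sse / (n_total - k)  # denominator df = N - k
    return msg / mse

# Hypothetical data: three groups of three observations each.
f_value = f_statistic([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

Here the group means are 2, 3, and 4 around a grand mean of 3, giving MSG = 6/2 = 3 and MSE = 6/6 = 1, so F = 3 with 2 and 6 degrees of freedom.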
 Factors
 The explanatory variables in an observational study or an experiment. Also called the independent variables. Unit 15, Unit 31
 First Quartile or Q1
 The one-quarter point in an ordered set of quantitative data. To compute Q1, calculate the median of the lower half of the ordered data. Unit 5
 Five-Number Summary
 A five-number summary of a quantitative data set consists of the following: minimum, first quartile (Q1), median, third quartile (Q3), maximum. Unit 5
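A five-number summary can be computed by following the quartile definitions in this glossary (medians of the lower and upper halves). One detail is an assumption here: when the number of observations is odd, this sketch leaves the overall median out of both halves, which is a common convention but not the only one. The function name is also illustrative.

```python
from statistics import median

def five_number_summary(data: list[float]) -> tuple[float, float, float, float, float]:
    """Return (minimum, Q1, median, Q3, maximum)."""
    if len(data) < 2:
        raise ValueError("need at least 2 observations")
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, exclude the middle value from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

odd_summary = five_number_summary([7, 1, 3, 5, 2, 6, 4])
even_summary = five_number_summary([1, 2, 3, 4])
```

Statistical software often uses interpolation rules for quartiles, so results from other tools may differ slightly from this hand-calculation method.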
 Frequency Distribution
 A table that displays frequencies of data falling into categories or class intervals. Unit 3
H
 Histogram
 Graphical representation of a frequency distribution. Bars are drawn over each class interval on a number line. The areas of the bars are proportional to the frequencies with which data fall into the class intervals. Unit 3
I
 In Control
 The state of a process that is running smoothly, with its variables staying within an acceptable range. Unit 23
 Independent Events
 Two events are independent if the fact that one of the events occurs does not affect the probability that the other occurs. Unit 19
 Independent Variable
 Variable that is used to predict the dependent variable. The independent variable is always plotted on the horizontal axis of a scatterplot. Also called Explanatory Variable. Unit 10
 Interquartile range or IQR
 A measure of the spread of the middle half of the data: IQR = Q3 – Q1. The IQR is a resistant measure of the variability of a data set. Unit 5
J
 Joint Distribution of Two Categorical Variables
 A two-way table of counts gives the joint distribution of two categorical variables. The joint distribution can be converted to percentages by dividing each cell count by the grand total and then multiplying by 100%. Unit 13
L
 Least-Squares Regression
 A method for finding the best-fitting curve to a given set of data points by minimizing the sum of the squares of the residual errors (SSE). Unit 11
 Least-Squares Regression Line
 The least-squares line is the line that makes the sum of the squares of the residual errors (SSE) as small as possible. The equation of the least-squares line has the form y = a + bx, where the intercept a and slope b can be calculated from n data pairs (x, y) using the following formulas:
\begin{align} b & = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} \\ a & = \bar{y}-b\bar{x} \end{align}
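The slope and intercept formulas can be applied directly to a list of data pairs. The following is a minimal sketch; the function name `least_squares_line` and the data are assumptions for illustration.

```python
def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (a, b), the intercept and slope of the least-squares line y = a + bx."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

# Hypothetical points lying exactly on y = 1 + 2x.
a, b = least_squares_line([0, 1, 2], [1, 3, 5])
```

Because these points lie exactly on a line, the fit recovers intercept 1 and slope 2 and every residual error is zero.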
Unit 11
 Level
 One of the possible values or settings that a factor can assume. Unit 31
 Linear Form
 A scatterplot has linear form when its dots appear to be randomly scattered on either side of a straight line. Unit 10
 Linear Regression Model
 The simple linear regression model assumes that for each value of x the observed values of the response variable y are normally distributed about a mean μ_{y} that has the following linear relationship with x:
\[ \mu_y = \alpha + \beta x \]
Unit 30
 Lurking Variable
 An extraneous variable that is related to the other variables in a study. A lurking variable that is linked to both an explanatory variable and a response variable can be the underlying cause for an observed relationship between the explanatory and response variable. Unit 14
M
 Margin of Error
 For confidence intervals of the form point estimate ± margin of error, the margin of error gives the range of values above and below the point estimate. The margin of error is the half-width of the confidence interval. Unit 24
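As one concrete case, the margin of error of the z-interval for μ defined in this glossary is z* · σ/√n. The sketch below obtains z* from `statistics.NormalDist` in the Python standard library; the function names and the example numbers are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def z_margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error z* * sigma / sqrt(n) for a z-interval with known sigma."""
    # z* leaves (1 - confidence) / 2 of the area in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)

def z_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval: point estimate plus or minus the margin of error."""
    m = z_margin_of_error(sigma, n, confidence)
    return x_bar - m, x_bar + m

# Hypothetical sample: mean 50 from n = 100 observations, known sigma = 10.
margin = z_margin_of_error(sigma=10, n=100)
low, high = z_interval(x_bar=50, sigma=10, n=100)
```

At 95% confidence z* is about 1.96, so the margin here is about 1.96 and the interval width is twice that.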
 Marginal Distribution
 A distribution computed from a two-way table of counts by dividing the row or column totals by the overall total. Often the marginal distributions are expressed as percentages. Unit 13
 Marginal Totals
 The sum of the row entries or the sum of the column entries in a two-way table of counts. Unit 13
 Matched-Pairs t-Test Statistic
 In testing H_{0}: μ_{D} = μ_{D_{0}}, where μ_{D} is the population mean difference, the matched-pairs t-test statistic is given by
\[t = \frac{\bar{x}_D - \mu_{D_{0}}}{s_D/\sqrt{n}}\]
where x̄_{D} and s_{D} are the mean and standard deviation of the sample differences. If the differences are approximately normally distributed and the null hypothesis is true, then t has a t-distribution with df = n - 1 degrees of freedom. Unit 26
 Mean
 The arithmetic average or balance point of sample data. To calculate the mean, sum the data values and divide the sum by the number of data values.
If the sample consists of observations x_{1},x_{2},...,x_{n}, then the sample mean is
\[ \bar{x} = \frac{\sum{x}}{n} \]
Unit 4
 Mean of a Discrete Random Variable x
 Given a probability distribution, p(x), the mean is calculated as follows:
\[ \mu = \sum x \cdot p(x) \]
Unit 20
 Median
 A resistant measure of center of a data set. The median separates the upper half of the data from the lower half. To calculate the median, order the data from smallest to largest and count up (n + 1)/2 places in the ordered list. Unit 4
 Mode
 The data value in a quantitative data set that occurs most frequently. Unit 4
 Multiplication Rule
 If C and D are independent, then P(C and D) = P(C)P(D). Unit 19
 Multistage Sampling
 A sampling design that begins by dividing the population into clusters. In stage one, the pollster chooses a (random) sample of clusters. In subsequent stages, samples are chosen from each of the selected clusters. Unit 17
 Multivariate Data
 Data that consists of measurements or observations recorded on two or more attributes for each individual or subject under study. Unit 10
 Mutually Exclusive Events
 Events that have no outcomes in common. Events that are disjoint. Unit 19
N
 Negative Association
 Two variables have negative association if above-average values of one accompany below-average values of the other, and vice versa. In a scatterplot, a negative association would appear as a pattern of dots in the upper left to the lower right. Unit 10
 Nonlinear Form
 Often scatterplots do not have linear form. Instead the data might form a curved pattern. In that case, we say the scatterplot has nonlinear form. Unit 10
 Normal Curve
 Bell-shaped curve. The center line of the normal curve is at the mean μ. The change-of-curvature in the bell-shaped curve occurs at μ – σ and μ + σ where σ is the standard deviation. Unit 7
 Normal Density Curve
 A normal curve scaled so that the area under the curve is 1. Unit 7
 Normal Distribution
 Distribution that is described by a normal density curve. Any particular normal distribution is completely specified by two numbers, its mean μ and standard deviation σ. Unit 7
 Normal Quantile Plot
 Also known as normal probability plot. A graphical method for assessing whether data come from a normal distribution. The plot compares the ordered data with what would be expected of perfectly normal data. A normal quantile plot that shows a roughly linear pattern suggests that it is reasonable to assume the data come from a normal distribution. Unit 9
 Null Hypothesis or H_{0}
 The claim tested by a significance test. Usually the null hypothesis is a statement about "no effect" or "no change." The null hypothesis has the following form: H_{0}: population parameter = hypothesized value. Unit 25
O
 Observational Study
 A study in which researchers observe subjects and measure variables of interest. However, the researchers do not try to influence the responses. The purpose is to describe groups of subjects under different situations. Unit 15
 Observed Counts
 The number of observations that fall into each cell (or class) of a two-way table. Unit 29
 One-Sided Alternative Hypothesis
 The alternative hypothesis in a significance test is one-sided if it states that either a parameter is greater than or a parameter is less than the null hypothesis value. Unit 25
 One-Way ANOVA
 An analysis of variance in which one factor is thought to be related to the response variable. Unit 31
 Out of Control
 The state of a process that is no longer in control. The process has become unstable or its variables are no longer within an acceptable range. Unit 23
 Outlier
 Data value that lies outside the overall pattern of the other data values. Unit 2
P
 Paired t-Confidence Interval for μ_{D}
 When data are matched pairs, and the standard deviation of the population differences σ_{D} is unknown, a t-confidence interval estimate of the population mean difference, μ_{D}, is given by the formula:
\[\bar{x}_D \pm t^* \left( \frac{s_D}{\sqrt{n}} \right)\]
where t^{*} is a t-critical value associated with the confidence level and determined from a t-distribution with df = n - 1, and x̄_{D} and s_{D} are the mean and standard deviation of the sample differences. Unit 26
 Percentile
 A value such that a certain percentage of observations from the distribution falls at or below that value. The p^{th} percentile of a data set is a value such that p% of the observations fall at or below that value. Unit 9
 Pie Chart
 Graph of a frequency distribution for categorical data. Each category is represented by a slice of pie in which the area of the slice is proportional to the frequency or relative frequency of that category. Unit 2
 Placebo
 Something that is identical in appearance to the treatment received by the treatment group. Placebos are meant to be ineffectual and are given as control treatments. Unit 15
 Point Estimate
 A single number based on sample data (a statistic) that represents a plausible value for a population parameter. Unit 24
 Population
 The entire group of objects or individuals about which information is wanted. Unit 16
 Population Proportion
 For a population that is divided into two categories, success and failure, based on some characteristic, the population proportion, p, is:
\[p = \frac{\text{number of successes in the population}}{\text{population size}}\]
Unit 28
 Population Regression Line
 The population regression line, μ_{y} = α + βx, describes how the mean response μ_{y} varies as x changes. Unit 30
 Positive Association
 Two variables have positive association if above-average values of one tend to accompany above-average values of the other and below-average values of one tend to accompany below-average values of the other. In a scatterplot, a positive association would appear as a pattern of dots in the lower left to the upper right. Unit 10
 Probability
 A measure of how likely it is that something will happen or something is true. Probabilities are always between 0 and 1. Events with probabilities closer to 0 are less likely to happen and events with probabilities closer to 1 are more likely to happen. Unit 18
 Probability Distribution
 A list of the possible values of a discrete random variable together with the probabilities associated with those values. Unit 20
 Process
 Chain of steps that turns inputs into outputs. Unit 23
 Prospective Study
 A study that starts with a group and watches for outcomes (for example, the development of cancer or remaining cancer-free) during the study period and relates this to suspected risk or protection factors that might be linked to the outcomes. Unit 14
 P-value
 The probability, computed under the assumption that the null hypothesis is true, of observing a value from the test statistic's distribution that is at least as extreme as the value of the test statistic that was actually observed. Unit 25
Q
 Quantitative Variable
 Variable whose values are numbers obtained from measurements or counts. Height, weight, and points scored at a basketball game are examples of quantitative variables. Unit 2
R
 Random Phenomenon
 A situation in which the possible outcomes are known but we do not know which one will occur. If the situation is repeated over and over, a regular pattern to the outcomes emerges over the long run. Unit 18
 Random Variable
 A variable whose possible values are numbers associated with outcomes of a random phenomenon. Unit 20
 Range
 Measure of the variability of a quantitative data set from its extremes: range = maximum – minimum. Unit 5
 Regression Line
 A straight line that describes how a response variable y is related to an explanatory variable x. Unit 11
 Representative Sample
 A sample that accurately reflects the members of the entire population. Unit 17
 Residual Error
 A residual error is the vertical deviation of a data point from the regression model: residual error = actual y – predicted y. Unit 11
 Resistant Measure
 A statistic that measures some aspect of a distribution (such as its center) that is relatively unaffected by a small subset of extreme data values. For example, the median is a resistant measure of the center of a distribution while the mean is not a resistant measure of center. Unit 4
 Response Variable
 The variable used to measure the outcome of a study, which we attempt to explain or predict using one or more independent variables (factors). The response variable is always plotted on the vertical axis of a scatterplot. Also called the dependent variable. Unit 10, Unit 31
 Retrospective Study
 A study that starts with an outcome (for example, two groups of people, a cancer group and a non-cancer group) and then looks back to examine exposures to suspected risk or protection factors that might be linked to that outcome. Unit 14
 Run Chart
 A plot of data values versus the order in which these values were collected. Unit 23
S
 Sample
 The part of the population that is actually examined in a study. Unit 16
 Sample Mean
 One measure of center of a data set. The mean is the arithmetic average or balance point of a set of data. To calculate the mean, sum the data and divide by the number of data items:
\[ \bar{x} = \frac{\sum x}{n} \]
Unit 4
 Sample Proportion
 The sample proportion, p̂, from a sample of size n is:
\[\hat{p} = \frac{\text{number of successes in the sample}}{n}\]
Unit 28
 Sample Standard Deviation
 One measure of variability of a data set. The standard deviation has the same units as the data values. To calculate the standard deviation, take the square root of the sample variance:
\[ s = \sqrt{\frac{\sum{(x - \bar{x})^2}}{n-1}} \]
Unit 6
 Sample Variance
 One measure of variability of a data set. To calculate the variance, sum the squared deviations from the mean and divide by the number of data minus one:
\[ s^2 = \frac{\sum{(x - \bar{x})^2}}{n-1} \]
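The variance and standard deviation formulas can be written out step by step. This is a minimal sketch with hypothetical function names and data; Python's built-in `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.

```python
from math import sqrt

def sample_variance(data: list[float]) -> float:
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 observations")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def sample_standard_deviation(data: list[float]) -> float:
    """Square root of the sample variance; same units as the data."""
    return sqrt(sample_variance(data))

# Hypothetical data set with mean 5 and squared deviations summing to 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]
s_squared = sample_variance(values)
s = sample_standard_deviation(values)
```

For these eight values the variance is 32/7, about 4.57, and the standard deviation is about 2.14.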
Unit 6
 Sampling Bias
 Occurs when a sample is collected in such a way that some individuals in the population are less likely to be included in the sample than others. Because of this, information gathered from the sample will be slanted toward those who are more likely to be part of the sample. Unit 16
 Sampling Design
 Plan of how to select the sample from the population. Unit 17
 Sampling Distribution
 The distribution of the values of a sample statistic (such as x̄, the median, or s) over many, many random samples chosen from the same population. Unit 22
 Sampling Distribution of the Sample Mean
 The distribution of x̄ over a very large number of samples. If x̄ is the mean of a simple random sample (SRS) of size n from a population having mean µ and standard deviation σ, then the mean and standard deviation of x̄ are:
\[ \mu_{\bar{x}} = \mu \\ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
Furthermore, if the population distribution is normal, then the distribution of x̄ is normal. Unit 22
 Sampling Distribution of the Sample Proportion
 When the sample size n is large, the sampling distribution of the sample proportion p̂ is approximately normally distributed with the following mean and standard deviation:
\[ \mu_{\hat{p}} = p \text{, where $p$ is the population proportion.}\\ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \text{, where $n$ is the sample size.} \]
Unit 28
 Scatterplot
 A graphical display of bivariate quantitative data in which each observation (x, y) is plotted in the plane. Unit 10
 Self-Selecting Sampling
 A sampling design in which the sample consists of people who respond to a request for participation in the survey. (Also called voluntary sampling.) Unit 17
 Significance Level
 In a significance test, the highest P-value for which we will reject the null hypothesis. Unit 25
 Significance Test
 A method that uses sample data to decide between two competing claims, called hypotheses, about a population parameter. Unit 25
 Simple Random Sample of Size n
 A sample of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be in the sample actually selected. Unit 16
 Simple Random Sampling
 A sampling design that chooses a sample of size n using a method in which all possible samples of size n are equally likely to be selected. Unit 17
 Single-Blind Experiment
 An experiment in which the subjects do not know which treatment they are receiving but the individuals measuring the response do know which subjects were assigned to which treatments. Unit 15
 Skewed Right or Left

 Special Cause Variation
 Variation due to sudden, unexpected events that affect the process. Unit 23
 Standard Deviation of a Discrete Random Variable x
 Given a probability distribution, p(x), the standard deviation, σ, is calculated as follows:
\[ \sigma^2 = \sum{(x-\mu)^2} \cdot p(x); \sigma = \sqrt{\sigma^2} \]
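The calculation can be sketched for a probability distribution stored as a mapping from each value x to p(x). It first needs the mean, μ = Σ x · p(x), as defined in this glossary. The function names and the fair-die example are illustrative assumptions.

```python
from math import sqrt

def discrete_mean(dist: dict[float, float]) -> float:
    """Mean of a discrete random variable: sum of x * p(x)."""
    return sum(x * p for x, p in dist.items())

def discrete_sd(dist: dict[float, float]) -> float:
    """Standard deviation: square root of the sum of (x - mu)^2 * p(x)."""
    mu = discrete_mean(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return sqrt(variance)

# Hypothetical example: one roll of a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}
die_mean = discrete_mean(die)
die_sd = discrete_sd(die)
```

For the fair die the mean is 3.5 and the variance is 35/12, so the standard deviation is about 1.71.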
Unit 20
 Standard Error of the Estimate
 A point estimate of σ, which is a measure of how much the observations vary about the regression line. The standard error of the estimate, s_{e}, is computed as follows:
\[s_e = \sqrt{MSE} = \sqrt{ \frac{\sum(y-\hat{y})^2}{n-2}}\]
Unit 30
 Standard Error of the Slope b
 The estimated standard deviation of b, the least-squares estimate for the population slope β, is:
\[s_b = \frac{s_e}{\sqrt{\sum(x-\bar{x})^2}}\]
Unit 30
 Standard Normal Distribution
 Normal distribution with μ = 0 and σ = 1. Unit 8
 Standard Normal Quantiles
 The z-values that divide the horizontal axis of a standard normal density curve into intervals such that the areas under the density curve over each of the intervals are equal. Unit 9
 Stemplot (or Stem-and-Leaf Plot)
 Graphical tool for organizing quantitative data in order from smallest to largest. The plot consists of two columns, one for the stems (leading digit(s) of the observations) and the other for the leaves (trailing digit(s) for each observation listed beside corresponding stem). Stemplots are a useful tool for conveying the shape of relatively small data sets and identifying outliers. Unit 2
 Strata
 The nonoverlapping groups used in a stratified sampling plan. Unit 17
 Stratified Random Sample
 A stratified sampling plan in which the sample is obtained by taking random samples from each of the strata. Unit 17
 Stratified Sampling
 A sampling plan that is used to ensure that specific nonoverlapping groups of the population are represented in the sample. The nonoverlapping groups are called strata. Samples are taken from each stratum. Unit 17
 Symmetric Distribution
 Shape of a distribution of a quantitative variable in which the lower half of the distribution is roughly a mirror image of the upper half. Unit 2
T
 t-Confidence Interval for μ
 When σ is unknown, the sample size n is small, and the population distribution is approximately normal, a t-confidence interval for μ is given by the following formula:
\[\bar{x} \pm t^* \left( \frac{s}{\sqrt{n}} \right), \]
where t^{*} is a t-critical value associated with the confidence level and determined from a t-distribution with df = n - 1 degrees of freedom. Unit 26
 t-Distribution
 Density curves for t-distributions are bell-shaped and centered at zero, similar to the standard normal density curve. Compared to the standard normal distribution, a t-distribution has more area under its tails. The shape of a t-distribution, and how closely it resembles the standard normal distribution, is controlled by a number called its degrees of freedom (df). A t-distribution with df > 30 is very close to a standard normal distribution. Unit 26
 t-Test Statistic
 In testing H_{0}: μ = μ_{0}, where μ is the population mean, the formula for the t-test statistic is:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
The t-test is used in situations where the population standard deviation σ is unknown, the sample size n is small, and the population has a normal distribution. If the null hypothesis is true, t has a t-distribution with df = n - 1 degrees of freedom. Unit 26
 t-Test Statistic for the Slope
 In testing H_{0}: β = β_{0}, where β is the population slope, the formula for the t-test statistic is:
\[ t = \frac{b - \beta_0}{s_b}\text{, where }s_b = \frac{s_e}{\sqrt{\sum(x - \bar{x})^2}} \]
When H_{0} is true, t has a t-distribution with df = n - 2, where n is the number of (x, y)-pairs in the sample. The usual null hypothesis is H_{0}: β = 0, which says that the straight-line dependence on x has no value in predicting y. Unit 30
 Test of Hypotheses
 A method that uses sample data to decide between two competing claims, called hypotheses, about a population parameter. Unit 25
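As one example of a test of hypotheses, the t-test statistic for the slope defined earlier in this section can be computed as in the Python sketch below. It assumes that b is the least-squares slope and that s_e is the usual residual standard error, the square root of SSE/(n - 2); that definition of s_e is not stated in this section. The function name and the data are hypothetical.

```python
import math

def slope_t_statistic(xs, ys, beta_0=0.0):
    """Return (t, df) for testing H0: beta = beta_0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                          # least-squares slope
    a = y_bar - b * x_bar                  # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))         # assumed definition of s_e
    s_b = s_e / math.sqrt(sxx)             # standard error of the slope
    return (b - beta_0) / s_b, n - 2

t, df = slope_t_statistic([1, 2, 3, 4], [2, 4, 5, 8])   # made-up data
print(round(t, 3), df)
```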
 Test Statistic
 A quantity computed from the sample data that is used to make a decision between the null and alternative hypotheses in a significance test. Unit 25
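As a concrete instance of a test statistic, the one-sample t-test statistic defined earlier in this section can be computed as in the following Python sketch. The function name, the data, and the null value μ_{0} = 4 are made up for illustration.

```python
import math
import statistics

def t_test_statistic(data, mu_0):
    """Return (t, df) for testing H0: mu = mu_0 when sigma is unknown."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)             # sample standard deviation
    t = (x_bar - mu_0) / (s / math.sqrt(n))
    return t, n - 1

t, df = t_test_statistic([2, 4, 6, 8, 10], mu_0=4)   # hypothetical sample
print(round(t, 3), df)
```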
 Third Quartile or Q3
 The three-quarter point in an ordered set of data. To compute Q3, calculate the median of the upper half of the ordered data. Unit 5
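The rule above can be sketched in Python as follows. The sketch assumes that, when the number of observations is odd, the overall median is left out of the upper half; the definition above does not say which convention to use, so treat this as one reasonable choice. The function name is hypothetical.

```python
import statistics

def third_quartile(data):
    """Return Q3 as the median of the upper half of the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    # For odd n, (n + 1) // 2 skips the middle value (assumed convention).
    upper_half = ordered[(n + 1) // 2:]
    return statistics.median(upper_half)

print(third_quartile([1, 2, 3, 4, 5, 6, 7]))
print(third_quartile([3, 1, 2, 4]))
```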
 Treatment
 Any specific condition applied to the subjects in an experiment. If an experiment has more than one factor, then a treatment is a combination of specific values for each factor. Unit 15
 Two-Sample t-Confidence Interval for μ_{1} - μ_{2}
 When data are from two independent random samples from different populations, and the population standard deviations are unknown, a two-sample t-confidence interval estimate of the difference in population means is given by the formula:
\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} } \]
There are two options for finding the degrees of freedom (df) associated with t^{*}, the t-critical value associated with the confidence level: (1) use technology or (2) use a conservative approach and let df = smaller of n_{1} - 1 or n_{2} - 1. Unit 27
 Two-Sample t-Procedures
 Two-sample t-procedures are used to test or estimate μ_{1} - μ_{2}, the difference of two population means. The required data consist of two independent simple random samples of sizes n_{1} and n_{2}, one from each of the populations (or treatments). Unit 27
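Both two-sample t-procedures in this section use the same standard error, so they can be sketched together in Python. The function names and the data below are made up for illustration. The degrees of freedom follow the conservative option (the smaller of n_{1} - 1 and n_{2} - 1), and t* is supplied by the caller from a table or technology.

```python
import math
import statistics

def two_sample_t(sample_1, sample_2, d=0.0):
    """Return (difference in means, standard error, t statistic, conservative df)."""
    n1, n2 = len(sample_1), len(sample_2)
    diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    se = math.sqrt(statistics.variance(sample_1) / n1
                   + statistics.variance(sample_2) / n2)
    t = (diff - d) / se                    # tests H0: mu_1 - mu_2 = d
    df = min(n1, n2) - 1                   # conservative degrees of freedom
    return diff, se, t, df

def two_sample_interval(diff, se, t_star):
    """Return (lower, upper) for the difference in means."""
    return diff - t_star * se, diff + t_star * se

diff, se, t, df = two_sample_t([10, 12, 14, 16, 18], [9, 11, 13])  # hypothetical data
# 4.303 is the table value of t* for 95% confidence with df = 2.
low, high = two_sample_interval(diff, se, t_star=4.303)
print(round(t, 3), df, round(low, 3), round(high, 3))
```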
 Two-Sample t-Test Statistic
 In testing H_{0}: μ_{1} - μ_{2} = d, where μ_{1} and μ_{2} are the means of two populations, the formula for the two-sample t-test statistic is:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - d}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
There are two options for finding the degrees of freedom (df) associated with t: (1) use technology or (2) use a conservative approach and let df = smaller of n_{1} - 1 or n_{2} - 1. Unit 27
 Two-Sided Alternative Hypothesis
 The alternative hypothesis in a significance test is two-sided if it states that a parameter is different from the null hypothesis value. Unit 25
 Two-Way Table of Counts (Frequencies)
 A table with r rows and c columns that organizes data on two categorical variables taken from the same individuals or subjects. Values of the row variable label the rows of the table; values of the column variable label the columns of the table. Unit 13
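A two-way table of counts can be built from paired categorical observations as in the Python sketch below. The function name and the data are hypothetical; each observation is a (row value, column value) pair recorded on the same individual.

```python
from collections import Counter

def two_way_table(pairs):
    """Return (row labels, column labels, table of counts) for a list of pairs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Cell [i][j] is the number of individuals with row value i and column value j.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical responses: (owns a pet?, class section) for four students.
observations = [("yes", "A"), ("no", "A"), ("yes", "B"), ("yes", "A")]
rows, cols, table = two_way_table(observations)
print(rows, cols, table)
```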
U
V
 Variable
 Describes some characteristic or attribute of interest that can vary in value. Unit 2
 Variance of a Discrete Random Variable x
 Given a probability distribution, p(x), the variance is calculated as follows:
\[ \sigma^2 = \sum (x - \mu)^2 \cdot p(x) \]
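A minimal Python sketch of this calculation follows. It assumes the distribution is given as a mapping from each value x to its probability p(x), and that μ is the mean of the distribution, the sum of x · p(x). The function name and the example distribution are made up for illustration.

```python
def discrete_mean_and_variance(distribution):
    """Return (mu, sigma squared) for a dict mapping x to p(x)."""
    mu = sum(x * p for x, p in distribution.items())
    variance = sum((x - mu) ** 2 * p for x, p in distribution.items())
    return mu, variance

# Hypothetical distribution: number of heads in two tosses of a fair coin.
dist = {0: 0.25, 1: 0.5, 2: 0.25}
mu, variance = discrete_mean_and_variance(dist)
print(mu, variance)
```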
Unit 20
 Voluntary Sampling
 A sampling design in which the sample consists of people who respond to a request for participation in the survey. Also called self-selecting sampling. Unit 17
W
 Within-Groups Variation
 A measure of the spread of individual data values within each group about the group mean. It is measured by the mean square error, MSE. Unit 31
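As a sketch, MSE can be computed in Python using the standard one-way ANOVA formula: the sum of squared deviations of each observation from its own group mean, divided by N - k, where N is the total number of observations and k is the number of groups. That formula is not stated in this section and is assumed here; the function name and the data are hypothetical.

```python
import statistics

def mean_square_error(groups):
    """Return MSE = SSE / (N - k) for a list of groups of observations."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    sse = 0.0
    for g in groups:
        group_mean = statistics.mean(g)
        sse += sum((x - group_mean) ** 2 for x in g)   # spread within this group
    return sse / (n_total - k)

print(mean_square_error([[1, 2, 3], [4, 6, 8]]))       # made-up data
```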
X
 x̄ Charts
 A plot of means of successive samples versus the order in which the samples were taken. Unit 23
Z
 z-Score
 Transformation of a data value x into its deviation from the mean measured in standard deviations. To calculate a z-score for a data value x, subtract the mean and divide by the standard deviation:
\[ z = \frac{x - \mu}{\sigma} \]
Unit 8
 z-Test Statistic
 In testing H_{0}: μ = μ_{0}, where μ is the population mean, the formula for the z-test statistic is:
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]
The z-test statistic is used in situations where the population standard deviation σ is known and either the population has a normal distribution or the sample size n is large. Unit 8
 z-Test Statistic for Proportions
 In testing H_{0}: p = p_{0}, where p is the population proportion, the formula for the z-test statistic is:
\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}} \]
The z-test is used in situations where the sample size n is large. Unit 28
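The two z-test statistics in this section can be sketched in Python as shown below. The function names and the numbers are made up for illustration. The two-sided p-value helper is an addition that is not defined in this section: it assumes the usual approach of doubling the standard normal tail area beyond |z|, computed here with NormalDist from the standard library's statistics module.

```python
import math
from statistics import NormalDist

def z_test_mean(x_bar, mu_0, sigma, n):
    """z statistic for H0: mu = mu_0 when sigma is known."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))

def z_test_proportion(p_hat, p_0, n):
    """z statistic for H0: p = p_0 when n is large."""
    return (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

def two_sided_p_value(z):
    """Assumed helper: area in both tails of the standard normal beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z_mean = z_test_mean(x_bar=52, mu_0=50, sigma=10, n=25)     # hypothetical values
z_prop = z_test_proportion(p_hat=0.6, p_0=0.5, n=100)       # hypothetical values
print(z_mean, round(z_prop, 3), round(two_sided_p_value(z_prop), 4))
```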