Teacher resources and professional development across the curriculum
Let's return to our Galton board for further exploration. In the previous section, we used the Law of Large Numbers to see that, for each bin, the theoretical and experimental probabilities get closer and closer to one another as we put more and more marbles through the process. We're going to return to theoretical probabilities now and examine how the probabilities are distributed across all of the bins.
Recall that in our earlier examples the bins in the middle had higher probabilities than did the bins at the sides. This was because more paths terminate in the middle bins than in the side bins. Let's make a histogram that shows the bins and their probabilities.
First, a simple two-row machine, whose three bins have probabilities 1/4, 1/2, and 1/4:
Notice that the distribution is symmetric; the probabilities for both the far right and far left bins are equal. Let's look at the histogram for a Galton board with four rows.
Notice that the probabilities for each version of the system are distributed across all of the bins and that even though the individual probabilities change as we add more bins, they always sum to 1. This is in keeping with our intuition that any marble must indeed end up in exactly one of the bins. At this point, we're going to need to label the bins so that we can discuss results in more detail.
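The histograms described above can be reproduced with a short Python sketch. It assumes a fair board, on which every path is equally likely, and it takes the number of paths into each bin from the binomial coefficients (Pascal's triangle). The function name `fair_board_probabilities` is illustrative rather than part of the original text.

```python
from fractions import Fraction
from math import comb

def fair_board_probabilities(rows):
    """Probability of each bin, left to right, on a fair Galton board."""
    total_paths = 2 ** rows  # each row doubles the number of possible paths
    # comb(rows, k) counts the paths that end in the k-th bin from the left
    return [Fraction(comb(rows, k), total_paths) for k in range(rows + 1)]

two_rows = fair_board_probabilities(2)
four_rows = fair_board_probabilities(4)

print([str(p) for p in two_rows])    # ['1/4', '1/2', '1/4']
print([str(p) for p in four_rows])   # ['1/16', '1/4', '3/8', '1/4', '1/16']
print(sum(two_rows), sum(four_rows)) # 1 1
```

Exact fractions are used instead of floating-point numbers so that the sums come out to exactly 1.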
To do this, let's say that each marble, as it progresses through the system, gets 1 point for each movement to the right and 0 points for each movement to the left. Each bin can then be represented by the summative "score" of a marble that ends up there.
For example, a marble passing through a four-row system would have a maximum possible score of 4, corresponding to the right-most bin, and a minimum possible score of zero, corresponding to the left-most bin. The remaining bins, from left to right, would have scores of 1, 2, and 3.
The average result, which would be the expected score for one average marble, can be found as before by multiplying the value of each bin by the probability of landing there and summing the results. This would give us 0(1/16) + 1(4/16) + 2(6/16) + 3(4/16) + 4(1/16) = 2.
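The weighted sum described above can be checked with a few lines of Python. This sketch assumes the fair four-row board, whose bins (scores 0 through 4) are reached by 1, 4, 6, 4, and 1 of the 16 equally likely paths. The variable names are illustrative.

```python
from fractions import Fraction

# Probabilities of the bins with scores 0, 1, 2, 3, 4 on a fair four-row board
probabilities = [Fraction(count, 16) for count in (1, 4, 6, 4, 1)]

# Multiply each score by the probability of landing there, then add the results
expected_score = sum(score * p for score, p in enumerate(probabilities))

print(expected_score)  # 2
```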
Interestingly, we can also find this by multiplying the number of rows by the probability of going right at each peg: 4 rows times 1/2 = 2. Recall that the number of rows corresponds to the maximum score that a marble can get. Multiplying this maximum score by the probability of going right at each peg gives the expected value. We'll need this method in just a bit.
Furthermore, we want to describe mathematically how the scores of individual marbles are spread out around the mean. In other words, we need a way to describe how random results vary from their expected value. This would give us a sort of sensible ruler that we can use. To do this, it makes sense to look at the difference between the score of each bin and the mean, so we subtract the two quantities.
x - m where x = the score of a bin and m = the mean (the expected value)
Because this quantity is something like a notion of distance, we square it to ensure that the value won't be negative.
(x - m)^{2}
We then multiply this squared difference by the probability of ending up in that bin.
P(x) × (x - m)^{2}, where P(x) is the probability of ending up in the bin with score x
Finally, if we add all of these terms together, one for each bin, we will get a number that describes how the scores are distributed around the mean. This is known as the variance.
Because the variance is based on the square of the difference between a result and its expected value, it scales somewhat awkwardly.
For example, if the differences between the scores and the mean all change by a factor of three, the variance would change by a factor of 3^{2}, or 9. To mitigate this so that our ruler scales in a more sensible manner, we can take the square root of our final result. Taking the square root of the variance gives us a measure of the typical difference between a marble's score and the mean. This is known as the standard deviation.
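The recipe for the variance and standard deviation can be traced step by step in Python. This sketch again assumes the fair four-row board, with path counts 1, 4, 6, 4, 1 out of 16. The variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Probabilities of the bins with scores 0, 1, 2, 3, 4 on a fair four-row board
probabilities = [Fraction(count, 16) for count in (1, 4, 6, 4, 1)]

# Mean: each score weighted by its probability
mean = sum(x * p for x, p in enumerate(probabilities))

# Variance: squared distance from the mean, weighted by probability, summed over bins
variance = sum(p * (x - mean) ** 2 for x, p in enumerate(probabilities))

# Standard deviation: square root of the variance
standard_deviation = sqrt(variance)

print(mean, variance, standard_deviation)  # 2 1 1.0
```

For this board the variance and the standard deviation both happen to equal 1, so a typical marble lands about one bin away from the middle.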
The number of rows in the system determines the number of bins. Let's call the number of rows n; there are then n + 1 bins, with scores 0 through n. Notice that the maximum score is also n. Remember that this setup can represent many different situations, such as the result of n coin tosses, or any other sequence of two-outcome events, regardless of whether or not the odds of each individual event are 1 to 1, sometimes referred to as a "50-50 chance."
What would happen in a situation in which each individual event has a probability other than 1/2? Let's return to dice for a moment, and then see how we can model this on our Galton board. For instance, let's say we want to roll a 5 with one die. We either roll a 5 or we don't, so this situation is binary, but unlike the coin toss, the odds are not 1 to 1. The probability of rolling a 5 with one die is 1/6: there is only one way to roll a 5, whereas there are five possible results other than a 5. We can model this using a Galton board by equating right deflections with rolling a 5 and left deflections with rolling anything else. If we then tilt the board such that each marble has a 1/6 chance of going to the right at each peg and a 5/6 chance of going to the left, we have a great model for our problem. It's like having a "biased coin," one in which the probability of getting a head is only 1/6 and the probability of getting a tail is 1 - (1/6), or 5/6.
With this model, it is easy to answer questions such as, what is the probability of rolling a 5 exactly once in four rolls? In terms of our modified, tilted system, this corresponds to a marble going through four rows and deflecting to the right only once, ending up in bin 1. To find the probability that a marble will end up in bin 1, which is the same as the probability of rolling exactly one 5 in four rolls, we can no longer simply count paths as we did before, because not all paths are equally likely. Nonetheless, we can use our path count as a starting point.
In our four-row system, we know that there are four possible paths to bin 1. Now, instead of looking at the ratio of the number of paths ending in bin 1 to the total number of paths to find the probability of ending up in bin 1, we can think about the probability of each specific path occurring. Each path is a sequence of four events, and each event is either a left (L) or right (R) shift in direction. The four paths to bin 1 are thus LLLR, LLRL, LRLL, and RLLL. The probability for each of these paths is the product of the probabilities of the individual events in the sequence. For example, the path LLLR has a probability of (5/6)(5/6)(5/6)(1/6) = 125/1296. The path RLLL has the probability (1/6)(5/6)(5/6)(5/6) = 125/1296. Notice that all the paths to bin 1 have the same probability. Therefore, to find the probability of ending up in bin 1, we can just add the probabilities of taking the specific paths that end in bin 1. Since all of these probabilities are the same, we can simply multiply the probability for one path, 125/1296, times the number of paths (4) to get 500/1296, or about 0.386.
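The path argument above can be checked by brute force. The following Python sketch lists every four-step path, keeps those with exactly one right turn, and multiplies out the probability of each. It assumes the tilted board from the dice example, with a 1/6 chance of going right at each peg. The names `P_RIGHT`, `P_LEFT`, and `path_probability` are illustrative.

```python
from fractions import Fraction
from itertools import product

P_RIGHT = Fraction(1, 6)   # chance of deflecting right (rolling a 5)
P_LEFT = 1 - P_RIGHT       # chance of deflecting left (rolling anything else)

def path_probability(path):
    """Multiply the probabilities of the individual turns along one path."""
    probability = Fraction(1)
    for step in path:
        probability *= P_RIGHT if step == "R" else P_LEFT
    return probability

# All four-step paths with exactly one right turn end in bin 1
paths_to_bin_1 = ["".join(p) for p in product("LR", repeat=4) if p.count("R") == 1]
bin_1_probability = sum(path_probability(path) for path in paths_to_bin_1)

print(paths_to_bin_1)                 # ['LLLR', 'LLRL', 'LRLL', 'RLLL']
print(path_probability("LLLR"))       # 125/1296
print(bin_1_probability)              # 125/324, which equals 500/1296
```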
We can generalize this thinking to arrive at an expression that will tell us the probability of landing in the k^{th} bin of a system with n rows, in which the probability of going to the right at each peg is p and the probability of going to the left is 1-p. We multiply the number of paths to that bin, C(n, k) (read "n choose k"), times the probability of going right, p, to the k^{th} power, times the probability of going left, (1-p), to the (n-k)^{th} power (because if you go right k times, you necessarily go left the rest of the time). The probability of landing in the k^{th} bin is then:
C(n, k) × p^{k} × (1-p)^{(n-k)}
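The general expression translates almost directly into code. This minimal Python sketch uses the standard library's `math.comb` for the number of paths. The function name `bin_probability` is an illustrative choice, not something taken from the original text.

```python
from fractions import Fraction
from math import comb

def bin_probability(n, k, p):
    """Probability of landing in bin k after n rows, going right with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Dice example: four rolls, "right" means rolling a 5
dice_bins = [bin_probability(4, k, Fraction(1, 6)) for k in range(5)]
print([str(q) for q in dice_bins])  # ['625/1296', '125/324', '25/216', '5/324', '1/1296']
print(sum(dice_bins))               # 1

# Fair board: the middle bin of a four-row system
print(bin_probability(4, 2, Fraction(1, 2)))  # 3/8
```

Passing `Fraction` values for p keeps the results exact. Ordinary floats such as `1/6` work as well, at the cost of small rounding errors.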
Using p = 1/6 and (1-p) = 5/6, from our dice example, we see that the distribution of probabilities after four rows on the Galton board has shifted to the left somewhat from what it was for the p = (1-p) = 1/2 situation of the fair coin toss. Intuitively it makes sense that, if a marble has a greater chance of going left than right at each peg, then there is a greater chance that it will end up in the left bins.
Let's look at how this affects the average marble's score. We'll need to find the mean again, and we can do this, as we did before, by multiplying the number of rows by the probability of deflecting to the right (4 rows × 1/6 = 2/3).
So, shifting the probability of going right at each peg from 1/2 to 1/6 both moves the entire distribution of probabilities to the left and shifts the mean value from 2 to 2/3. We now see how the probability of each event (turn) determines the overall distribution of outcomes of repeated events (sequences of turns).
Not only does the mean shift, but the variance and standard deviation shift as well. Recall that these values have to do with how the outcomes are distributed around the mean. This distribution of probabilities, or outcomes, is called the binomial distribution, and it is a commonly occurring distribution in sequences of repeated events in which there are only two possible outcomes for each event.
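To see the shift in numbers, the sketch below computes the mean, variance, and standard deviation directly from the bin probabilities for both the fair and the tilted four-row boards. The helper name `board_statistics` is illustrative. The closing comparison with n × p × (1-p) uses a standard formula for the binomial variance that the text above does not derive.

```python
from fractions import Fraction
from math import comb, sqrt

def board_statistics(n, p):
    """Mean and variance of the score on an n-row board, going right with probability p."""
    bins = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(bins))
    variance = sum(q * (k - mean) ** 2 for k, q in enumerate(bins))
    return mean, variance

fair_mean, fair_variance = board_statistics(4, Fraction(1, 2))
tilted_mean, tilted_variance = board_statistics(4, Fraction(1, 6))

print(fair_mean, fair_variance)      # 2 1
print(tilted_mean, tilted_variance)  # 2/3 5/9
print(sqrt(tilted_variance))         # standard deviation of the tilted board

# Standard shortcut for the variance (not derived in the text): n * p * (1 - p)
print(4 * Fraction(1, 6) * Fraction(5, 6))  # 5/9
```

Tilting the board lowers the mean from 2 to 2/3 and also narrows the spread, since the variance drops from 1 to 5/9.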
The binomial distribution is useful, but it can take a long time to calculate, especially in situations in which n, the number of events, or the number of rows in the system, is large. There is an approximation to this distribution, however, that is much more easily calculated and that provides a reasonably good model for the probability distribution. It can be found using only the mean and the standard deviation, and it is known as the normal distribution, familiar to many of us as the "bell curve."
The normal distribution serves as a model for the distribution of the probabilities of outcomes of repeated independent events, also called "Bernoulli trials." As we can see, it is a bell-shaped curve, and it turns out that it is characterized by two properties. One distinguishing characteristic is its mean, which corresponds to the central position of the bell, around which it is symmetric. The other characteristic is the standard deviation. In the graph above, this corresponds to the distance from the mean to the point of inflection on the graph (there is one on either side of the mean, indicated in the figure above). One standard deviation is a measure of the typical difference between an outcome and the mean.
In terms of percentages, the standard deviation, marked on either side of the mean, defines the range within which about 68% of the results fall. In other words, if scores on a test were normally distributed, about 68% of the students would fall within one standard deviation of the mean. For example, if the mean were 65 and the standard deviation 7, then about 68% of students would score between 58 and 72. What's more, about 95% of students would have scores within two standard deviations of the mean (between 51 and 79), and about 99.7% of students would have scores within three standard deviations of the mean. For this example, only about 0.3% of students would have scores higher than 86 or lower than 44. This is commonly known as the 68-95-99.7 rule for normal distributions.
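The coverage figures for one, two, and three standard deviations can be checked numerically. This sketch relies on a standard identity: the share of a normal distribution within k standard deviations of the mean equals erf(k/√2), where `erf` is the error function from Python's `math` module. The test-score mean of 65 and standard deviation of 7 come from the example above, and the function name `share_within` is illustrative.

```python
from math import erf, sqrt

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

mean, sd = 65, 7  # test-score example from the text
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"within {k} standard deviation(s), {low} to {high}: {share_within(k):.1%}")
```

The three printed shares come out to roughly 68%, 95%, and 99.7%, so only about 0.3% of scores would be expected to fall outside the range 44 to 86.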
The normal distribution approximation provides a powerful tool for predicting how the results of repeated independent experiments will be distributed. Furthermore, the more events in sequence that we look at, the better the normal distribution is at describing our results. Of course, there can always be outliers, such as a string of all heads or tails, that momentarily will skew the distribution one way or the other. However, on average, the normal distribution is fairly representative of the real world. In terms of our 50-50 Galton board, which can model a variety of binary situations, this means that the more rows we have, the closer our distribution will be to the normal distribution. The underlying reason for this involves the Central Limit Theorem, and it is to this concept that we will now turn.
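The claim that more rows bring the distribution closer to the bell curve can be explored with a small experiment. The Python sketch below compares the exact bin probabilities of a fair board with the height of a normal curve that has the same mean and standard deviation, and it reports the largest gap over all bins. Using the mean n/2 and standard deviation √n/2 for the fair board, and measuring closeness by the largest gap, are assumptions of this sketch rather than steps taken in the text. The function names are illustrative.

```python
from math import comb, exp, pi, sqrt

def normal_height(x, mean, sd):
    """Height of the normal (bell) curve with the given mean and standard deviation."""
    return exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))

def largest_gap(rows):
    """Largest difference between exact bin probabilities and the normal curve."""
    mean = rows / 2        # rows times the probability of going right (1/2)
    sd = sqrt(rows) / 2    # square root of the variance rows * (1/2) * (1/2)
    return max(
        abs(comb(rows, k) / 2**rows - normal_height(k, mean, sd))
        for k in range(rows + 1)
    )

gaps = {rows: largest_gap(rows) for rows in (4, 16, 64)}
for rows, gap in gaps.items():
    print(f"{rows:2d} rows: largest gap = {gap:.5f}")
```

The gap shrinks each time the number of rows is quadrupled, which is consistent with the statement that the approximation improves as the number of rows grows.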