
Introduction

We have discussed determining the probabilities of events by enumeration and tree diagrams. These are useful for some small problems but are very limited. For example, the probability of opening with a pair in 5 card poker is impractical to obtain by these methods. We could turn to the theory of probability, but that would involve higher mathematics. Fortunately, with the ever-increasing speed of computers, we have another way: resampling. Using resampling we can estimate the probability of the event and, further, we can increase the accuracy of the estimate simply by increasing the number of resamples.

Another advantage of resampling is that you have to build a model to accomplish it, and you can only build a correct model if you understand the problem. There are basically 4 steps to resampling. We outline the steps in general and then give several examples.

Let A be the event of interest.

1.
Choose a model and define a trial. In class, this often means portraying the sample space and event accurately using a table of random digits. The trial (repetition of the experiment) must be described explicitly.
2.
Define the event of interest in terms of Step 1. We must be able to compute P(A) in terms of the trial.
3.
Obtain N trials of the experiment. Count the occurrences of the event A. Denote this count by #(A). It is extremely important that:
(a)
The trials are independent of one another.
(b)
The trials are performed under identical conditions.
If one or both of these conditions fail, then there is NO guarantee whatsoever that the result in Step 4 is an estimate of P(A)! Furthermore, there is generally NO WAY to estimate the error of the estimate! It is indeed usually GIGO: Garbage In, Garbage Out.
4.
Estimate P(A) by $\frac{\#(A)}{N}$.
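
The four steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the class code: the function name `estimate_probability`, its parameters, and the coin-flip example are assumptions made here for illustration. A single random generator drives every trial, so the trials are independent and performed under identical conditions.

```python
import random

def estimate_probability(trial, event, n, seed=None):
    """Estimate P(A) by #(A)/N from n independent, identical trials.

    trial: function taking a random generator and returning one outcome (Step 1)
    event: function taking an outcome and returning True if A occurred (Step 2)
    """
    rng = random.Random(seed)      # one generator shared by all trials
    count = 0                      # this is #(A)
    for _ in range(n):             # Step 3: obtain N trials
        outcome = trial(rng)
        if event(outcome):
            count += 1
    return count / n               # Step 4: estimate P(A) by #(A)/N

# Hypothetical illustration: the event A is "a fair coin lands heads".
p_hat = estimate_probability(
    trial=lambda rng: rng.choice("HT"),
    event=lambda outcome: outcome == "H",
    n=10_000,
    seed=1,
)
print(p_hat)
```

Passing the trial and the event as separate functions mirrors Steps 1 and 2: the model is fixed first, and the event is then defined in terms of it.
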
Let's do a simple problem. On the roll of a fair 6 sided die, determine the probability that a 1 or 2 is the upface. Tough problem, right? The answer is 2/6 = 1/3 = .333. But this is a simple problem with which to demonstrate resampling. Here are the first 3 steps of the resampling experiment:
1.
Use single-digit random numbers 0 through 9. Discard (actually skip) digits 0, 7, 8, and 9, so that the digits 1 through 6 stand for the six faces of the die.
2.
The event A is a 1 or 2.
3.
Pick at random a starting point in the 10 digit random number table given in Appendix B. This is the first outcome. Read the succeeding outcomes one after another going down that column to the end. Then move to the top of the next column and continue until we have N trials.
Notice how explicit we were in describing how to do the N trials (Step 3). Notice that it ensures independent and identical trials (the digits in the table are random). This is a MUST! Failure to do so results in GIGO.

Let's do 30 repetitions of this experiment. We will use the table of random numbers. To make sure we are all on the same wavelength, I will use numbers in the first column, starting at the top. Remember to skip the digits 0, 7, 8, and 9.

Here are 30 trials:

5 5 1 4 3 2 6 2 2 2 4 1 6 1 6 3 6 3 6 5 6 3 4 1 4 2 6 2 1 4
Notice that a 1 or 2 came up 11 times. Hence our estimate of the probability of a 1 or 2 is 11/30 = .3667. Close to the true value.
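
As a check on the count, here is a short Python sketch that tallies the 30 trials listed above and then repeats the same model with many more trials. The helper name `roll_from_digits`, the seed, and the use of Python's `random` module in place of the printed table are assumptions for illustration only.

```python
import random

# The 30 trials read from the random number table, as listed in the text.
trials = [5, 5, 1, 4, 3, 2, 6, 2, 2, 2, 4, 1, 6, 1, 6,
          3, 6, 3, 6, 5, 6, 3, 4, 1, 4, 2, 6, 2, 1, 4]

count_a = sum(1 for x in trials if x in (1, 2))   # #(A): a 1 or 2 came up
p_hat = count_a / len(trials)
print(count_a, round(p_hat, 4))                   # 11 0.3667

def roll_from_digits(rng):
    """One trial: draw random digits 0-9, skipping 0, 7, 8, and 9."""
    while True:
        d = rng.randrange(10)
        if 1 <= d <= 6:
            return d

rng = random.Random(2001)   # arbitrary seed, only for reproducibility
n = 10_000
hits = sum(1 for _ in range(n) if roll_from_digits(rng) in (1, 2))
print(hits / n)             # should land near 1/3
```
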

Hey, we are on a roll! Let's try the urn problem of the previous chapter. Tough problem, but here is a resampling model:

1.
Choose two-digit random numbers, 00 through 99. Discard 00 and 81 through 99. The numbers 01 through 30 represent a blue ball while the numbers 31 through 80 represent a red ball. Select 3 numbers and discard ties (here the problem is sampling without replacement, so a number that repeats within a trial is skipped).
2.
If we get 3 numbers from 01 through 30 then 3 blue balls were obtained, and if we get 3 numbers from 31 through 80 then 3 red balls were obtained. In either case, 3 of the same color occurred. Count these up.
3.
Pick at random a starting point in the 10 digit random number table. Use 2 columns. This is the first outcome. Read the succeeding outcomes one after another going down those 2 columns to the end. Then move to the top of the next 2 columns and continue until we have N trials.
Let's obtain 15 repetitions of this experiment. We will use the table of random numbers. To make sure we are all on the same wavelength, I will use numbers in the first 2 columns, starting at the top.
59, 58, 12;   02, 41, 30;   29, 60, 20;   01, 21, 04;   07, 24, 06;   42, 15, 65;
19, 09, 06;   66, 38, 63;   31, 61, 55;   63, 73, 30;   47, 15, 49;   25, 62, 29;
75, 18, 48;   60, 53, 25;   29, 53, 21.
Let's turn them into colored balls:
R, R, B;      B, R, B;      B, R, B;      B, B, B;      B, B, B;      R, B, R;
B, B, B;      R, R, R;      R, R, R;      R, R, B;      R, B, R;      B, R, B;
R, B, R;      R, R, B;      B, R, B.

So our estimate of the probability that all the balls are of the same color is: 5/15 = .3333.
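
The conversion and the count can be checked with a short Python sketch. The helper name `color` is hypothetical. The exact calculation assumes the urn holds 30 blue and 50 red balls, which is inferred from the model above (01 through 30 blue, 31 through 80 red) rather than restated from the previous chapter.

```python
from math import comb

# The 15 trials of three two-digit numbers, as listed in the text.
numbers = [
    (59, 58, 12), (2, 41, 30), (29, 60, 20), (1, 21, 4), (7, 24, 6),
    (42, 15, 65), (19, 9, 6), (66, 38, 63), (31, 61, 55), (63, 73, 30),
    (47, 15, 49), (25, 62, 29), (75, 18, 48), (60, 53, 25), (29, 53, 21),
]

def color(n):
    """Map a number 01-80 to a ball color: 01-30 blue, 31-80 red."""
    return "B" if 1 <= n <= 30 else "R"

# A trial is a success when all three balls share one color.
same = sum(1 for t in numbers if len({color(n) for n in t}) == 1)
p_hat = same / len(numbers)
print(same, len(numbers), round(p_hat, 4))   # 5 15 0.3333

# Exact value under the assumed urn of 30 blue and 50 red balls,
# drawing 3 without replacement.
exact = (comb(30, 3) + comb(50, 3)) / comb(80, 3)
print(round(exact, 4))                       # 0.288
```
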

What's the error here? In the next two chapters, we will consider this in some detail. But for now, let's just state the error as follows. Denote our estimate of the probability of interest by $\hat{p}$. It is read "p hat". Then our error of estimation is

\begin{displaymath}2 \sqrt{\frac{\hat{p}(1-\hat{p})}{N}}
\end{displaymath}

Notice that the error is proportional to $1/\sqrt{N}$; hence, the more repetitions, the smaller the error. For the urn problem, $\hat{p} = .3333$ and N = 15, so the error is 0.2434262. Notice that the interval $(\hat{p} - \textrm{error}, \hat{p} + \textrm{error})$ traps the true probability of .2880. This error is huge because N is so small. Alas, I got very bored doing 15 repetitions of this experiment. But guess what? Yep, you got it: the computer will not get bored doing 10,000 reps, in which case the error is about 0.0091. (I used the correct value .2880 for this calculation. In practice, use the estimate $\hat{p}$.)
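
The error formula is easy to evaluate by computer. The sketch below reproduces the two error values quoted above and then runs 10,000 repetitions of the urn experiment. The function name `error_of_estimation`, the seed, and the urn of 30 blue and 50 red balls (inferred from the resampling model) are assumptions made for illustration, and `random.sample` stands in for reading three distinct numbers from the table.

```python
import random
from math import sqrt

def error_of_estimation(p_hat, n):
    """Error of estimation: 2 * sqrt(p_hat * (1 - p_hat) / N)."""
    return 2 * sqrt(p_hat * (1 - p_hat) / n)

err_15 = error_of_estimation(5 / 15, 15)
err_10000 = error_of_estimation(0.2880, 10_000)
print(round(err_15, 4))      # 0.2434
print(round(err_10000, 4))   # 0.0091

# 10,000 repetitions of the urn experiment (assumed urn: 30 blue, 50 red).
rng = random.Random(0)       # arbitrary seed, only for reproducibility
urn = ["B"] * 30 + ["R"] * 50
n = 10_000
hits = sum(1 for _ in range(n) if len(set(rng.sample(urn, 3))) == 1)
p_sim = hits / n
err_sim = error_of_estimation(p_sim, n)
print(p_sim, err_sim)
```

Because the error shrinks like $1/\sqrt{N}$, cutting it in half takes four times as many repetitions.
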


Exercise 4.1.1  
1.
Paula has 6 pairs of earrings in a box. She grabs two of the earrings in the box (sampling without replacement). Find the probability that she has a matched set of earrings.

Using the random number table, model this problem. (Hint: Use 0,1 for first pair; 2,3 for second pair; etc. Now the length of the trial is 2; that's all she grabs, and remember it's sampling without replacement.) Next resample 10 trials of your model. For each trial record success (got a matched pair) or failure (did not get a matched pair). Obtain $\hat{p}$, your estimate of the desired probability. Calculate the error of estimation.

2.
When his alarm goes off, John hits the snooze button on it 80% of the time. If he fails to hit it, he gets up. The snooze alarm only works for 6 hits. Find the probability that John sleeps at least an extra 20 minutes. Using the random number table, model this problem. (Hint: Let 1-8 denote John hitting the button and 0, 9 denote that he doesn't. Note that a trial ends after 6 digits or at the first 0 or 9, whichever comes first.)

Next resample 10 trials of your model. For each trial record the extra sleep John got (for example, suppose the trial is 4, 6, 9; then John slept for an extra 20 minutes, which is a success for the event we want). Obtain $\hat{p}$, your estimate of the desired probability. Calculate the error of estimation.

3.
Twenty passengers are on a bus that enters a foreign country. Twelve of these passengers are women. At the gate to the foreign country, a guard gets on the bus and selects 6 people at random for an extensive visa check. Find the probability that (a) all 6 are males, (b) all 6 are females, and (c) 4 are females.

Using the random number table, model this problem. Next resample 10 trials of your model. For each trial record the success or failure for each of (a), (b), and (c). Obtain $\hat{p}$, your estimate of the desired probability, for each event. Calculate the error of estimation for each.

4.
Betty is playing 5 card draw poker. She holds 3 hearts and 2 clubs. In the draw, she decides to discard her 2 clubs and get two more cards. Find the probability that she will get a flush in hearts, i.e., her 2 cards in the draw are hearts.

Using the random number table, model this problem. Next resample 10 trials of your model. For each trial record the success or failure for the desired event. Obtain $\hat{p}$, your estimate of the desired probability. Calculate the error of estimation.

5.
Jack pays $10 to play a dice game in which 5 fair dice are rolled. If the dice result in:
(a)
All dice come up 6, Jack wins $500.
(b)
All dice are the same, Jack wins $100.
(c)
All dice are even, Jack wins $20.
(d)
Else Jack wins nothing.
Find the probability that Jack wins some money.

Using the random number table, model this problem. Next resample 10 trials of your model. For each trial record the success or failure for the desired event. Obtain $\hat{p}$, your estimate of the desired probability. Calculate the error of estimation.



2001-01-01