## Normal Probability Plots and Sampling Distributions

We will create a couple of datasets with the TI-83 and then see how to determine whether a particular dataset comes from a normally distributed population. It may help to review Chapter 6 of your textbook first.
1. First, let us generate a list containing a sample from a standard normal distribution. To do this, choose the randNorm function from the MATH PRB menu. Enter (0,1,100) as the arguments to get a sample of size 100 from a normal distribution with mean 0 and standard deviation 1, and store the result in L1. The complete command is randNorm(0,1,100)→L1.

Of course, we expect this dataset to be symmetric and bell-shaped, and we expect its normal probability plot to be roughly straight.
1. Let us check the shape of the dataset with a histogram and a boxplot.   Both of these are under STAT PLOT; consult your manual to learn which icons represent the plots you want.

Note: For the histogram, it may be necessary to choose ZoomStat from the ZOOM menu.

As expected, both plots are roughly symmetric, although not perfectly so. Even data randomly generated from a normal distribution isn't perfectly symmetric!
2. Now let us create a normal probability plot.   This is also under STAT PLOT as the last plot option.
Note 1: On the Data Axis line, choose Y rather than X; this gives a plot consistent with both the textbook and ExcelTools.

Note 2: You may want to choose ZoomStat again.   As expected, the plot is a fairly straight line.
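To see what the calculator is plotting, here is a minimal Python sketch of the points in a normal probability plot: the sorted data values paired with standard normal scores. It assumes the plotting position (i - 0.5)/n for the i-th smallest value; the TI-83 and your textbook may use a slightly different formula, which shifts the scores only a little. The helper name `normal_plot_points` and the seed are hypothetical, and `random.gauss` merely plays the role of randNorm.

```python
import random
from statistics import NormalDist


def normal_plot_points(data):
    """Return (normal score, data value) pairs for a normal probability plot.

    Assumption: the i-th smallest value gets the standard normal quantile of
    (i - 0.5)/n; calculators and textbooks may use a slightly different rule.
    """
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(data)
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]


# Rough analogue of randNorm(0,1,100)→L1; the seed is arbitrary, for repeatability.
rng = random.Random(1)
L1 = [rng.gauss(0, 1) for _ in range(100)]

points = normal_plot_points(L1)
# With Data Axis = Y, the data values go on the vertical axis
# and the normal scores on the horizontal axis.
for score, value in points[:5]:
    print(f"{score:7.3f}  {value:7.3f}")
```

If the data are normal, plotting these pairs gives a roughly straight line, since each data value sits close to the corresponding normal quantile (scaled by the standard deviation and shifted by the mean).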
2. Now let us generate a list containing a sample from a skewed, non-normal distribution. To do this, choose the randBin function from the MATH PRB menu. Enter (20,.1,50) as the arguments to get a sample of size 50 from a binomial distribution with n = 20 and p = .1, and store the result in L2. The complete command is randBin(20,.1,50)→L2.

Note: This takes about 30 seconds; be patient! This dataset should be right-skewed, which we will be able to see in the histogram and boxplot. Its normal probability plot should not be straight; instead, we can expect it to curve upward.
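For comparison, a hypothetical Python stand-in for randBin(20,.1,50) is sketched below. It draws from the same binomial distribution by counting successes in 20 independent trials with success probability .1; it is not the TI-83's own algorithm, and the helper name `rand_bin` and the seed are assumptions. A crude text histogram shows the shape: the mean is np = 2, so the values pile up near the low end with a longer tail to the right.

```python
import random
from collections import Counter


def rand_bin(n, p, count, rng):
    """Simulate `count` binomial(n, p) values, each the number of successes in n trials."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(count)]


rng = random.Random(2)             # arbitrary seed so the run is repeatable
L2 = rand_bin(20, 0.1, 50, rng)    # rough analogue of randBin(20,.1,50)→L2

# Text histogram: one row per value, one '*' per observation with that value.
tally = Counter(L2)
for value in range(min(L2), max(L2) + 1):
    print(f"{value:2d} | {'*' * tally[value]}")
```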
1. Again, check the shape of the dataset with a histogram and a boxplot.

As expected, both plots are right-skewed.
2. Now create the normal probability plot.

As expected, the plot is not straight; it curves upward.
You can use this method on any dataset to check whether the sample plausibly came from a normal distribution. Note that the judgment is somewhat subjective; two people may not reach the same conclusion about a particular dataset!

Consider the following exercise: Plastic bags used for packaging produce are manufactured so that the breaking strength of the bag is normally distributed with a mean of 5 pounds per square inch and a standard deviation of 1.5 pounds per square inch.   A sample of 25 bags is selected.

So the breaking strength $X$ of the bags is normal with $\mu = 5$ lbs/in² and $\sigma = 1.5$ lbs/in². Also, $n = 25$.

(a)(1) What is the probability that the average breaking strength is between 5 and 5.5 pounds per square inch?

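$$P(5 < \bar{X} < 5.5) = P\left(\frac{5 - 5}{1.5/\sqrt{25}} < Z < \frac{5.5 - 5}{1.5/\sqrt{25}}\right) = P(0 < Z < 1.6667) = .45220967$$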

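Note that if we want to specify $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$ in the normalcdf command, we actually have to plug in $\mu_{\bar{X}} = \mu = 5$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1.5/\sqrt{25} = 0.3$, so the command is normalcdf(5,5.5,5,.3). You should always sketch the normal curve, shade the region you want, and give it some thought. Use common sense to decide whether the calculator's answer is reasonable.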

(a)(2) What is the probability that the average breaking strength is between 4.2 and 4.5 pounds per square inch?

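$$P(4.2 < \bar{X} < 4.5) = P\left(\frac{4.2 - 5}{1.5/\sqrt{25}} < Z < \frac{4.5 - 5}{1.5/\sqrt{25}}\right) = P(-2.6667 < Z < -1.6667) = .04395991$$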

(a)(3) What is the probability that the average breaking strength is less than 4.6 pounds per square inch?

$$P(\bar{X} < 4.6) = P\left(Z < \frac{4.6 - 5}{1.5/\sqrt{25}}\right) = P(Z < -1.3333) = .09121128$$
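As a cross-check on parts (a)(1) to (a)(3), the sketch below uses Python's `statistics.NormalDist` in place of the TI-83's normalcdf. The helper `normalcdf` is a hypothetical name chosen to mirror the calculator command; the key step, as on the calculator, is to use the standard error $\sigma/\sqrt{n}$ rather than $\sigma$.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.0, 1.5, 25
se = sigma / sqrt(n)          # standard error of the mean: 1.5 / 5 = 0.3
xbar = NormalDist(mu, se)     # sampling distribution of the sample mean


def normalcdf(lower, upper, dist):
    """Area under `dist` between lower and upper, like normalcdf(lower, upper, mu, sigma)."""
    return dist.cdf(upper) - dist.cdf(lower)


a1 = normalcdf(5.0, 5.5, xbar)    # (a)(1)
a2 = normalcdf(4.2, 4.5, xbar)    # (a)(2)
a3 = xbar.cdf(4.6)                # (a)(3): P(sample mean < 4.6)
print(round(a1, 4), round(a2, 4), round(a3, 4))
```

These should agree with the calculator values above to about four decimal places.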

(b) Between what two values symmetrically distributed around the mean will 95% of the average breaking strengths be?

The values asked for are the 2.5th and 97.5th percentiles of the distribution of average breaking strengths, since 95% of the averages lie between them. We need the Z value with a lower-tail area of .025, which is $z = -1.96$ (invNorm(.025) on the TI-83).

So the lower value, once we "un-standardize", is:

$$\bar{X} = \mu + z\frac{\sigma}{\sqrt{n}} = 5 + (-1.96)\frac{1.5}{\sqrt{25}} = 4.412$$
And from the symmetry of the normal curve, the upper value will be

$$\bar{X} = 5 + (1.96)\frac{1.5}{\sqrt{25}} = 5.588$$
So 95% of the average breaking strengths will fall between 4.412 and 5.588.
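The same percentile calculation can be sketched in Python, with `NormalDist().inv_cdf` standing in for the TI-83's invNorm. This is only an illustration of the "un-standardize" step; it uses the exact quantile (about -1.95996) instead of the rounded -1.96, which does not change the three-decimal answers.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.0, 1.5, 25
se = sigma / sqrt(n)                  # standard error: 0.3

z = NormalDist().inv_cdf(0.025)       # like invNorm(.025); about -1.96
lower = mu + z * se                   # un-standardize the lower percentile
upper = mu - z * se                   # symmetric about the mean
print(round(lower, 3), round(upper, 3))
```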

(c) What will your answers be to (a) and (b) if the standard deviation is 1.0 pound per square inch?

Do this part on your own!
(a)(1) .49379
(a)(2) .00617799
(a)(3) .02275
(b) (4.608, 5.392)
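To check your part (c) answers, here is a short Python sketch that repeats parts (a) and (b) with $\sigma = 1.0$, so the standard error becomes $1.0/\sqrt{25} = 0.2$. As before, `statistics.NormalDist` is a stand-in for the calculator's normalcdf and invNorm, not the TI-83 itself.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.0, 1.0, 25
xbar = NormalDist(mu, sigma / sqrt(n))    # standard error is now 0.2

a1 = xbar.cdf(5.5) - xbar.cdf(5.0)        # (a)(1)
a2 = xbar.cdf(4.5) - xbar.cdf(4.2)        # (a)(2)
a3 = xbar.cdf(4.6)                        # (a)(3)
lower, upper = xbar.inv_cdf(0.025), xbar.inv_cdf(0.975)   # (b)
print(round(a1, 5), round(a2, 6), round(a3, 5), (round(lower, 3), round(upper, 3)))
```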

Consider the following exercise.

Historically, 93% of the deliveries of an overnight mail service arrive before 10:30 the following morning.   Random samples of 500 deliveries are selected.

So we have a "historic" (read: population) proportion of p = .93.   Also, n = 500.

(a) What proportion of the samples will have between 93% and 95% of the deliveries arriving before 10:30 the following morning?

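$$P(.93 < p_s < .95) = P\left(\frac{.93 - .93}{\sqrt{.93(.07)/500}} < Z < \frac{.95 - .93}{\sqrt{.93(.07)/500}}\right) = P(0 < Z < 1.7528) = .46017915$$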

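Note that if we want to specify $\mu_{p_s}$ and $\sigma_{p_s}$ in the normalcdf command, we actually have to plug in $\mu_{p_s} = p = .93$ and $\sigma_{p_s} = \sqrt{p(1-p)/n} = \sqrt{.93(.07)/500} \approx .011411$.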

(b) What proportion of the samples will have more than 95% of the deliveries arriving before 10:30 the following morning?

P(Z > 1.7528) = .03982085

(c) If samples of size 1000 are selected, what will your answers be in (a) and (b)?

Do this part on your own!
(a) .49340852
(b) .00659148
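Parts (a) to (c) can be cross-checked with a short Python sketch. Like the worksheet, it uses the normal approximation to the sampling distribution of the sample proportion $p_s$, with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$; `statistics.NormalDist` stands in for normalcdf, and the helper name `proportion_probs` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

p = 0.93   # historic (population) proportion arriving before 10:30


def proportion_probs(n):
    """Return P(.93 < ps < .95) and P(ps > .95) under the normal approximation."""
    ps = NormalDist(p, sqrt(p * (1 - p) / n))   # sampling distribution of ps
    between = ps.cdf(0.95) - ps.cdf(0.93)
    above = 1 - ps.cdf(0.95)
    return between, above


for n in (500, 1000):
    between, above = proportion_probs(n)
    print(n, round(between, 5), round(above, 5))
```

Doubling the sample size shrinks the standard deviation of $p_s$ by a factor of $\sqrt{2}$, which is why more than 95% becomes much less likely for samples of 1000.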

(d) Which is more likely to occur: more than 95% of the deliveries in a sample of 500 arriving before 10:30 the following morning, or less than 90% of the deliveries in a sample of 1000 arriving before 10:30 the following morning?

We will answer the question of which is more "likely" by simply calculating the probability of each event:

$P(p_s > .95) = .03982085$ (from part b)

$$P(p_s < .90) = P\left(Z < \frac{.90 - .93}{\sqrt{.93(.07)/1000}}\right) = P(Z < -3.7182) = .00010036$$

Therefore, more than 95% of the deliveries in a sample of 500 arriving before 10:30 the following morning is more likely to occur, since it has a higher probability.
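The comparison in part (d) can be sketched in Python as well. The second event uses samples of 1000, which is what the z-value of -3.7182 corresponds to; the helper `sample_proportion_dist` is a hypothetical name, and the normal approximation is the same one used throughout this exercise.

```python
from math import sqrt
from statistics import NormalDist

p = 0.93


def sample_proportion_dist(n):
    """Normal approximation to the sampling distribution of ps for samples of size n."""
    return NormalDist(p, sqrt(p * (1 - p) / n))


more_than_95 = 1 - sample_proportion_dist(500).cdf(0.95)   # samples of 500
less_than_90 = sample_proportion_dist(1000).cdf(0.90)      # samples of 1000
print(round(more_than_95, 5), round(less_than_90, 6))
print("more than 95% (n = 500)" if more_than_95 > less_than_90
      else "less than 90% (n = 1000)")
```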

Updated: 20 August 2003