Normal Probability Plots and Sampling Distributions

 

This page will contain examples of the following:


We will create a couple of datasets with the TI-83, and then see how to determine if a particular dataset comes from a normally distributed population or not.   It might be a good idea to review chapter 6 of your textbook first.
  1. First, let us generate a list containing a sample from a standard normal distribution.   To do this, choose the randNorm function from the MATH PRB menu.   Give (0,1,100) for the arguments in order to get a sample of size 100 from a normal distribution with mean 0 and standard deviation 1.   Put this in L1.

    TI-83 screen TI-83 screen

    Of course, we expect this dataset to be symmetric and bell-shaped, and its normal probability plot to be roughly straight.
    1. Let us check the shape of the dataset with a histogram and a boxplot.   Both of these are under STAT PLOT; consult your manual to learn which icons represent the plots you want.

      TI-83 screen

      Note: For the histogram, it may be necessary to choose ZoomStat from the ZOOM menu.

      TI-83 screen TI-83 screen

      TI-83 screen TI-83 screen

      As expected, both plots are symmetric, although not perfectly so.   Even generated data isn't perfect!
    2. Now let us create a normal probability plot.   This is also under STAT PLOT as the last plot option.  
      Note 1: Choose "Y" rather than "X" on the Data Axis line; this gives a plot consistent with both the textbook and ExcelTools.

      TI-83 screen TI-83 screen

      Note 2: You may want to choose ZoomStat again.   As expected, the plot is a fairly straight line.
  2. Now, let us generate a list containing a sample from a skewed distribution that is not normal.   To do this, choose the randBin function from the MATH PRB menu.   Give (20,.1,50) for the arguments in order to get a sample of size 50 from a binomial distribution with n = 20 and p = .1.   Put this in L2.  

    TI-83 screen

    Note: This takes about 30 seconds; be patient!   This dataset should be right-skewed, which we will be able to see in the histogram and boxplot.   The normal probability plot should not be straight; rather, we can expect it to curve upward.
    1. Again, check the shape of the dataset with a histogram and a boxplot.  

      TI-83 screen TI-83 screen

      As expected, both plots are right-skewed.  
    2. Now create the normal probability plot.  

      TI-83 screen

      As expected, the plot is not straight; it curves upward.
    You can use this method on any dataset to check whether the sample came from a normal distribution.   Note that the judgment is somewhat subjective; two people may not come to the same conclusion for a particular dataset!
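The check above can also be sketched off the calculator. The following Python example is only an illustration under stated assumptions: it uses the plotting position (i - 0.5)/n for the normal scores, which may differ from the TI-83's own convention, and it summarizes "straightness" with a correlation coefficient, which is an added convenience and not part of the calculator procedure. The helper names `normal_plot_points` and `straightness` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each sorted data value (Y axis) with a standard normal score (X axis)."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest value.
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(scores, sorted(data)))


def straightness(points):
    """Pearson correlation of the plotted points; values near 1 mean nearly straight."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2003)  # fixed seed so repeated runs give the same samples

# Counterpart of randNorm(0,1,100): 100 draws from a standard normal.
normal_sample = [rng.gauss(0, 1) for _ in range(100)]

# Counterpart of randBin(20,.1,50): 50 counts of successes in 20 trials with p = .1.
binomial_sample = [sum(rng.random() < 0.1 for _ in range(20)) for _ in range(50)]

r_normal = straightness(normal_plot_points(normal_sample))
r_binomial = straightness(normal_plot_points(binomial_sample))
print(f"normal sample:   r = {r_normal:.4f}")
print(f"binomial sample: r = {r_binomial:.4f}")
```

A correlation close to 1 corresponds to a plot that looks straight. The number is only a rough aid; looking at the plot itself is still the point of the exercise.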

    Consider the following exercise: Plastic bags used for packaging produce are manufactured so that the breaking strength of the bag is normally distributed with a mean of 5 pounds per square inch and a standard deviation of 1.5 pounds per square inch.   A sample of 25 bags is selected.  

    So the breaking strength of the bags is normal with $\mu = 5$ lbs/in$^2$ and $\sigma = 1.5$ lbs/in$^2$.   Also, n = 25.

    (a)(1) What is the probability that the average breaking strength is between 5 and 5.5 pounds per square inch?

    $P(5 < \overline{X} < 5.5) =
P(\frac{5 - 5}{1.5/\sqrt{25}} < Z < \frac{5.5 - 5}{1.5/\sqrt{25}}) =$
    P(0 < Z < 1.6667) = .45220967

    normal curve       TI-83 screen

    Note that if we want to specify $\mu$ and $\sigma$ in the normalcdf command, we actually have to plug in $\mu = 5$ and $\sigma / \sqrt{n} = 1.5 / \sqrt{25} = 0.3$.   You should always draw a curve, and give it some thought.   Use your common sense to decide if the answer the calculator comes up with is reasonable.
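As a cross-check on the point about $\sigma / \sqrt{n}$, here is a minimal Python sketch that computes (a)(1) two ways: by standardizing to Z first, and by giving the mean and standard error directly. It relies on `statistics.NormalDist` from the Python standard library as a stand-in for normalcdf; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

mu, sigma, n = 5.0, 1.5, 25
standard_error = sigma / math.sqrt(n)  # 1.5 / 5 = 0.3

# Route 1: standardize by hand, then use the standard normal curve.
z_low = (5.0 - mu) / standard_error
z_high = (5.5 - mu) / standard_error
std_normal = NormalDist()
p_standardized = std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Route 2: describe the sampling distribution of the mean directly,
# using the standard error (not sigma itself) as the spread.
x_bar_dist = NormalDist(mu, standard_error)
p_direct = x_bar_dist.cdf(5.5) - x_bar_dist.cdf(5.0)

print(f"standardized: {p_standardized:.6f}")
print(f"direct:       {p_direct:.6f}")
```

Both routes should agree with each other and with the calculator value to several decimal places. Passing 1.5 instead of 0.3 as the spread would describe a single bag, not the average of 25.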

    (a)(2) What is the probability that the average breaking strength is between 4.2 and 4.5 pounds per square inch?

    $P(4.2 < \overline{X} < 4.5) =
P(\frac{4.2 - 5}{1.5/\sqrt{25}} < Z < \frac{4.5 - 5}{1.5/\sqrt{25}}) =$
    P(-2.6667 < Z < -1.6667) = .04395991

    normal curve       TI-83 screen

    (a)(3) What is the probability that the average breaking strength is less than 4.6 pounds per square inch?

    $P(\overline{X} < 4.6) =
P(Z < \frac{4.6 - 5}{1.5/\sqrt{25}}) =$
    P(Z < -1.3333) = .09121128

    normal curve       TI-83 screen
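Parts (a)(2) and (a)(3) differ from (a)(1) only in their bounds: one interval lies entirely below the mean, and one is an open-ended lower tail. A short Python sketch, assuming `statistics.NormalDist` in place of the calculator and a hypothetical helper `prob_between`, might look like this:

```python
import math
from statistics import NormalDist

# Sampling distribution of the mean: mu = 5, standard error = 1.5 / sqrt(25).
x_bar_dist = NormalDist(mu=5.0, sigma=1.5 / math.sqrt(25))


def prob_between(dist, low, high):
    """Area under the curve of dist between low and high."""
    return dist.cdf(high) - dist.cdf(low)


# (a)(2): both bounds are below the mean, so both Z values are negative.
p_a2 = prob_between(x_bar_dist, 4.2, 4.5)

# (a)(3): a lower tail has no lower bound, so the cdf alone gives the area.
p_a3 = x_bar_dist.cdf(4.6)

print(f"(a)(2): {p_a2:.6f}")
print(f"(a)(3): {p_a3:.6f}")
```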

    (b) Between what two values symmetrically distributed around the mean will 95% of the average breaking strengths be?

    normal curve

    normal curve

    The values asked for here are the 2.5th percentile and the 97.5th percentile, since 95% of the average breaking strengths will fall between them.   We need the Z quantile corresponding to the .025 tail area.

    TI-83 screen

    So the $\overline{X}$ value, once we "un-standardize", is:

    $-1.95996 = \frac{\overline{X} - 5}{1.5/\sqrt{25}}$
    $-0.587989 = \overline{X} - 5$
    $\overline{X} = 4.412$
    And from the symmetry of the normal curve, the upper value will be
    $1.95996 = \frac{\overline{X} - 5}{1.5/\sqrt{25}}$
    $0.587989 = \overline{X} - 5$
    $\overline{X} = 5.588$
    So 95% of the average breaking strengths will fall between 4.412 and 5.588.
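The "un-standardizing" step can be sketched in Python as well. This example assumes the `inv_cdf` method of `statistics.NormalDist` as a stand-in for the calculator's inverse normal function, and shows both the by-hand route through Z and the direct route.

```python
import math
from statistics import NormalDist

mu = 5.0
standard_error = 1.5 / math.sqrt(25)  # 0.3

# Z quantile that leaves .025 in the lower tail.
z_low = NormalDist().inv_cdf(0.025)

# Un-standardize by hand: X-bar = mu + Z * (standard error).
lower_by_hand = mu + z_low * standard_error
upper_by_hand = mu - z_low * standard_error  # symmetry of the normal curve

# Or ask the sampling distribution of the mean for its percentiles directly.
x_bar_dist = NormalDist(mu, standard_error)
lower = x_bar_dist.inv_cdf(0.025)
upper = x_bar_dist.inv_cdf(0.975)

print(f"z = {z_low:.5f}")
print(f"by hand: ({lower_by_hand:.3f}, {upper_by_hand:.3f})")
print(f"direct:  ({lower:.3f}, {upper:.3f})")
```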

    (c) What will your answers be to (a) and (b) if the standard deviation is 1.0 pound per square inch?

    Do this part on your own!
    Here are the answers:
    (a)(1) .49379
    (a)(2) .00617799
    (a)(3) .02275
    (b) (4.608, 5.392)


    Consider the following exercise.

    Historically, 93% of the deliveries of an overnight mail service arrive before 10:30 the following morning.   Random samples of 500 deliveries are selected.

    So we have a "historic" (read: population) proportion of p = .93.   Also, n = 500.

    (a) What proportion of the samples will have between 93% and 95% of the deliveries arriving before 10:30 the following morning?

    $P(.93 < p_{s} < .95) =
P(\frac{.93 - .93}{\sqrt{\frac{.93(.07)}{500}}} <
Z < \frac{.95 - .93}{\sqrt{\frac{.93(.07)}{500}}}) =$
    P(0 < Z < 1.7528) = .46017915

    normal curve       TI-83 screen

    Note that if we want to specify "$\mu$" and "$\sigma$" in the normalcdf command, we actually have to plug in $p = .93$ for the mean and $\sqrt{\frac{.93(.07)}{500}}$ for the standard deviation.

    (b) What proportion of the samples will have more than 95% of the deliveries arriving before 10:30 the following morning?

    $P(p_{s} > .95) =
P(Z > \frac{.95 - .93}{\sqrt{\frac{.93(.07)}{500}}}) =$
    P(Z > 1.7528) = .03982085

    normal curve       TI-83 screen
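Parts (a) and (b) can be checked with a short Python sketch. It assumes the normal approximation used above, with mean p and standard deviation $\sqrt{p(1-p)/n}$, and uses `statistics.NormalDist` in place of normalcdf; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.93, 500
standard_error = math.sqrt(p * (1 - p) / n)

# Approximate sampling distribution of the sample proportion.
p_s_dist = NormalDist(mu=p, sigma=standard_error)

# (a) Between 93% and 95% of deliveries on time.
part_a = p_s_dist.cdf(0.95) - p_s_dist.cdf(0.93)

# (b) More than 95% on time: the upper tail is 1 minus the cdf.
part_b = 1 - p_s_dist.cdf(0.95)

print(f"z for .95: {(0.95 - p) / standard_error:.4f}")
print(f"(a): {part_a:.6f}")
print(f"(b): {part_b:.6f}")
```

Since .93 is the center of the distribution, the two answers together account for the whole upper half of the curve, so they should sum to .5.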

    (c) If samples of size 1000 are selected, what will your answers be in (a) and (b)?

    Do this part on your own!
    Here are the answers:
    (a) .49340852
    (b) .00659148

    (d) Which is more likely to occur: more than 95% of the deliveries in a sample of 500, or less than 90% of the deliveries in a sample of 1000, arriving before 10:30 the following morning?

    We will answer the question of which is more "likely" by simply calculating the probability of each event:

    $P(p_{s} > .95) = .03982085$ (from part b)
    $P(p_{s} < .90) =
P(Z < \frac{.90 - .93}{\sqrt{\frac{.93(.07)}{1000}}}) =$
    P(Z < -3.7182) = .00010036

    TI-83 screen

    Therefore, more than 95% of the deliveries in a sample of 500 arriving before 10:30 the following morning is more likely to occur, since it has a higher probability.
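The comparison in part (d) can be sketched in Python. This example uses n = 500 for the first event and n = 1000 for the second, matching the standard errors in the calculations above, and again assumes `statistics.NormalDist` as a stand-in for normalcdf. The helper `sample_proportion_dist` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_proportion_dist(p, n):
    """Normal approximation to the sampling distribution of a sample proportion."""
    return NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))


p = 0.93

# Event 1: more than 95% on time in a sample of 500 (upper tail).
prob_above_95 = 1 - sample_proportion_dist(p, 500).cdf(0.95)

# Event 2: less than 90% on time in a sample of 1000 (lower tail).
prob_below_90 = sample_proportion_dist(p, 1000).cdf(0.90)

if prob_above_95 > prob_below_90:
    more_likely = "more than 95% in a sample of 500"
else:
    more_likely = "less than 90% in a sample of 1000"

print(f"P(p_s > .95), n = 500:  {prob_above_95:.8f}")
print(f"P(p_s < .90), n = 1000: {prob_below_90:.8f}")
print("More likely:", more_likely)
```

The larger sample has a smaller standard error, which is part of why a sample proportion as far from .93 as .90 is so unlikely.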

    Updated: 20 August 2003