
Sample Distributions for Continuous Data

Continuous data are data without natural categories. These are usually measurements such as height, weight, age, temperature, or cholesterol level. For weight one might think that 200 is a natural category, but 200 pounds is about 90.7 kilograms, which is not even an integer. Because we cannot measure with infinite precision, all measurements are approximations.

Example: Here is a sample of head sizes (maximum measurement across the top of the skull, in mm) of 25 Etruscans. These data were taken from the Etruscan-Italian head sizes data set given in Appendix A.

141    148    132    138    154    142    150    146    155    158    150
140    147    148    144    150    149    145    149    158    143    141
144    144    126
So what do we need? A picture, of course. The picture we drew for the discrete data was a nice visual summary, so we need a sample distribution of these numbers too. Since continuous data have no natural categories, we have to create some categories; this results in a sample distribution. If we create other categories we will get a different picture. We need a way of creating these pictures quickly, so that if we don't like a picture we can make another one.
We will do this with a stem-leaf plot. The categories are the stems. For instance, suppose for the Etruscan data we choose the interval 120-129 as our first category. Every measurement that falls into this interval has the same first two digits, namely 12. This is called the stem for the class. The remaining digit of a measurement is called the leaf. For example, the skull size 126 falls into this class, so its stem is 12 and its leaf is 6. For a stem-leaf plot we simply put the leaves on the stems. All that is lost (except for possible rounding) is the order in which the data were recorded, which may be important in some applications but is not in this case. A stem-leaf plot of the above data set is:
        12 6
        13 28
        14 182607849593144
        15 4058008
Do you like the picture? No, neither do I. The numbers are too bunched up. We need more categories (stems), and this is easy to do with stem-leaf plots. Let's split each stem into two: the leaves 0 through 4 go on the lower stem while the leaves 5 through 9 go on the upper stem. The picture is:
        12 6
        13 2
        13 8
        14 12043144
        14 8678959
        15 4000
        15 588
This picture is better than the first. I wouldn't split the stems again. Although we only have 25 numbers here, the picture is certainly much more informative than the string of numbers above. We can see immediately that in this sample most Etruscans have head sizes between 140 and 150 mm, and that there are a few with smaller head sizes.
Note that a stem-leaf plot is also a histogram. Technically, the histogram is just the picture (not the leaves). We will often use histograms in this class.
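
The stem-leaf construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's summary module: the function name stem_leaf and its split option are our own, and it assumes non-negative integer data whose last digit serves as the leaf.

```python
from collections import defaultdict

# Etruscan head sizes (mm) from the example, in the order recorded.
etruscan = [141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150,
            140, 147, 148, 144, 150, 149, 145, 149, 158, 143, 141,
            144, 144, 126]

def stem_leaf(data, split=False):
    """Return the rows of a stem-leaf plot (hypothetical helper).

    The stem is the value with its last digit dropped; the leaf is that
    last digit. With split=True each stem gets two rows: leaves 0-4 on
    the lower row and leaves 5-9 on the upper row.
    """
    rows = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(str(leaf))  # leaves keep recording order
    # Rows with no leaves are simply not printed, as in the plots above.
    return [f"{stem} {''.join(leaves)}"
            for (stem, _), leaves in sorted(rows.items())]

print("\n".join(stem_leaf(etruscan)))
print()
print("\n".join(stem_leaf(etruscan, split=True)))
```

Run on the Etruscan sample, the first call gives the four-stem plot and the second gives the split-stem plot. Because making a new picture is a single function call, it is cheap to try both and keep the one you like.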


Exercise 2.3.1  
1.
Consider again Carrie's baseball data given in Appendix A. Glance through the weights (second column) of the baseball players. What does a typical baseball player weigh? Do more baseball players weigh over 200 pounds than under 170?
2.
Obtain a stem-leaf plot of the weights of the baseball players. Now answer the questions in the last problem. For your stem-leaf plot, should the stems be split or grouped together?
3.
The typical American male weighs about 170-175 pounds. Based on your stem-leaf plot, how would you compare the weights of baseball players to typical American males?
4.
The typical American male height is 70 inches. What about the heights of baseball players? Base your answer on a stem-leaf plot of the baseball players' heights.
5.
Obtain a stem-leaf plot of the following using the summary module.
Data
14   117    77   81   205   21   22   157   134   69
193    8   162    0   156  194   17   100    50   53
235   29   191   81   167   29  158   105   171    2
8     89    82   11   247  149  106    61    18  172


Try the example data given below. Enter the data, then choose stemleaf in the summary module.

       12 18 25 15 9 14 21 25 28 125

We need a little vocabulary for the shapes of distributions so that we can discuss them. We will classify distributions as symmetric or asymmetric. A symmetric distribution is symmetric about a point on the data axis (only approximately so if it is a sample distribution). An example of a symmetric sample distribution is given by:

Low:   49

    6 : 4
    6 : 78
    7 : 14
    7 : 556788
    8 : 0122334
    8 : 67799
    9 : 01122223334
    9 : 5555666677788889999
   10 : 000000001122223444
   10 : 568889
   11 : 000001134
   11 : 599
   12 : 123
   12 : 89
   13 : 2
   13 : 56
   14 : 0

High:  161
To avoid many empty stems at the ends of a stem-leaf plot, the low and high points are sometimes just listed separately, as 49 and 161 are here. The point of symmetry in this plot is close to 95.
The above plot is unimodal: it has a single mode, or peak, around 95. Here is the stem-leaf plot of a data set which is bimodal (two peaks) and symmetric:
   -1 : 2
   -0 :
    0 : 2
    1 : 5
    2 : 5669
    3 : 1125677
    4 : 223445556699
    5 : 0111233344444566667788888999
    6 : 001122244444445555555567788899
    7 : 014445566778899
    8 : 0122334455677799
    9 : 011122223334455556666777788889999
   10 : 000000001122223444568889
   11 : 0000011123334599
   12 : 0122389
   13 : 256
   14 : 0
   15 :
   16 : 1
A distribution is asymmetric if it is not symmetric. One class of asymmetric distributions of interest is the class of skewed distributions. These have a long tail either to the right or to the left. For example, this is a right-skewed sample distribution:
   0 : 1223444444
   0 : 556666667777777888999
   1 : 0011111123333344
   1 : 5555566788889999
   2 : 011222333334
   2 : 56666789999
   3 : 0114
   3 : 668
   4 : 02
   4 : 58
   5 : 02

High:  617
Another plot for continuous data that we will frequently use is the dotplot. For a dotplot, simply draw a number line, mark off the range of the data, and then record a dot at the value of each observation. For observations with the same value, put the second dot above the first. Here is a dotplot for the Etruscan data. (In this character plot a colon stands for two stacked dots.)
                                          .         .
            .         .         .  . : .. : .. . :: :      ..    :
           -+---------+---------+---------+---------+---------+-----
        126.0     132.0     138.0     144.0     150.0     156.0
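
The dotplot recipe above can be sketched in Python as follows. This is a rough illustration under our own assumptions, not the software that produced the plot above: the function name dotplot is hypothetical, the data are assumed to be integers, and each integer value gets its own column, so repeated values are stacked as separate dots rather than drawn as colons.

```python
from collections import Counter

# Etruscan head sizes (mm) from the example.
etruscan = [141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150,
            140, 147, 148, 144, 150, 149, 145, 149, 158, 143, 141,
            144, 144, 126]

def dotplot(data):
    """Return the text lines of a simple dotplot (hypothetical helper)."""
    lo, hi = min(data), max(data)
    width = hi - lo + 1                  # one column per integer value
    counts = Counter(data)               # how many dots each value needs
    lines = []
    # Draw from the highest stack level down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("." if counts[v] >= level else " "
                      for v in range(lo, hi + 1))
        lines.append(row.rstrip())
    lines.append("-" * width)            # the number line
    # Label only the two ends of the range.
    gap = width - len(str(lo)) - len(str(hi))
    lines.append(f"{lo}{' ' * gap}{hi}")
    return lines

print("\n".join(dotplot(etruscan)))
```

For the Etruscan data the tallest stacks sit over 144 and 150, each of which occurs three times in the sample.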

An interesting application of dotplots is the comparison dotplot of several data sets. Suppose we have several data sets that we want to compare. Simply draw one number line; then, for each sample, put a row of dots corresponding to its measurements. For example, here are skull measurements of 20 modern Italians taken from the Etruscan-Italian head sizes data set.

133    128    136    140    127    136    131    131    128    132    125
133    134    136    134    129    132    139    143    138

Here is the comparison dotplot between the Italian skull sizes and the above Etruscan skull sizes.

                                          .         .
Etruscan    .         .         .  . : .. : .. . :: :      ..    :
           -+---------+---------+---------+---------+---------+-----
                             .
Italian   .   .: .  : : ::   :  . ..    .
           -+---------+---------+---------+---------+---------+-----
        126.0     132.0     138.0     144.0     150.0     156.0
Any conclusions about the Etruscan and Italian skull sizes? It appears that the Etruscans have larger heads than the Italians. As the exercise below shows, this difference persists when all the data are used. We will discuss inference based on this data set in Chapter 8.
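
A comparison dotplot can be sketched in Python along the same lines. Again this is only a sketch under stated assumptions: the function name comparison_dotplot is hypothetical, the data are assumed to be integers, and the key design choice is that every sample is drawn against the same range, from the smallest to the largest value over all samples, so the rows line up.

```python
from collections import Counter

# The two samples of head sizes (mm) from the text.
etruscan = [141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150,
            140, 147, 148, 144, 150, 149, 145, 149, 158, 143, 141,
            144, 144, 126]
italian = [133, 128, 136, 140, 127, 136, 131, 131, 128, 132, 125,
           133, 134, 136, 134, 129, 132, 139, 143, 138]

def comparison_dotplot(samples):
    """samples maps a label to a list of integers (hypothetical helper)."""
    lo = min(min(data) for data in samples.values())
    hi = max(max(data) for data in samples.values())
    width = hi - lo + 1                       # shared axis for all samples
    pad = max(len(name) for name in samples) + 1
    lines = []
    for name, data in samples.items():
        counts = Counter(data)
        for level in range(max(counts.values()), 0, -1):
            dots = "".join("." if counts[v] >= level else " "
                           for v in range(lo, hi + 1))
            label = name if level == 1 else ""  # label the bottom row only
            lines.append(f"{label:<{pad}}{dots}".rstrip())
        lines.append(" " * pad + "-" * width)   # number line for this sample
    # Label the ends of the shared range once, under the last number line.
    lines.append(" " * pad + str(lo).ljust(width - len(str(hi))) + str(hi))
    return lines

print("\n".join(comparison_dotplot({"Etruscan": etruscan, "Italian": italian})))
```

With a shared axis, the Italian dots fall visibly to the left of the Etruscan dots, which is the comparison the picture is meant to show.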


Exercise 2.3.2  
1.
Obtain a dotplot of the weights of the twenty students discussed above, listed again here:
Weights
122 146  65 162 148 155 136 151 151 153 
201 156 235 157 160 171 178 197 142 131
2.
For Carrie's baseball data, obtain comparison dotplots of the batting averages (6th column of the data) for the hitters only (signified by a 1 in the 5th column), grouped by the side of the plate they hit from: right, left, or switch, signified by a 1, 2, or 3 in column 3.
3.
Obtain comparison dotplots of the Etruscan and Italian data given in Appendix A. Note that the Etruscans formed an ancient civilization in Tuscany, in central Italy, that predated the Romans. There is some question as to where the Etruscans came from. Were they native to Italy or not? Draw conclusions about this mystery based on the comparison dotplots.
4.
Obtain stem-leaf plots and comparison dotplots for the following 3 samples. Comment on the shape of each.
     Sample 1 

            76 183 125  24  8 59  25 179  29 101 
            55 108  68 128  5 12  35  25 122  39 
            59  91  90  81 66 20 178 111 186  26 
            5  123 124  45 13 79 158  20  92  23

       Sample 2 

            66   9 62 21 11 39 21  24  21  19 
            67  71 67  0  4 82 32  91 152 124 
            20 108  5 63  1 10 23 125  59  25

       Sample 3 

            59 54 19 79 22 81 18 67 61 53 
            71 14 10 87 76 49 21 16 35 11 
             7 77 90  6 79 55 83 28 11 60 
            55 43  9 65 25



2001-01-01