Continuous data are data without natural categories. These are usually
measurements such as height, weight, age, temperature, or cholesterol.
For weight one might think that 200 is a natural category, but 200 pounds
is about 90.7 kg, which is not even an integer. Because we cannot measure
with infinite precision, measurements are approximations.
Example: Here is a sample of head sizes (maximum measurement across the top of the skull in mm) of 25 Etruscans. These data were taken from the Etruscan-Italian head sizes data set given in Appendix A.
141 148 132 138 154 142 150 146 155 158 150 140 147 148 144 150 149 145 149 158 143 141 144 144 126
So what do we need? A picture, of course. The above picture for the discrete data is a nice visual summary, so we need a sample distribution of these numbers. Since continuous data have no natural categories, we have to create some categories. This results in a sample distribution. If we create other categories we will get a different picture. We need a way of creating these pictures fast, so that if we don't like a picture we can make another one. We will do this with a stem-leaf plot.
The categories are the stems. For instance, suppose for the Etruscan data we choose the interval 120-129 as our first category. Every measurement that falls into this interval has the same first two digits, namely 12. This is called the stem for the class. The remaining digit of a measurement is called the leaf. For example, the skull size 126 falls into this class, so its stem is 12 and its leaf is 6. For a stem-leaf plot we simply put the leaves on the stems. All that is lost (except for possible rounding) is the order in which the data were recorded, which may be important in some applications but is not in this case. A stem-leaf plot of the above data set is:
12 6
13 28
14 182607849593144
15 4058008
Do you like the picture? No, neither do I. The numbers are too bunched
up. We need more categories (stems). But this is easy to do with stem-leaf
plots. Let's split each stem into two. In this case the leaves 0 through
4 go on the lower stem while the leaves 5 through 9 go on the upper stem.
The picture is
12 6
13 2
13 8
14 12043144
14 8678959
15 4000
15 588
This picture is better than the first. I wouldn't split the stems again.
Although we only have 25 numbers here, the picture is certainly much more
informative than the above string of numbers. We can see immediately
that in this sample most Etruscans have head sizes between 140 and 150 mm and
that there are a few with smaller head sizes.
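The construction of the two plots above can be sketched in a few lines of Python. This is only an illustration, not the summary module mentioned in these notes: the function name `stem_leaf` and its `split` argument are made up for this example, and the sketch assumes non-negative whole-number data with a single leaf digit.

```python
def stem_leaf(values, split=False):
    """Return stem-leaf rows as strings, with leaves kept in recording order.

    Assumes non-negative integers; the last digit is the leaf and the
    remaining digits are the stem.
    """
    rows = {}
    for v in values:
        stem, leaf = divmod(v, 10)
        # With split=True, leaves 0-4 go on the lower copy of the stem
        # and leaves 5-9 on the upper copy.
        key = (stem, split and leaf >= 5)
        rows.setdefault(key, []).append(str(leaf))
    # Stems that received no leaves are simply omitted.
    return [f"{stem} {''.join(rows[(stem, upper)])}" for stem, upper in sorted(rows)]


etruscan = [141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150, 140, 147,
            148, 144, 150, 149, 145, 149, 158, 143, 141, 144, 144, 126]

for line in stem_leaf(etruscan):
    print(line)
print()
for line in stem_leaf(etruscan, split=True):
    print(line)
```

Run on the Etruscan data, the first call gives the four-stem plot and the second gives the split-stem plot. Because the leaves are appended in the order the data were recorded, remaking the picture with different categories costs nothing more than another call.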
Data 14 117 77 81 205 21 22 157 134 69 193 8 162 0 156 194 17 100 50 53 235 29 191 81 167 29 158 105 171 2 8 89 82 11 247 149 106 61 18 172
Try the same with the example data given below. Choose stemleaf in the summary module after entering the data.
12 18 25 15 9 14 21 25 28 125
We need a little vocabulary on the shapes of distributions so that we can discuss them. We will classify distributions as symmetric or asymmetric. Symmetric distributions are symmetric (approximately so, if it's a sample distribution) about a point on the data axis. An example of a symmetric sample distribution is given by:
Low: 49
6 : 4
6 : 78
7 : 14
7 : 556788
8 : 0122334
8 : 67799
9 : 01122223334
9 : 5555666677788889999
10 : 000000001122223444
10 : 568889
11 : 000001134
11 : 599
12 : 123
12 : 89
13 : 2
13 : 56
14 : 0
High: 161
To avoid many empty stems at the ends of a stem-leaf plot, sometimes the
low and the high points are just indicated, as 49 and 161 are in the
above plot. The point of symmetry in this plot is close to 95.
-1 : 2
-0 :
0 : 2
1 : 5
2 : 5669
3 : 1125677
4 : 223445556699
5 : 0111233344444566667788888999
6 : 001122244444445555555567788899
7 : 014445566778899
8 : 0122334455677799
9 : 011122223334455556666777788889999
10 : 000000001122223444568889
11 : 0000011123334599
12 : 0122389
13 : 256
14 : 0
15 :
16 : 1
A distribution is asymmetric if it is not symmetric. One class
of asymmetric distributions of interest is the class of skewed distributions.
These have a long tail either to the right or to the left. For example,
this is a right skewed sample distribution.
0 : 1223444444
0 : 556666667777777888999
1 : 0011111123333344
1 : 5555566788889999
2 : 011222333334
2 : 56666789999
3 : 0114
3 : 668
4 : 02
4 : 58
5 : 02
High: 617
Another plot for continuous data that we will frequently use is the dotplot. For a dotplot, simply draw a number line, mark off the range of the data, and then record a dot at the value of each observation. For observations with the same value, put the second dot above the first. Here is a dotplot for the Etruscan data.
                               .         .
 .         .         .  . : .. : .. . :: :      ..    :
-+---------+---------+---------+---------+---------+-----
126.0     132.0     138.0     144.0     150.0     156.0
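The recipe for a dotplot can be sketched in Python as follows. This is a simplified illustration, not the program that produced the plot above: the function name `dotplot` is hypothetical, the sketch assumes whole-number data, and it uses one column per integer value and one "." per observation rather than the compressed ":" marks seen above.

```python
from collections import Counter


def dotplot(values):
    """Return the rows of a text dotplot: stacked dots, an axis, and end labels."""
    counts = Counter(values)          # how many observations share each value
    lo, hi = min(values), max(values)
    width = hi - lo + 1               # one column per integer on the number line
    rows = []
    # Build the rows from the top of the tallest stack down to the first dot.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("." if counts[v] >= level else " " for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    rows.append("-" * width)
    # Label only the two ends of the axis.
    rows.append(str(lo).ljust(width - len(str(hi))) + str(hi))
    return rows


etruscan = [141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150, 140, 147,
            148, 144, 150, 149, 145, 149, 158, 143, 141, 144, 144, 126]

for line in dotplot(etruscan):
    print(line)
```

Every observation contributes exactly one dot, so the number of dots in the picture equals the sample size.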
An interesting application of dotplots is the comparison dotplot of several data sets. Suppose we have several data sets that we want to compare. Simply draw one number line. Then for each sample put a row of dots corresponding to its measurements. For example, here are skull measurements of 20 modern Italians taken from the Etruscan-Italian head sizes data set.
133 128 136 140 127 136 131 131 128 132 125 133 134 136 134 129 132 139 143 138
Here is the comparison dotplot between the Italian skull sizes and the above Etruscan skull sizes.
                                         .         .
Etruscan   .         .         .  . : .. : .. . :: :      ..    :
          -+---------+---------+---------+---------+---------+-----
                            .
Italian  .   .: .  : : ::   :  . ..    .
          -+---------+---------+---------+---------+---------+-----
          126.0     132.0     138.0     144.0     150.0     156.0
Any conclusions about the Etruscan and Italian skull sizes? It appears
that the Etruscans have larger heads than the Italians. As the exercise below
shows, this difference occurs when all the data are used. We will discuss
inference based on this data set in Chapter 8.
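A comparison dotplot can be sketched in the same spirit. Again this is only an illustration under stated assumptions: the function name `comparison_dotplot` is hypothetical, the data are assumed to be whole numbers, and each observation gets its own "." on a number line shared by all samples.

```python
from collections import Counter


def comparison_dotplot(samples):
    """Return text rows with one dotplot per sample, all on a common number line."""
    lo = min(min(v) for v in samples.values())
    hi = max(max(v) for v in samples.values())
    width = hi - lo + 1
    label_width = max(len(name) for name in samples) + 1
    lines = []
    for name, values in samples.items():
        counts = Counter(values)
        for level in range(max(counts.values()), 0, -1):
            label = name if level == 1 else ""   # name goes on the bottom row
            dots = "".join("." if counts[v] >= level else " "
                           for v in range(lo, hi + 1))
            lines.append((label.ljust(label_width) + dots).rstrip())
        lines.append(" " * label_width + "-" * width)
    # Label the two ends of the shared axis once, at the bottom.
    lines.append(" " * label_width + str(lo).ljust(width - len(str(hi))) + str(hi))
    return lines


etruscan = [141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150, 140, 147,
            148, 144, 150, 149, 145, 149, 158, 143, 141, 144, 144, 126]
italian = [133, 128, 136, 140, 127, 136, 131, 131, 128, 132,
           125, 133, 134, 136, 134, 129, 132, 139, 143, 138]

for line in comparison_dotplot({"Etruscan": etruscan, "Italian": italian}):
    print(line)
```

The key design choice is that the low and high ends of the axis are computed from all the samples together, so that dots in different rows line up and the samples can be compared by eye.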
Weights 122 146 65 162 148 155 136 151 151 153 201 156 235 157 160 171 178 197 142 131
Sample 1
76 183 125 24 8 59 25 179 29 101
55 108 68 128 5 12 35 25 122 39
59 91 90 81 66 20 178 111 186 26
5 123 124 45 13 79 158 20 92 23
Sample 2
66 9 62 21 11 39 21 24 21 19
67 71 67 0 4 82 32 91 152 124
20 108 5 63 1 10 23 125 59 25
Sample 3
59 54 19 79 22 81 18 67 61 53
71 14 10 87 76 49 21 16 35 11
7 77 90 6 79 55 83 28 11 60
55 43 9 65 25