Stem-leaf plots and histograms are useful descriptions of a sample, but
often we want to describe or compare samples with a few descriptive
statistics. The statistics we discuss next are commonly called the **Five
Basic Descriptive Statistics**.
First, alas, we need a little notation. We will be discussing samples
throughout this course, and we often need to call the items something. So
for a generic sample of size *n*, let's use

*x*_{1}, *x*_{2}, ..., *x*_{n}

where

*x*_{1} denotes the first item (measurement) in the sample,

*x*_{2} denotes the second item (measurement) in the sample,

*x*_{i} denotes the *i*th item (measurement) in the sample,

*x*_{n} denotes the last item (measurement) in the sample.

Using this notation we can now define the 5 basic descriptive statistics.
We will illustrate these statistics with the sample of *n* = 25 Etruscan
skull sizes, given above, but repeated for convenience:

126 132 138 140 141 141 142 143 144 144 144 145 146 147 148 148 149 149 150 150 150 154 155 158 158

Note that we have ordered the data from low to high. If you had to choose some numbers to describe this data set, probably the first two you would pick are the **minimum** and the **maximum**, here 126 mm and 158 mm.

Now that we have the range of the data, the next statistic is a measure
of the center of the sample. We will use the **median**. The median
is the middle ordered data point if the sample size is an odd number and
the average of the two middle ordered data points if the sample size is
even. 50% of the data is less than or equal to the median and 50% of the
data is greater than or equal to the median. For the Etruscan data, upon
ordering the data we get,

126 132 138 140 141 141 142 143 144 144 144 145 146 147 148 148 149 149 150 150 150 154 155 158 158

Since *n* is 25, (*n*+1)/2 is 13 and, hence, the median is the 13*th* ordered data point, or 146 mm. We shall use *Q*_{2} to denote the median. So half of the Etruscans in the sample had a skull size less than or equal to 146 mm and half of the Etruscans had a skull size greater than or equal to 146 mm. The median is a **measure of center**. The median is very robust: a few extreme values barely move it, and roughly half the data would have to be corrupted before it could be dragged arbitrarily far.
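The median rule above can be sketched in a few lines of Python. This is only an illustration of the hand rule, not the class software; the function name `median` and the variable `etruscan` are chosen here for convenience.

```python
def median(sample):
    """Median by the rule in the text: the middle ordered point when n is odd,
    the average of the two middle ordered points when n is even."""
    ordered = sorted(sample)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # 0-based index n // 2 is the (n + 1) / 2-th ordered point
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


etruscan = [126, 132, 138, 140, 141, 141, 142, 143, 144, 144, 144, 145, 146,
            147, 148, 148, 149, 149, 150, 150, 150, 154, 155, 158, 158]

print(median(etruscan))  # 146, the 13th ordered data point
```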

We now have the range of the data and a measure of the center. How about the middle 50%? This goes from the **First Quartile** to the **Third Quartile**. The **first quartile** is the median of the first half of the data. We will denote it by *Q*_{1}. 25% of the data is less than or equal to the first quartile and 75% of the data is greater than or equal to the first quartile. There are many rules for finding *Q*_{1}. In this class we will be using the computer for large data sets and the computer (the statistical software) will compute *Q*_{1}. For class and tests let's use a very simple rule. To find the ordered data point, divide *n* by 4. If the result is an integer, use that integer to pick out the corresponding ordered data point. If the result is a fraction, round up to the next integer and pick out the ordered data point corresponding to that integer. For the Etruscan data, 25/4 is 6.25; hence, we round up to 7. The 7*th* ordered data point is 142, so *Q*_{1} = 142 mm for the Etruscan data set. The first quartile is a robust statistic.

The **third quartile** is the median of the second half of the data. We will denote it by *Q*_{3}. 75% of the data is less than or equal to the third quartile and 25% of the data is greater than or equal to the third quartile. There are many rules for finding *Q*_{3}. To find *Q*_{3} by hand, just use the integer we found for the first quartile, but this time count through the data from the high measurements to the low measurements. For the Etruscan data the 7*th* point from the top is 150; hence *Q*_{3} = 150 mm (several points are tied at 150, and the 8*th* point from the top is 149). The third quartile is robust.
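As a rough sketch, the simple class rule for the quartiles (divide *n* by 4, round up, count that many points in from each end) might be coded as follows. The function name `class_rule_quartiles` is made up for this illustration. Statistical software generally uses other quartile rules, so its answers can differ slightly from this hand rule.

```python
def class_rule_quartiles(sample):
    """Q1 and Q3 by the hand rule in the text (an illustration only)."""
    ordered = sorted(sample)
    n = len(ordered)
    k = (n + 3) // 4        # n / 4, rounded up to the next integer if fractional
    q1 = ordered[k - 1]     # k-th ordered point counting up from the smallest
    q3 = ordered[n - k]     # k-th ordered point counting down from the largest
    return q1, q3


etruscan = [126, 132, 138, 140, 141, 141, 142, 143, 144, 144, 144, 145, 146,
            147, 148, 148, 149, 149, 150, 150, 150, 154, 155, 158, 158]

print(class_rule_quartiles(etruscan))  # (142, 150)
```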

The difference between the quartiles is called the **interquartile range** of the sample. It is denoted by **IQR**, so for the Etruscan data
*IQR* = 150 - 142 = 8 mm. IQR is also a **measure of scale**. It is not sensitive to outliers (25% of the data have to be outliers to affect IQR); hence, IQR is robust.
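To see what "not sensitive to outliers" means in practice, here is a small sketch that compares the IQR with the range, taken here as maximum minus minimum (an assumption of this example). The contaminated sample is hypothetical: the largest skull size, 158, is replaced by 1580, as if a decimal point had been misplaced.

```python
def iqr_and_range(sample):
    """Return (IQR, range), with the quartiles found by the hand rule."""
    ordered = sorted(sample)
    n = len(ordered)
    k = (n + 3) // 4                        # n / 4, rounded up
    iqr = ordered[n - k] - ordered[k - 1]   # Q3 - Q1
    data_range = ordered[-1] - ordered[0]   # maximum - minimum
    return iqr, data_range


etruscan = [126, 132, 138, 140, 141, 141, 142, 143, 144, 144, 144, 145, 146,
            147, 148, 148, 149, 149, 150, 150, 150, 154, 155, 158, 158]

# Hypothetical recording error: one 158 becomes 1580.
contaminated = etruscan[:-1] + [1580]

print(iqr_and_range(etruscan))      # (8, 32)
print(iqr_and_range(contaminated))  # (8, 1454)
```

One bad value changes the range enormously but leaves the IQR at 8 mm.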

In summary, for the Etruscan data, the five basic descriptive statistics are: 126, 142, 146, 150 and 158 mm. We want to put these summary statistics in a picture, but first we need the concept of an outlier, which we introduce in the next section.

Let's do one more example, which shows how we can get the 5 basic descriptive statistics **very quickly** from a stem-leaf plot. Consider the subsample of Italian skull sizes given by,

133 128 136 140 127 136 131 131 128 132 125 133 134 136 134 129 132 139 143 138

The stem leaf plot is

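| Stem | Leaves | *f* | *F* | *FTB* |
|---|---|---|---|---|
| 12 | 87859 | 5 | 5 | |
| 13 | 31123442 | 8 | 13 | |
| 13 | 66698 | 5 | 18 | 7 |
| 14 | 03 | 2 | 20 | 2 |

We have added three columns on the right side of the stem-leaf plot. The column labeled *f* gives the number of data points in each class, the column labeled *F* accumulates these counts from the first class down, and the column labeled *FTB* accumulates them from the last class (the largest measurements) up; only the *FTB* entries needed here are shown.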
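The extra columns of the stem-leaf plot can be rebuilt with a short script. This is a sketch under some assumptions: each stem is split into two classes (leaves 0 to 4 and leaves 5 to 9), as the plot above appears to do, empty classes are omitted, and the names `split_stem_classes` and `frequency_columns` are invented for the illustration. Unlike the plot, it fills in the from-the-top count for every class.

```python
from collections import defaultdict
from itertools import accumulate

italian = [133, 128, 136, 140, 127, 136, 131, 131, 128, 132,
           125, 133, 134, 136, 134, 129, 132, 139, 143, 138]


def split_stem_classes(sample):
    """Group values into classes: stem = tens, split into leaves 0-4 and 5-9.
    Leaves are kept in the order they occur in the data."""
    classes = defaultdict(list)
    for x in sample:
        stem, leaf = divmod(x, 10)
        classes[(stem, leaf // 5)].append(leaf)
    return [(stem, leaves) for (stem, _half), leaves in sorted(classes.items())]


def frequency_columns(rows):
    """Class counts, running counts from the smallest class,
    and running counts from the largest class."""
    f = [len(leaves) for _stem, leaves in rows]
    F = list(accumulate(f))
    from_top = list(accumulate(reversed(f)))[::-1]
    return f, F, from_top


rows = split_stem_classes(italian)
f, F, from_top = frequency_columns(rows)
for (stem, leaves), fi, Fi, ti in zip(rows, f, F, from_top):
    print(stem, "".join(map(str, leaves)), fi, Fi, ti)
```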

The sample size is 20 (the last number in column *F*), which is even. So the median is the average of the 10*th* (*n*/2) and 11*th* ordered data points. To get these, look at column *F*. There are 5 data points down through the end of the first class and 13 data points down through the end of the second class; hence, the median must occur in the second class. In the second class the 6*th* through 11*th* ordered data points are 131, 131, 132, 132, 133, 133. Thus the median is .5(133 + 133) = 133.

Since 20/4 is 5, the first quartile is the 5*th* ordered data point, which is 129 (the largest data point in the first class, as dictated by column *F*). The third quartile is the 5*th* ordered data point counting from the top (the largest measurement) down. By the *FTB* column it is in the second class from the top. Counting down, the 7*th* ordered data point is 136, the 6*th* is 136, and the 5*th* is 136. So *Q*_{3} = 136.
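The hand calculation for the Italian subsample can be checked with a small sketch that collects all five statistics at once. The function name `five_basic_statistics` is chosen for this example, and the quartiles follow the simple class rule rather than any particular software's rule.

```python
def five_basic_statistics(sample):
    """Minimum, Q1, median, Q3, maximum, using the hand rules from the text."""
    ordered = sorted(sample)
    n = len(ordered)
    k = (n + 3) // 4                  # n / 4, rounded up
    mid = n // 2
    if n % 2 == 1:
        med = ordered[mid]            # middle ordered point
    else:
        med = (ordered[mid - 1] + ordered[mid]) / 2   # average of middle two
    return ordered[0], ordered[k - 1], med, ordered[n - k], ordered[-1]


italian = [133, 128, 136, 140, 127, 136, 131, 131, 128, 132,
           125, 133, 134, 136, 134, 129, 132, 139, 143, 138]

print(five_basic_statistics(italian))  # (125, 129, 133.0, 136, 143)
```

The middle three values agree with the median and quartiles read off the stem-leaf plot.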

For small data sets we can get these statistics by hand. But for large
data sets it is best if the computer gets them for us.

To run the class code for the descriptive statistics choose the summary module of the and choose **Numerical Summaries** after entering the data.

12 18 25 15 9 14 21 25 28 125