Next: Box-and-Whisker Plot Up: Summarizing Numerical Data Previous: Stem-and-Leaf Plot

Relative Frequency Table and Histogram

In Section 1.3.1, we presented a relative frequency table for categorical data. Relative frequency tables may also be used for numerical data. A relative frequency table for HS GPA is presented below. The class width (here 0.2) is chosen so as to give a moderate number of class intervals.



       


                Relative Frequency Table of High School GPA
          
                    HS GPA    Frequency    Rel. Freq. 
                    2.2-2.39      1          .02
                    2.4-2.59      6          .11
                    2.6-2.79     13          .23
                    2.8-2.99      7          .13
                    3.0-3.19      4          .07
                    3.2-3.39      9          .16
                    3.4-3.59      6          .11
                    3.6-3.79      7          .13
                    3.8-4.00      3          .05
                    --------     --         ----
                     Total       56         1.01
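
Each relative frequency is the class frequency divided by the total count. The following is a minimal Python sketch of that arithmetic, using the counts from the Frequency column of the table; the variable names are illustrative and not part of the original notes.

```python
# Class labels and counts copied from the Frequency column of the table.
classes = ["2.2-2.39", "2.4-2.59", "2.6-2.79", "2.8-2.99", "3.0-3.19",
           "3.2-3.39", "3.4-3.59", "3.6-3.79", "3.8-4.00"]
frequencies = [1, 6, 13, 7, 4, 9, 6, 7, 3]

total = sum(frequencies)  # total number of students

# Relative frequency = class frequency / total count.
rel_freqs = [f / total for f in frequencies]

# Print a small table, rounding each relative frequency to two decimals.
for label, f, rf in zip(classes, frequencies, rel_freqs):
    print(f"{label:>10} {f:>5} {rf:>8.2f}")
print(f"{'Total':>10} {total:>5} {sum(rel_freqs):>8.2f}")
```

The unrounded relative frequencies always sum to 1. A printed total such as 1.01 arises only when the entries are rounded to two decimals before they are added.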



There are three things to keep in mind when constructing a frequency table:

1. How many classes (i.e., intervals) do you want? This also determines the class width.
2. Where does the first interval start?
3. How do you avoid boundary disputes?
The last item requires you to choose a boundary convention. For instance, the intervals in the frequency table above could have been written as follows:



                     2.2-2.4      
                     2.4-2.6    
                     2.6-2.8    
                       etc



This is much simpler to read, isn't it? However, does a grade of 2.4 count in the first or the second interval? One remedy is to add a footnote to the table that says something like "The intervals include the left endpoint but not the right." This way, a grade of 2.4 belongs to the second (not the first) interval. Alternatively, square brackets and parentheses may be used as follows: [2.2, 2.4), [2.4, 2.6), ..., [3.8, 4.0]. This notation also means that each class interval includes its left endpoint but not its right, except for the last class, which includes both. The frequency table above avoids the boundary problem altogether by making each interval end before the start of the next one. This is also effective, provided every grade is recorded to two decimal places, but the frequency table can look more complicated. Which method do you prefer?
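
The "left endpoint included, right endpoint excluded" convention can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed method: the function name `class_index` and its default arguments (start 2.2, width 0.2, nine classes) are hypothetical choices that mirror the table, and grades are assumed to be recorded to two decimal places.

```python
def class_index(grade, start=2.2, width=0.2, n_classes=9):
    """Return the 0-based class index for a grade, or None if out of range.

    Classes are [start, start + width), [start + width, start + 2*width), ...,
    with the last class closed on the right as well.
    """
    # Work in integer hundredths so that boundary values such as 2.4
    # compare exactly (assumes grades have at most two decimals).
    g = round(grade * 100)
    lo = round(start * 100)
    w = round(width * 100)
    hi = lo + w * n_classes

    if g < lo or g > hi:
        return None           # grade falls outside every class
    if g == hi:
        return n_classes - 1  # the right endpoint belongs to the last class
    return (g - lo) // w      # left endpoint included, right excluded


# A grade of exactly 2.4 lands in the second class (index 1), not the first.
print(class_index(2.39), class_index(2.4), class_index(4.0))
```

Converting to whole hundredths before comparing is a design choice: with ordinary floating-point arithmetic, a computed boundary such as 2.2 + 0.2 need not equal 2.4 exactly, which would reintroduce the very boundary dispute the convention is meant to settle.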

The relative frequency table is a compact numerical way to present how the data are distributed. If the frequencies are plotted as columns, the resulting plot is called a histogram.

[Figure: histogram of HS GPA]
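
To see the idea of "frequencies plotted as columns" without any plotting software, here is a rough Python sketch that draws one text column per class. The function name `text_histogram` is hypothetical, and the counts are those of the Frequency column in the table above; a real histogram would also label the axes with the class intervals and the frequency scale.

```python
def text_histogram(frequencies, symbol="#"):
    """Return a text histogram with one vertical column per class."""
    rows = []
    # Build the picture from the tallest level down to level 1.
    for level in range(max(frequencies), 0, -1):
        # A class gets a mark on this row if its count reaches this level.
        row = " ".join(symbol if f >= level else " " for f in frequencies)
        rows.append(row.rstrip())
    return "\n".join(rows)


# Counts copied from the Frequency column of the table.
frequencies = [1, 6, 13, 7, 4, 9, 6, 7, 3]
print(text_histogram(frequencies))
```

The height of each column equals the class frequency, so the tallest column corresponds to the class with the most observations.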

The histogram and the stem-and-leaf plot look alike, except that the stem-and-leaf plot has columns that go sideways instead of upwards. A stem-and-leaf plot is better if you want the data values themselves to be available from the plot. However, it is impractical for large sample sizes. The frequency table and histogram can handle large sample sizes easily. Furthermore, they allow more flexibility in choosing class widths. For example, you may choose class widths of .15, as follows: [2.35, 2.50), [2.50, 2.65), [2.65, 2.80), ..., [3.85, 4.00].
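
Choosing a starting point and a class width fixes all of the class intervals. The Python sketch below generates them for the width of .15 mentioned above; the function name `class_intervals` is hypothetical, and the code assumes that the range from start to stop is a whole number of class widths and that all values have at most two decimals.

```python
def class_intervals(start, stop, width):
    """Return (lower, upper) pairs covering start..stop in steps of width."""
    # Integer hundredths keep the endpoints exact (no floating-point drift).
    lo, hi, w = round(start * 100), round(stop * 100), round(width * 100)
    # Assumes (stop - start) is a whole multiple of width; otherwise the
    # last edge stops short of stop.
    edges = list(range(lo, hi + 1, w))
    return [(a / 100, b / 100) for a, b in zip(edges, edges[1:])]


intervals = class_intervals(2.35, 4.00, 0.15)
for i, (a, b) in enumerate(intervals):
    # Every class is half-open except the last, which is closed.
    closing = "]" if i == len(intervals) - 1 else ")"
    print(f"[{a:.2f}, {b:.2f}{closing}")
```

Changing `width` or `start` and rerunning shows how the number of classes responds, which is the trade-off behind the first two questions to ask when constructing a frequency table.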



2003-09-08