# Chi-square Tests

1. χ² test for the difference between two proportions (2 × 2 contingency tables)

2. χ² test for differences among c proportions (2 × c contingency tables)

3. χ² test of independence (r × c contingency tables)

1.
Consider the following exercise.
In an effort to compare the efficacy of two medical approaches to removing plaque that clogs arteries, Dr. Eric J. Topol and colleagues conducted a study in which they randomly assigned 1,012 heart patients to have either directional coronary atherectomy or balloon angioplasty.   Of the 512 patients given the atherectomy, 44 died or suffered heart attacks within 6 months of treatment.   Of the 500 patients given angioplasty, 23 died or suffered heart attacks within 6 months of treatment.   At the .01 level of significance, is there evidence of a significant difference in the two medical approaches with respect to the proportion of deaths or heart attacks within 6 months of treatment?
They didn't bother to supply us with a contingency table here, so we'll make our own.   In the table, let us make the rows represent the two medical approaches, and the columns represent those who died or suffered heart attacks within 6 months of treatment, and those who did not.

| | Died or suffered a heart attack | Did not die or suffer a heart attack |
|---|---|---|
| Directional Atherectomy | 44 | 468 |
| Balloon Angioplasty | 23 | 477 |

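The hypotheses for this problem are:

H0: p1 = p2
H1: p1 ≠ p2 (where p1 and p2 are the proportions of atherectomy and angioplasty patients, respectively, who died or suffered heart attacks within 6 months of treatment)
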
Now we need to determine the expected frequencies under H0.   For this, we will need the row, column and overall totals.   Let us add these to our table.

| | Died or suffered a heart attack | Did not die or suffer a heart attack | Total |
|---|---|---|---|
| Directional Atherectomy | 44 | 468 | 512 |
| Balloon Angioplasty | 23 | 477 | 500 |
| Total | 67 | 945 | 1012 |

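Under H0, the expected frequency for each cell is (row total × column total) / overall total. Starting in the upper left, we have $f_e = \frac{512 \times 67}{1012} = 33.9$. Then we will have $f_e = \frac{512 \times 945}{1012} = 478.1$, $f_e = \frac{500 \times 67}{1012} = 33.1$, and $f_e = \frac{500 \times 945}{1012} = 466.9$. It is convenient to put these values into our table.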

| | Died or suffered a heart attack | Did not die or suffer a heart attack |
|---|---|---|
| Directional Atherectomy | 44 (33.9) | 468 (478.1) |
| Balloon Angioplasty | 23 (33.1) | 477 (466.9) |
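As a quick cross-check of the arithmetic, here is a minimal Python sketch (our own addition, not part of the TI-83 or ExcelTools workflow). It applies the rule "row total times column total, divided by the overall total" to every cell. The function name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(observed):
    """Return the table of expected frequencies under H0.

    Each cell is (row total * column total) / overall total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: atherectomy, angioplasty; columns: died/heart attack, did not.
observed = [[44, 468],
            [23, 477]]

for row in expected_frequencies(observed):
    print([round(fe, 1) for fe in row])
# [33.9, 478.1]
# [33.1, 466.9]
```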

Now we can get the test statistic:

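$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(44 - 33.9)^2}{33.9} + \frac{(23 - 33.1)^2}{33.1} + \frac{(468 - 478.1)^2}{478.1} + \frac{(477 - 466.9)^2}{466.9}$$

$$= 3.009 + 3.082 + 0.213 + 0.218 = 6.522$$

where $f_o$ is the observed frequency and $f_e$ the expected frequency in each cell.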
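A similar sketch can total up the cell contributions. It keeps the expected frequencies at full precision and does not round them to one decimal place. For that reason it gives about 6.526 rather than the 6.522 obtained by hand. The gap is only rounding and does not change the conclusion. The function name `chi_square_statistic` is again our own.

```python
def chi_square_statistic(observed):
    """Sum (f_o - f_e)^2 / f_e over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency, unrounded
            total += (fo - fe) ** 2 / fe
    return total


stat = chi_square_statistic([[44, 468], [23, 477]])
print(round(stat, 3))  # 6.526
```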

Set up the rejection region by drawing a χ² curve (just draw a somewhat right-skewed curve) and shading the last 1% of the right tail. We need the critical value associated with this area. The degrees of freedom for this problem are (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1, so we need $\chi^2_{.01,\,1}$. Note that these χ² tests always have non-directional alternative hypotheses (such as p1 ≠ p2), but we do not split α between both tails of the curve when setting up the rejection region. χ² curves are right-skewed, and the rejection region is always in the right tail. We get this critical value with the EQUATION SOLVER, just like we did for t critical values. Under the MATH menu, choose Solver.

You may just have to change tcdf to χ²cdf if you last used the EQUATION SOLVER to get a t critical value. If not, enter the equation 0 = χ²cdf(L,U,D) − A, where L is the Lower bound, U is the Upper bound, D is the Degrees of freedom, and A is the Area.
We want the χ² value that has 1% of the area to its right. Let us have the calculator solve for U, so enter zero on that line as a "guess." χ² curves start at zero (see page 130 of your textbook), so enter 0 for L, the lower bound. We have 1 degree of freedom, and the area between the lower and upper bounds should be .99. With the cursor on the U=0 line, press SOLVE (ALPHA ENTER). This calculation will take about 15 seconds.

We now see that $\chi^2_{.01,\,1} = 6.635$.
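A simple bisection search can imitate the Solver step. The sketch below assumes 1 degree of freedom. In that case the right-tail area has the closed form erfc(√(x/2)), because a χ² variable with 1 df is the square of a standard normal variable. The function names are hypothetical. This illustrates the idea and is not how the calculator actually solves the equation.

```python
import math


def right_tail_df1(x):
    """P(chi-square > x) with 1 degree of freedom: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


def critical_value_df1(alpha, lo=0.0, hi=50.0, tol=1e-10):
    """Bisection: find x whose right-tail area equals alpha.

    This plays the role of the Solver: the area from 0 (L) to x (U)
    is 1 - alpha, so the area to the right of x is alpha.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if right_tail_df1(mid) > alpha:
            lo = mid  # too much area to the right of mid: move up
        else:
            hi = mid
    return (lo + hi) / 2


print(round(critical_value_df1(0.01), 3))  # 6.635
```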
It comes close, but the test statistic does not fall into the rejection region, i.e., 6.522 < 6.635; therefore we do not reject H0. No, there is no evidence of a significant difference in the two medical approaches with respect to the proportion of deaths or heart attacks within 6 months of treatment.

Now let us get the p-value associated with this problem. This is P(χ² > 6.522), the area to the right of the test statistic under the χ² curve with 1 degree of freedom. We will use the χ²cdf function to get this probability.

So we have p-value ≈ .011, which is greater than α = .01. Do not reject H0.
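A χ² variable with 1 degree of freedom is the square of a standard normal Z. The p-value can therefore be cross-checked as 2·P(Z > √6.522). Below is a minimal Python sketch that uses the standard library's `statistics.NormalDist` (Python 3.8 or later). `p_value_df1` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


# P(chi-square > x) = P(|Z| > sqrt(x)) = 2 * P(Z > sqrt(x)) when df = 1.
def p_value_df1(x):
    return 2 * (1 - NormalDist().cdf(sqrt(x)))


p = p_value_df1(6.522)
print(round(p, 3))  # 0.011
print("reject H0" if p < 0.01 else "do not reject H0")  # do not reject H0
```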

Now let us redo the problem using the χ²-Test function. Under STAT TESTS, choose χ²-Test.

By default, the calculator expects the contingency table with the observed values to be in matrix A. Go to [2nd] MATRX EDIT. Select A. Define the dimension to be 2 × 2. Enter the observed frequencies from our contingency table, replacing the default zeros.

You need not do anything for B, the Expected frequencies matrix. Go back to STAT TESTS, and choose χ²-Test. With the cursor on Calculate, press ENTER.

This gives the same results as before.   The matrix of expected frequencies can be viewed, if we want to check on our calculations from before.   Go back to [2nd] MATRX EDIT.   Select B and press ENTER.

Indeed, these are the same expected values that we got by hand.
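The calculator's χ²-Test bundles all of these steps. It builds the expected-frequency matrix (stored in B), then computes the statistic, the degrees of freedom (r − 1)(c − 1), and the p-value. The Python sketch below imitates that output for any r × c table. It is only an illustration under our own assumptions: the function names are hypothetical, and the right-tail area uses a closed form valid for integer degrees of freedom. It is not the TI-83's actual algorithm.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(chi-square > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        # Even df: e^(-y) * (1 + y + y^2/2! + ... + y^(df/2 - 1)/(df/2 - 1)!)
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus e^(-y) * y^(k - 1/2) / Gamma(k + 1/2), k = 1..(df-1)/2
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def chi_square_test(observed):
    """Return (statistic, p-value, df, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, chi2_sf(stat, df), df, expected


stat, p, df, expected = chi_square_test([[44, 468], [23, 477]])
print(round(stat, 3), round(p, 3), df)  # 6.526 0.011 1
```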

Another useful option is Draw.   Go back and choose Draw instead of Calculate.

A χ² curve with 1 degree of freedom is extremely right-skewed. Also, the test statistic is too large to show up on the graph, even though it's not in the rejection region! The critical value is off-screen to the right as well.
Now let us use ExcelTools to do this problem.
Download the 2x2 version of the x-sq functions. Enter .01 for the Level of Significance, and an Output title if you wish. Click OK.
Name the row and column headings. Enter in the observed frequencies into the table, replacing the default zeros.
Note how Excel calculates everything as you go.   Once the last observed value is entered, the test is done.   A table of expected frequencies is given, along with the test statistic, p-value, etc.

The results are the same as before.

2.
Consider the following exercise.
The marketing director of a cable television company is interested in determining whether there is a difference in the proportion of households that adopt a cable television service based on the type of residence (single-family dwelling, two- to four-family dwelling, and apartment house).   A random sample of 400 households revealed the following:

| Purchase Cable Television? | Single-Family | Two- to Four-Family | Apartment House | Total |
|---|---|---|---|---|
| Yes | 94 | 39 | 77 | 210 |
| No | 56 | 36 | 98 | 190 |
| Total | 150 | 75 | 175 | 400 |

At the .01 level of significance, is there evidence of a significant difference among types of residence with respect to the proportion of households that adopt the cable TV service?

The hypotheses for this problem are:

H0: p1 = p2 = p3
H1: Not all pj are equal (where j = 1, 2, 3).

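Let us determine the expected frequencies under H0, where each expected frequency is (row total × column total) / overall total. Starting in the upper left, we have $f_e = \frac{210 \times 150}{400} = 78.75$. Then we will have $f_e = \frac{210 \times 75}{400} = 39.375$, $f_e = \frac{210 \times 175}{400} = 91.875$, $f_e = \frac{190 \times 150}{400} = 71.25$, $f_e = \frac{190 \times 75}{400} = 35.625$, and $f_e = \frac{190 \times 175}{400} = 83.125$. Again, let us put these values into the table.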

| Purchase Cable Television? | Single-Family | Two- to Four-Family | Apartment House | Total |
|---|---|---|---|---|
| Yes | 94 (78.75) | 39 (39.375) | 77 (91.875) | 210 |
| No | 56 (71.25) | 36 (35.625) | 98 (83.125) | 190 |
| Total | 150 | 75 | 175 | 400 |

Now we can get the test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 2.953 + 0.004 + 2.408 + 3.264 + 0.004 + 2.662 = 11.295$$

Let us set up the rejection region. This problem also uses α = .01. Draw a χ² curve and shade the last 1% of the right tail. We need the critical value associated with this area. The degrees of freedom for this problem are (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2, so we need $\chi^2_{.01,\,2}$. Get this critical value with the EQUATION SOLVER, just like we did for the first problem.

We now see that $\chi^2_{.01,\,2} = 9.210$.
The test statistic falls into the rejection region, i.e., 11.295 > 9.210; therefore we reject H0. Yes, there is evidence of a significant difference among types of residence with respect to the proportion of households that adopt the cable TV service.

Now let us get the p-value associated with this problem. This is P(χ² > 11.295), the area to the right of the test statistic under the χ² curve with 2 degrees of freedom.

So we have p-value ≈ .0035, which is less than α = .01. Reject H0.
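Recomputing each cell's contribution by machine is a cheap way to catch arithmetic slips in a sum like this one. With 2 degrees of freedom the χ² right-tail area also has the simple closed form e^(−x/2), so the critical value and p-value can be checked without the Solver. Below is a minimal Python sketch that uses the observed and expected frequencies from the table above. The variable names are our own.

```python
import math

# Observed and expected frequencies from the table above, row by row
# (rows: Yes, No; columns: single-family, two- to four-family, apartment).
observed = [[94, 39, 77],
            [56, 36, 98]]
expected = [[78.75, 39.375, 91.875],
            [71.25, 35.625, 83.125]]

# One (f_o - f_e)^2 / f_e term per cell, in the same order as the sum.
contributions = [
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
]
stat = sum(contributions)
print([round(c, 3) for c in contributions])
# [2.953, 0.004, 2.408, 3.264, 0.004, 2.662]
print(round(stat, 3))  # 11.295

# With 2 degrees of freedom the right-tail area is e^(-x/2), so the
# critical value solves e^(-x/2) = alpha, i.e. x = -2 ln(alpha).
alpha = 0.01
critical = -2 * math.log(alpha)
p_value = math.exp(-stat / 2)
print(round(critical, 3))  # 9.21
print(round(p_value, 4))   # 0.0035
print("reject H0" if stat > critical else "do not reject H0")  # reject H0
```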

Now let us redo the problem using the χ²-Test function. We will have to change matrix A (or use a different matrix). Change the dimension to 2 × 3. Enter the observed frequencies from the contingency table. Leave B as is. With the cursor on Calculate, press ENTER.

This gives the same results as before.   Let us check if the matrix of expected frequencies matches our calculations from before.

Now go back and choose Draw.

A χ² curve with 2 degrees of freedom is also very right-skewed. And again, the test statistic is too large to show up on the graph.

Now let us use ExcelTools to do this problem.
Download the 3x3 version of the x-sq functions.
Name the row and column headings.   Enter in the observed frequencies into the table.

The results are the same as before.

3.
Consider the following exercise.
The victory of the incumbent, Bill Clinton, in the 1996 presidential election was attributed to improved economic conditions and low unemployment.   Suppose a survey of 800 adults taken soon after the election resulted in the following cross-classification of financial condition with education level:

| Financial Condition | H.S. Degree or Lower | Some College | College Degree or Higher | Total |
|---|---|---|---|---|
| Worse off now than before | 91 | 39 | 18 | 148 |
| No difference | 104 | 73 | 31 | 208 |
| Better off now than before | 235 | 48 | 161 | 444 |
| Total | 430 | 160 | 210 | 800 |

At the .05 level of significance, is there evidence of a relationship between financial condition and education level?

The hypotheses for this problem are:

H0: Financial condition and Education level are independent
H1: Financial condition and Education level are dependent

This problem would be fairly easy to do by hand, but since we've already done two that way, let us skip it and go straight to using the χ²-Test function. Change matrix A to have dimension 3 × 3. Enter the observed frequencies from the contingency table.
Again, leave B as is.

We now see that χ² ≈ 86.02 (!), and the p-value is approximately zero, certainly less than α = .05.
We will most definitely reject H0, and conclude that there is indeed a relationship between Financial condition and Education level.   The two variables are not independent!
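For a by-hand check of the calculator's result, the sketch below recomputes the statistic directly from the observed table. It then applies the closed form for 4 degrees of freedom, P(χ² > x) = e^(−x/2)(1 + x/2). This is a minimal Python illustration with our own variable names, not the calculator's method.

```python
import math

observed = [[91, 39, 18],
            [104, 73, 31],
            [235, 48, 161]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (f_o - f_e)^2 / f_e, with f_e = row total * column total / n.
stat = sum(
    (fo - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for fo, c in zip(row, col_totals)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 4, P(chi-square > x) = e^(-x/2) * (1 + x/2).
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(round(stat, 2), df)  # 86.02 4
print(f"p-value = {p_value:.1e}")  # on the order of 1e-17, effectively zero
print(p_value < 0.05)  # True
```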

Let us look at the matrix of expected frequencies, since they must disagree with the observed frequencies quite a bit.

Let us add these to the table:

| Financial Condition | H.S. Degree or Lower | Some College | College Degree or Higher | Total |
|---|---|---|---|---|
| Worse off now than before | 91 (79.55) | 39 (29.6) | 18 (38.85) | 148 |
| No difference | 104 (111.8) | 73 (41.6) | 31 (54.6) | 208 |
| Better off now than before | 235 (238.65) | 48 (88.8) | 161 (116.55) | 444 |
| Total | 430 | 160 | 210 | 800 |

Many of these expected frequencies do indeed differ greatly from the observed frequencies, especially in the second and third columns.

Now go back and choose Draw, if only to see what a χ² curve with 4 degrees of freedom looks like. (Right-skewed, of course!) For this problem, we have df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4.

Once again, the test statistic is way too large to show up on the graph.

Lastly, let us use ExcelTools to do this problem.
This time enter .05 for the Level of Significance, enter 3 for the Number of Rows, and enter 3 for the Number of Columns.

The results are the same as before. As noted on the output, a "bug" prevents Excel from printing the Chi-square test statistic. It's too big, apparently. Luckily for us, the TI-83 is not bothered by this! (Nor would a hand calculation have any trouble producing such a test statistic.)