OPTIONS NODATE PAGENO=1 LS=75 PS=55;
FILENAME tmpfn1 URL "http://www.stat.wmich.edu/naranjo/stat6620/diseaseoutbreak.txt";
TITLE 'Logistic Regression';

DATA one;
  INFILE tmpfn1;
  INPUT disease age sestatus $ sector $ x_semid x_selow x_sector;
RUN;

PROC PRINT DATA=one (obs=40);
RUN;

/**** Multiple logistic regression using indicators */
PROC LOGISTIC DATA = one;
  MODEL disease (event='1') = age x_selow x_semid x_sector ;
  OUTPUT OUT=lognew P=pred lower=lcl upper=ucl ;
RUN;

/**** Multiple logistic regression using CLASS statement*/
PROC LOGISTIC DATA = one;
  CLASS sestatus sector / param=ref;   /* Reference Cell */
  MODEL disease (event='1') = age sestatus sector;
RUN;

PROC LOGISTIC DATA = one;
  CLASS sestatus sector;               /* Default: factor effects */
  MODEL disease (event='1') = age sestatus sector;
RUN;

/**************************** Logistic Regression versus 2x2 Chisquare */
PROC LOGISTIC DATA = one;
  CLASS sector;
  MODEL disease (event='1') = sector;
RUN;

PROC FREQ DATA=one;
  TABLE disease*sector / chisq;
RUN;

/**************************** Logistic Regression versus 2x3 Chisquare */
PROC LOGISTIC DATA = one;
  CLASS sestatus;
  MODEL disease (event='1') = sestatus;
RUN;

PROC FREQ DATA=one;
  TABLE disease*sestatus / chisq;
RUN;
QUIT;

-------------------------------------------------------------------------

Logistic Regression                                                     1

Obs    disease    age    sestatus    sector    x_semid    x_selow    x_sector
  1       0        33    Upper       Urban        0          0          0
  2       0        35    Upper       Urban        0          0          0
  3       0         6    Upper       Urban        0          0          0
  4       0        60    Upper       Urban        0          0          0
  5       1        18    Lower       Urban        0          1          0
  6       0        26    Lower       Urban        0          1          0
  7       0         6    Lower       Urban        0          1          0
  8       1        31    Middle      Urban        1          0          0
  9       1        26    Middle      Urban        1          0          0
 10       0        37    Middle      Urban        1          0          0
 11       0        23    Upper       Urban        0          0          0
 12       0        23    Upper       Urban        0          0          0
 13       0        27    Upper       Urban        0          0          0
 14       1         9    Upper       Urban        0          0          0
 15       1        37    Upper       Rural        0          0          1
 16       1        22    Upper       Rural        0          0          1
 17       1        67    Upper       Rural        0          0          1
 18       0         8    Upper       Rural        0          0          1
 19       1         6    Upper       Rural        0          0          1
 20       1        15    Upper       Rural        0          0          1
 21       1        21    Middle      Rural        1          0          1
 22       1        32    Middle      Rural        1          0          1
 23       1        16    Upper       Rural        0          0          1
 24       0        11    Middle      Rural        1          0          1
 25       0        14    Lower       Rural        0          1          1
 26       0         9    Middle      Rural        1          0          1
 27       0        18    Middle      Rural        1          0          1
 28       0         2    Lower       Urban        0          1          0
 29       0        61    Lower       Urban        0          1          0
 30       0        20    Lower       Urban        0          1          0
 31       0        16    Lower       Urban        0          1          0
 32       0         9    Middle      Urban        1          0          0
 33       0        35    Middle      Urban        1          0          0
 34       0         4    Upper       Urban        0          0          0
 35       0        44    Lower       Rural        0          1          1
 36       1        11    Lower       Rural        0          1          1
 37       0         3    Middle      Rural        1          0          1
 38       0         6    Lower       Rural        0          1          1
 39       1        17    Middle      Rural        1          0          1
 40       0         1    Lower       Rural        0          1          1

Logistic Regression                                                     2

The LOGISTIC Procedure

Model Information

Data Set                      WORK.ONE
Response Variable             disease
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read          98
Number of Observations Used          98

Response Profile

Ordered                      Total
  Value     disease      Frequency
      1     0                   67
      2     1                   31

Probability modeled is disease=1.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             124.318        111.054
SC              126.903        123.979
-2 Log L        122.318        101.054

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        21.2635     4        0.0003
Score                   20.4067     4        0.0004
Wald                    16.6437     4        0.0023

Logistic Regression                                                     3

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -2.3127      0.6426       12.9545        0.0003
age           1      0.0297      0.0135        4.8535        0.0276
x_selow       1     -0.3051      0.6041        0.2551        0.6135
x_semid       1      0.4088      0.5990        0.4657        0.4950
x_sector      1      1.5746      0.5016        9.8543        0.0017

Odds Ratio Estimates

               Point          95% Wald
Effect      Estimate      Confidence Limits
age            1.030       1.003       1.058
x_selow        0.737       0.226       2.408
x_semid        1.505       0.465       4.868
x_sector       4.829       1.807      12.907

Association of Predicted Probabilities and Observed Responses

Percent Concordant     77.5    Somers' D    0.554
Percent Discordant     22.1    Gamma        0.556
Percent Tied            0.3    Tau-a        0.242
Pairs                  2077    c            0.777

Logistic Regression                                                     4

The LOGISTIC Procedure

Model Information

Data Set                      WORK.ONE
Response Variable             disease
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read          98
Number of Observations Used          98

Response Profile

Ordered                      Total
  Value     disease      Frequency
      1     0                   67
      2     1                   31

Probability modeled is disease=1.

Class Level Information

                       Design
Class       Value     Variables
sestatus    Lower      1      0
            Middle     0      1
            Upper      0      0
sector      Rural      1
            Urban      0

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Logistic Regression                                                     5

The LOGISTIC Procedure

Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             124.318        111.054
SC              126.903        123.979
-2 Log L        122.318        101.054

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        21.2635     4        0.0003
Score                   20.4067     4        0.0004
Wald                    16.6437     4        0.0023

Type 3 Analysis of Effects

                      Wald
Effect      DF    Chi-Square    Pr > ChiSq
age          1        4.8535        0.0276
sestatus     2        1.2053        0.5474
sector       1        9.8543        0.0017

Analysis of Maximum Likelihood Estimates

                                     Standard          Wald
Parameter          DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept           1     -2.3127      0.6426       12.9545        0.0003
age                 1      0.0297      0.0135        4.8535        0.0276
sestatus Lower      1     -0.3051      0.6041        0.2551        0.6135
sestatus Middle     1      0.4088      0.5990        0.4657        0.4950
sector   Rural      1      1.5746      0.5016        9.8543        0.0017

Odds Ratio Estimates

                               Point          95% Wald
Effect                      Estimate      Confidence Limits
age                            1.030       1.003       1.058
sestatus Lower vs Upper        0.737       0.226       2.408
sestatus Middle vs Upper       1.505       0.465       4.868
sector   Rural vs Urban        4.829       1.807      12.907

Logistic Regression                                                     6

The LOGISTIC Procedure

Association of Predicted Probabilities and Observed Responses

Percent Concordant     77.5    Somers' D    0.554
Percent Discordant     22.1    Gamma        0.556
Percent Tied            0.3    Tau-a        0.242
Pairs                  2077    c            0.777

Logistic Regression                                                     7

The LOGISTIC Procedure

Model Information

Data Set                      WORK.ONE
Response Variable             disease
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read          98
Number of Observations Used          98

Response Profile

Ordered                      Total
  Value     disease      Frequency
      1     0                   67
      2     1                   31

Probability modeled is disease=1.

Class Level Information

                       Design
Class       Value     Variables
sestatus    Lower      1      0
            Middle     0      1
            Upper     -1     -1
sector      Rural      1
            Urban     -1

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Logistic Regression                                                     8

The LOGISTIC Procedure

Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             124.318        111.054
SC              126.903        123.979
-2 Log L        122.318        101.054

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        21.2635     4        0.0003
Score                   20.4067     4        0.0004
Wald                    16.6437     4        0.0023

Type 3 Analysis of Effects

                      Wald
Effect      DF    Chi-Square    Pr > ChiSq
age          1        4.8535        0.0276
sestatus     2        1.2053        0.5474
sector       1        9.8543        0.0017

Analysis of Maximum Likelihood Estimates

                                     Standard          Wald
Parameter          DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept           1     -1.4909      0.4411       11.4215        0.0007
age                 1      0.0297      0.0135        4.8535        0.0276
sestatus Lower      1     -0.3397      0.3690        0.8471        0.3574
sestatus Middle     1      0.3742      0.3662        1.0439        0.3069
sector   Rural      1      0.7873      0.2508        9.8543        0.0017

Odds Ratio Estimates

                               Point          95% Wald
Effect                      Estimate      Confidence Limits
age                            1.030       1.003       1.058
sestatus Lower vs Upper        0.737       0.226       2.408
sestatus Middle vs Upper       1.505       0.465       4.868
sector   Rural vs Urban        4.829       1.807      12.907

Logistic Regression                                                     9

The LOGISTIC Procedure

Association of Predicted Probabilities and Observed Responses

Percent Concordant     77.5    Somers' D    0.554
Percent Discordant     22.1    Gamma        0.556
Percent Tied            0.3    Tau-a        0.242
Pairs                  2077    c            0.777

Logistic Regression                                                    10

The LOGISTIC Procedure

Model Information

Data Set                      WORK.ONE
Response Variable             disease
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read          98
Number of Observations Used          98

Response Profile

Ordered                      Total
  Value     disease      Frequency
      1     0                   67
      2     1                   31

Probability modeled is disease=1.

Class Level Information

                     Design
Class     Value     Variables
sector    Rural      1
          Urban     -1

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             124.318        111.534
SC              126.903        116.704
-2 Log L        122.318        107.534

Logistic Regression                                                    11

The LOGISTIC Procedure

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        14.7838     1        0.0001
Score                   14.7805     1        0.0001
Wald                    13.5939     1        0.0002

Type 3 Analysis of Effects

                    Wald
Effect    DF    Chi-Square    Pr > ChiSq
sector     1       13.5939        0.0002

Analysis of Maximum Likelihood Estimates

                                  Standard          Wald
Parameter       DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept        1     -0.7175      0.2364        9.2111        0.0024
sector Rural     1      0.8717      0.2364       13.5939        0.0002

Odds Ratio Estimates

                           Point          95% Wald
Effect                  Estimate      Confidence Limits
sector Rural vs Urban      5.717       2.263      14.442

Association of Predicted Probabilities and Observed Responses

Percent Concordant     49.5    Somers' D    0.409
Percent Discordant      8.7    Gamma        0.702
Percent Tied           41.8    Tau-a        0.179
Pairs                  2077    c            0.704

Logistic Regression                                                    12

The FREQ Procedure

Table of disease by sector

disease     sector

Frequency|
Percent  |
Row Pct  |
Col Pct  |Rural   |Urban   |  Total
---------+--------+--------+
        0|     18 |     49 |     67
         |  18.37 |  50.00 |  68.37
         |  26.87 |  73.13 |
         |  46.15 |  83.05 |
---------+--------+--------+
        1|     21 |     10 |     31
         |  21.43 |  10.20 |  31.63
         |  67.74 |  32.26 |
         |  53.85 |  16.95 |
---------+--------+--------+
Total          39       59       98
            39.80    60.20   100.00

Statistics for Table of disease by sector

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     14.7805    0.0001
Likelihood Ratio Chi-Square    1     14.7838    0.0001
Continuity Adj. Chi-Square     1     13.1236    0.0003
Mantel-Haenszel Chi-Square     1     14.6297    0.0001
Phi Coefficient                      -0.3884
Contingency Coefficient               0.3620
Cramer's V                           -0.3884

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        18
Left-sided Pr <= F          0.0001
Right-sided Pr >= F         1.0000

Table Probability (P)       0.0001
Two-sided Pr <= P           0.0002

Sample Size = 98

Logistic Regression                                                    13

The LOGISTIC Procedure

Model Information

Data Set                      WORK.ONE
Response Variable             disease
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read          98
Number of Observations Used          98

Response Profile

Ordered                      Total
  Value     disease      Frequency
      1     0                   67
      2     1                   31

Probability modeled is disease=1.

Class Level Information

                       Design
Class       Value     Variables
sestatus    Lower      1      0
            Middle     0      1
            Upper     -1     -1

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             124.318        124.085
SC              126.903        131.840
-2 Log L        122.318        118.085

Logistic Regression                                                    14

The LOGISTIC Procedure

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio         4.2325     2        0.1205
Score                    4.0670     2        0.1309
Wald                     3.9211     2        0.1408

Type 3 Analysis of Effects

                      Wald
Effect      DF    Chi-Square    Pr > ChiSq
sestatus     2        3.9211        0.1408

Analysis of Maximum Likelihood Estimates

                                     Standard          Wald
Parameter          DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept           1     -0.7656      0.2265       11.4223        0.0007
sestatus Lower      1     -0.6558      0.3323        3.8941        0.0485
sestatus Middle     1      0.4291      0.3293        1.6980        0.1926

Odds Ratio Estimates

                               Point          95% Wald
Effect                      Estimate      Confidence Limits
sestatus Lower vs Upper        0.414       0.144       1.190
sestatus Middle vs Upper       1.224       0.430       3.483

Association of Predicted Probabilities and Observed Responses

Percent Concordant     45.1    Somers' D    0.228
Percent Discordant     22.2    Gamma        0.339
Percent Tied           32.7    Tau-a        0.100
Pairs                  2077    c            0.614

Logistic Regression                                                    15

The FREQ Procedure

Table of disease by sestatus

disease     sestatus

Frequency|
Percent  |
Row Pct  |
Col Pct  |Lower   |Middle  |Upper   |  Total
---------+--------+--------+--------+
        0|     29 |     14 |     24 |     67
         |  29.59 |  14.29 |  24.49 |  68.37
         |  43.28 |  20.90 |  35.82 |
         |  80.56 |  58.33 |  63.16 |
---------+--------+--------+--------+
        1|      7 |     10 |     14 |     31
         |   7.14 |  10.20 |  14.29 |  31.63
         |  22.58 |  32.26 |  45.16 |
         |  19.44 |  41.67 |  36.84 |
---------+--------+--------+--------+
Total          36       24       38       98
            36.73    24.49    38.78   100.00

Statistics for Table of disease by sestatus

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     2      4.0670    0.1309
Likelihood Ratio Chi-Square    2      4.2325    0.1205
Mantel-Haenszel Chi-Square     1      2.5089    0.1132
Phi Coefficient                       0.2037
Contingency Coefficient               0.1996
Cramer's V                            0.2037

Sample Size = 98
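
-------------------------------------------------------------------------

The listing invites two comparisons: the sector-only logistic model against the 2x2 chi-square table, and reference-cell coding against the default effect (+1/-1) coding. The DATA step below is a minimal sketch of those comparisons, not part of the original program. Every number in it is copied from the listing, and the variable names (or_table, chisq, or_effect, b_ref, b_effect) are made up for illustration.

```sas
/* Illustrative cross-check only; not part of the original program. */
/* All constants are copied by hand from the listing.               */
DATA _NULL_;
  /* Cell counts from the disease*sector table */
  n00 = 49;   /* disease=0, Urban */
  n01 = 18;   /* disease=0, Rural */
  n10 = 10;   /* disease=1, Urban */
  n11 = 21;   /* disease=1, Rural */
  n   = n00 + n01 + n10 + n11;

  /* Sample odds ratio, Rural vs Urban */
  or_table = (n11 * n00) / (n01 * n10);

  /* Pearson chi-square for a 2x2 table (shortcut formula) */
  chisq = n * (n11*n00 - n01*n10)**2
          / ((n00 + n01) * (n10 + n11) * (n00 + n10) * (n01 + n11));

  /* Effect coding: Rural = +1, Urban = -1, so the log odds ratio */
  /* is twice the "sector Rural" coefficient of the sector model. */
  or_effect = EXP(2 * 0.8717);

  /* Multiple model: reference-cell slope vs twice the effect-coded slope */
  b_ref    = 1.5746;
  b_effect = 2 * 0.7873;

  PUT or_table= 8.3 chisq= 8.4 or_effect= 8.3 b_ref= 8.4 b_effect= 8.4;
RUN;
```

Up to rounding, or_table and or_effect should both agree with the reported Rural vs Urban odds ratio of 5.717, chisq should agree with both the PROC FREQ Chi-Square and the PROC LOGISTIC Score test (14.7805), and b_effect should agree with b_ref. This is why the two CLASS parameterizations report different coefficients but identical odds ratios and fit statistics.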