Log-linear Models

Please read Chapter Two.

We are interested in relationships between variables

                  White Victim   Black Victim
White Prisoner    151            9              (151/160 = 0.94)
Black Prisoner    63             103            (63/166 = 0.38)

Pearson chi-square test of independence

Based on P(A,B) = P(A) P(B)

p11   p12   p13   p14   | p1+
p21   p22   p23   p24   | p2+
p+1   p+2   p+3   p+4   | p++ = 1

Under H0 of independence, pij = pi+ p+j

x11   x12   x13   x14   | x1+
x21   x22   x23   x24   | x2+
x+1   x+2   x+3   x+4   | x++ = N

Computing the Pearson chi-square test of independence
• Calculate (estimated) expected frequencies: (row total)(column total)/N = xi+ x+j / N
• Calculate X2 = sum over all cells of (observed - expected)^2 / expected
• For large samples, X2 has an approximate chi-square distribution if H0 is true
• Degrees of freedom (I-1)(J-1)

Numerical example of Pearson chi-square (expected frequencies in parentheses)

                 White Prisoner   Black Prisoner   Total
White Victim     151 (105)        63 (109)         214
Black Victim     9 (55)           103 (57)         112
Total            160              166              326

With R [R output not preserved]

Conclusions
• X2 = 115, df = (2-1)(2-1) = 1
• Critical value at alpha = 0.05 is 3.84
• Reject H0
• Conclude race of prisoner and race of victim are not independent.
• That’s not good enough! Murder victims and the persons convicted of murdering them tend to be of the same race. (Say what happened!)

Two treatments for Kidney Stones

               Treatment A   Treatment B
Effective      273           289
Ineffective    77            61

X2 = 2.3106, df = 1, p = 0.1285
These results are consistent with no difference in effectiveness between treatments.

All this applies to the multinomial, but there are 3 main sampling models
• Multinomial
• Poisson
• Product Multinomial
Fortunately, the same statistical methods work with all.

Poisson
• Independent Poisson processes generate the counts in each category (for example, traffic accidents).
• In homework you proved that conditionally upon the total number of events, the joint distribution of the counts is multinomial.
• This justifies the use of multinomial theory.
• But in hard cases, Poisson probability calculations can be easier.

Product multinomial
• Take independent random samples of sizes N1, N2, …, NI from I sub-populations.
• In each, observe a multinomial with J categories. Compare.
• Examples: Vitamin C study, Kidney stone study.
• Likelihood: A product of I multinomial likelihoods, because of independent sampling from sub-populations.
• This is almost always the right model for experimental studies.

Suppose the null hypothesis is no differences among the I vectors of multinomial probabilities

x11   x12   x13   x14   | x1+ = N1
x21   x22   x23   x24   | x2+ = N2
x+1   x+2   x+3   x+4   | x++ = N

• Then under H0, the MLE of the (common) pj is the sample proportion, pooling data across the I rows: x+j/N.
• And the expected cell frequency is Ni x+j / N = xi+ x+j / N

Same as for the usual chi-square test of independence

So let’s concentrate on the multinomial

Assume a multinomial and test independence? Messy!

p1      p2      | p1+p2
p3      p4      | p3+p4
p1+p3   p2+p4   |

Log-linear models
• Linear model for the (natural) logs of the expected frequencies
• Looks like ANOVA notation (STA332)
• First, one-factor (not in the text)
• Then two-factor (in the text)
• Start with the familiar normal example, testing for differences among means.

Compare 3 means
• Grand Mean
• Effects are deviations from the grand mean

Single categorical variable, k categories
• Linear model for log of expected frequencies
• No probability can equal zero!

This is a Re-Parameterization
• Substitute into likelihood function and do maximum likelihood

How many parameters, k or k-1?

There are still k-1 parameters
• All “effects” zero corresponds to equal probabilities

Maximum Likelihood

Log Likelihood

k = 3 Categories

Numerical MLE

Remember the employment study?
• 106 Employed in a job related to field of study
• 74 Employed in a job unrelated to their field of study
• 20 Unemployed
• Use R to
  – Estimate the effects
  – Test equal probabilities (senseless)

Generic MLE with R

Estimate the probabilities and test

This seems like a lot of trouble just to estimate some probabilities and test if they are equal. But the payoff comes for tables of two or more dimensions.