STA 517 – Introduction: Distribution and Inference
1.5 STATISTICAL INFERENCE FOR
MULTINOMIAL PARAMETERS
Recall $\text{multi}(n, \boldsymbol{\pi} = (\pi_1, \pi_2, \ldots, \pi_c))$.
Suppose that each of $n$ independent, identical trials can have outcome in any of $c$ categories. Let
$$y_{ij} = 1 \text{ if trial } i \text{ has outcome in category } j, \qquad y_{ij} = 0 \text{ otherwise.}$$
Then $\mathbf{y}_i = (y_{i1}, \ldots, y_{ic})$ represents a multinomial trial, with $\sum_j y_{ij} = 1$.
Let $n_j = \sum_i y_{ij}$ denote the number of trials having outcome in category $j$. The counts $(n_1, n_2, \ldots, n_c)$ have the multinomial distribution.
Note: the $\{n_j\}$ are random variables.
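The setup above can be sketched numerically. This is an illustrative Python simulation (not from the slides): each trial produces an indicator outcome $y_{ij}$, and the counts $n_j$ accumulate these indicators.

```python
import random

# A minimal sketch: simulate n independent trials, each landing in one of
# c = 3 categories with probabilities pi, and form the counts n_j by
# accumulating the indicator outcomes y_ij.
random.seed(1)
n = 1000
pi = [0.2, 0.3, 0.5]

counts = [0, 0, 0]
for _ in range(n):
    j = random.choices(range(3), weights=pi)[0]  # one multinomial trial
    counts[j] += 1                               # n_j accumulates the y_ij

print(counts, sum(counts))
```

The counts vary from run to run (they are random variables), but they always sum to $n$.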
Example: Mendel’s theory
To test his theories of natural inheritance, Mendel crossed pea plants of a pure yellow strain with plants of a pure green strain.
He predicted that second-generation hybrid seeds would be 75% yellow and 25% green, yellow being the dominant trait.
One experiment produced $n = 8023$ seeds, with $n_1 = 6022$ yellow and $n_2 = 2001$ green observed.
He wanted to test whether the data follow the 3:1 ratio.
1.5.1 Estimation of Multinomial Parameters
To obtain the MLE, note that the multinomial probability mass function is proportional to the kernel
$$\prod_j \pi_j^{n_j}, \qquad \text{where all } \pi_j \ge 0 \text{ and } \textstyle\sum_j \pi_j = 1. \tag{1.14}$$
The MLE are the $\{\hat{\pi}_j\}$ that maximize (1.14).
Log likelihood:
$$L(\boldsymbol{\pi}) = \sum_j n_j \log \pi_j = \sum_{j=1}^{c-1} n_j \log \pi_j + n_c \log\Bigl(1 - \sum_{j=1}^{c-1} \pi_j\Bigr).$$
Differentiating $L$ with respect to $\pi_j$ gives the likelihood equation
$$\frac{\partial L}{\partial \pi_j} = \frac{n_j}{\pi_j} - \frac{n_c}{\pi_c} = 0, \qquad j = 1, \ldots, c-1.$$
The ML solution satisfies $\hat{\pi}_j / \hat{\pi}_c = n_j / n_c$.
MLE
Now summing $\hat{\pi}_j = \hat{\pi}_c\, n_j / n_c$ over all $j$ gives $1 = \hat{\pi}_c\, n / n_c$, so $\hat{\pi}_c = n_c / n$.
Thus
$$\hat{\pi}_j = \frac{n_j}{n}, \qquad j = 1, \ldots, c.$$
The MLE are the sample proportions.
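As a numerical check (an illustration, not from the slides), the sample proportions for Mendel's counts do maximize the multinomial log likelihood: perturbing them in any direction lowers it.

```python
import math

# Sketch: the multinomial MLE is the vector of sample proportions.
# Verify numerically with Mendel's counts that the log likelihood at
# pi_hat beats nearby probability vectors.
n_counts = [6022, 2001]
n = sum(n_counts)
pi_hat = [nj / n for nj in n_counts]  # sample proportions

def loglik(pi):
    return sum(nj * math.log(pj) for nj, pj in zip(n_counts, pi))

best = loglik(pi_hat)
for eps in (0.01, 0.05, 0.1):
    alt = [pi_hat[0] + eps, pi_hat[1] - eps]
    assert loglik(alt) < best  # any perturbation lowers the log likelihood

print([round(p, 4) for p in pi_hat])
```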
1.5.2 Pearson Statistic for Testing a Specified Multinomial
In 1900 the eminent British statistician Karl Pearson
introduced a hypothesis test that was one of the first
inferential methods.
It had a revolutionary impact on categorical data
analysis, which had focused on describing associations.
Pearson’s test evaluates whether multinomial
parameters equal certain specified values.
Pearson Statistic
Consider $H_0: \pi_j = \pi_{j0},\ j = 1, \ldots, c$, where the $\pi_{j0}$ are specified values.
When $H_0$ is true, the expected values of $\{n_j\}$, called expected frequencies, are $\mu_j = n\pi_{j0}$.
Pearson proposed the test statistic
$$X^2 = \sum_j \frac{(n_j - \mu_j)^2}{\mu_j}.$$
Greater differences $\{|n_j - \mu_j|\}$ produce greater $X^2$ values, for fixed $n$.
Let $X_o^2$ denote the observed value of $X^2$. The P-value is $P(X^2 \ge X_o^2)$, where $X^2$ has an asymptotic chi-squared null distribution with df $= c-1$.
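The definition above can be sketched as a short Python function (an illustration, not from the slides), which also shows that larger deviations from the expected frequencies give a larger $X^2$ for fixed $n$.

```python
# Hedged sketch of the Pearson statistic X^2 = sum_j (n_j - mu_j)^2 / mu_j,
# with expected frequencies mu_j = n * pi_j0 under H0.
def pearson_x2(counts, pi0):
    n = sum(counts)
    mu = [n * p for p in pi0]  # expected frequencies
    return sum((nj - mj) ** 2 / mj for nj, mj in zip(counts, mu))

# Larger deviations from the expected frequencies give larger X^2, fixed n:
close = pearson_x2([250, 250, 500], [0.25, 0.25, 0.5])
far = pearson_x2([300, 200, 500], [0.25, 0.25, 0.5])
print(close, far)
```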
1.5.3 Example: Testing Mendel’s Theories
$n_1 = 6022$ yellow, $n_2 = 2001$ green.
MLE: $\hat{\pi}_1 = 6022/8023 = 0.7506$, $\hat{\pi}_2 = 2001/8023 = 0.2494$.
Test whether the data follow the 3:1 ratio, i.e.
$$H_0: \pi_1 = \pi_{10} = 0.75, \qquad \pi_2 = \pi_{20} = 0.25.$$
The expected frequencies are $\mu_1 = 8023 \times 0.75 = 6017.25$ and $\mu_2 = 8023 \times 0.25 = 2005.75$, so
$$X^2 = \frac{(6022 - 6017.25)^2}{6017.25} + \frac{(2001 - 2005.75)^2}{2005.75} = 0.015,$$
with df $= 1$ and P-value $\approx 0.90$.
This does not contradict Mendel’s hypothesis.
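The arithmetic can be checked in a few lines of Python (a sketch, not from the slides); for df $= 1$ the chi-squared tail probability reduces to a normal tail, $P(X^2 \ge x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

# Numeric check of the Mendel calculation: X^2 for n1 = 6022, n2 = 2001
# against H0: (0.75, 0.25), with the df = 1 chi-squared tail computed as
# P(X^2 >= x) = erfc(sqrt(x/2)).
counts = [6022, 2001]
pi0 = [0.75, 0.25]
n = sum(counts)
mu = [n * p for p in pi0]  # 6017.25 and 2005.75
x2 = sum((nj - mj) ** 2 / mj for nj, mj in zip(counts, mu))
p_value = math.erfc(math.sqrt(x2 / 2))
print(round(x2, 3), round(p_value, 2))
```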
SAS code
data D;
input outcome $ w;
cards;
yellow 6022
green 2001
;
proc freq; weight w;
/* TESTP values follow the levels' alphabetical order: green, yellow */
table outcome/chisq TESTP=(0.25 0.75);
run;
Pearson statistic
When $c = 2$, it can be shown that the Pearson chi-squared statistic is the squared score statistic:
$$X^2 = \frac{(y - n\pi_0)^2}{n\pi_0} + \frac{[(n-y) - n(1-\pi_0)]^2}{n(1-\pi_0)} = \frac{(y - n\pi_0)^2}{n\pi_0(1-\pi_0)} = z_S^2.$$
PROOF: by the Maple symbolic engine in MATLAB
syms y n pi0
f=(y-n*pi0)^2/(n*pi0)+((n-y)-n*(1-pi0))^2/(n*(1-pi0));
f1=simple(f)
%result: (y-n*pi0)^2/(n*pi0*(1-pi0))
How about c > 2?
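The same identity can be verified numerically rather than symbolically (a Python sketch, not from the slides), here using Mendel's counts.

```python
import math

# Sketch: for c = 2, the Pearson statistic equals the squared score
# statistic (y - n*pi0)^2 / (n*pi0*(1 - pi0)).
y, n, pi0 = 6022, 8023, 0.75
x2 = ((y - n * pi0) ** 2 / (n * pi0)
      + ((n - y) - n * (1 - pi0)) ** 2 / (n * (1 - pi0)))
score_sq = (y - n * pi0) ** 2 / (n * pi0 * (1 - pi0))
assert math.isclose(x2, score_sq)  # the two statistics agree
print(round(x2, 3))
```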
1.5.5 Likelihood-Ratio Chi-Squared
An alternative test for multinomial parameters uses the likelihood-ratio test.
The kernel of the multinomial likelihood is $\prod_j \pi_j^{n_j}$.
Under $H_0$ the likelihood is maximized when $\hat{\pi}_j = \pi_{j0}$. In the general case, it is maximized when $\hat{\pi}_j = n_j/n$.
The ratio of the likelihoods equals
$$\Lambda = \frac{\prod_j \pi_{j0}^{n_j}}{\prod_j (n_j/n)^{n_j}}.$$
Thus, the likelihood-ratio statistic, $G^2 = -2\log\Lambda$, is
$$G^2 = 2\sum_j n_j \log\frac{n_j}{n\pi_{j0}}.$$
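For Mendel's data, $G^2$ can be computed directly and compared with $X^2$ (a Python sketch, not from the slides); the two statistics are nearly identical here.

```python
import math

# Sketch: G^2 = 2 * sum_j n_j * log(n_j / (n * pi_j0)) for Mendel's data,
# alongside the Pearson X^2 for the same hypothesis.
counts = [6022, 2001]
pi0 = [0.75, 0.25]
n = sum(counts)
g2 = 2 * sum(nj * math.log(nj / (n * pj)) for nj, pj in zip(counts, pi0))
x2 = sum((nj - n * pj) ** 2 / (n * pj) for nj, pj in zip(counts, pi0))
print(round(g2, 3), round(x2, 3))
```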
LR
In the general case, the parameter space consists of $\{\pi_j\}$ subject to $\sum_j \pi_j = 1$, so the dimensionality is $c-1$. Under $H_0$, the $\{\pi_j\}$ are specified completely, so the dimension is 0. The difference in these dimensions equals $c-1$.
For large $n$, $G^2$ has a chi-squared null distribution with df $= c-1$.
Both $X^2$ and $G^2$ have asymptotic chi-squared distributions with df $= c-1$, and the two statistics are asymptotically equivalent.
Wu, Ma, George (2007)
1.5.6 Testing with Estimated Expected Frequencies
Pearson’s chi-squared statistic was proposed for testing $H_0: \pi_j = \pi_{j0}$, where the $\pi_{j0}$ are fixed.
In some applications, $\pi_{j0} = \pi_{j0}(\theta)$ are functions of a smaller set of unknown parameters $\theta$.
ML estimates $\hat{\theta}$ of $\theta$ determine ML estimates $\{\pi_{j0}(\hat{\theta})\}$ and hence ML estimates $\{\hat{\mu}_j = n\pi_{j0}(\hat{\theta})\}$ of the expected frequencies in $X^2$.
Replacing $\{\mu_j\}$ by the estimates $\{\hat{\mu}_j\}$ affects the distribution of $X^2$: the true df $= (c-1) - \dim(\theta)$.
Example
A sample of 156 dairy calves born in Okeechobee County, Florida, was classified according to whether they caught pneumonia within 60 days of birth.
Calves that got a pneumonia infection were also
classified according to whether they got a secondary
infection within 2 weeks after the first infection cleared
up.
Hypothesis: the primary infection had an immunizing
effect that reduced the likelihood of a secondary
infection.
How to test it?
Data structure
Calves that did not get a primary infection could not get
a secondary infection, so no observations can fall in the
category for ‘‘no’’ primary infection and ‘‘yes’’
secondary infection.
That combination is called a structural zero.
Test: whether the probability of primary infection was the same as the conditional probability of secondary infection, given that the calf got the primary infection.
Let $\pi_{ab}$ denote the probability that a calf is classified in row $a$ (primary infection) and column $b$ (secondary infection) of this table. The null hypothesis is
$$H_0: \pi_{11} + \pi_{12} = \frac{\pi_{11}}{\pi_{11} + \pi_{12}}.$$
Let $\pi = \pi_{11} + \pi_{12}$ denote the probability of primary infection. Then under the hypothesis the cell probabilities are
$$\pi_{11} = \pi^2, \qquad \pi_{12} = \pi(1-\pi), \qquad \pi_{22} = 1 - \pi.$$
MLE and chi-squared test
The likelihood is proportional to
$$(\pi^2)^{n_{11}}\,[\pi(1-\pi)]^{n_{12}}\,(1-\pi)^{n_{22}}.$$
The log likelihood is
$$L(\pi) = (2n_{11} + n_{12})\log\pi + (n_{12} + n_{22})\log(1-\pi).$$
Differentiating with respect to $\pi$ and setting the derivative to zero,
$$\frac{2n_{11} + n_{12}}{\pi} - \frac{n_{12} + n_{22}}{1-\pi} = 0,$$
gives the solution
$$\hat{\pi} = \frac{2n_{11} + n_{12}}{2n_{11} + 2n_{12} + n_{22}}.$$
For the example, substituting the observed counts gives $\hat{\pi}$, hence estimated expected counts $\hat{\mu}_{11} = n\hat{\pi}^2$, $\hat{\mu}_{12} = n\hat{\pi}(1-\hat{\pi})$, $\hat{\mu}_{22} = n(1-\hat{\pi})$ for each cell, and $X^2$ is referred to a chi-squared distribution with df $= (c-1) - \dim(\theta) = 2 - 1 = 1$.
Conclusion: the primary infection had an immunizing effect that reduced the likelihood of a secondary infection.
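The closed-form solution can be sketched in Python. The cell counts below are hypothetical placeholders (the slides do not list them here); only the formulas are from the derivation above.

```python
# Sketch with HYPOTHETICAL cell counts (not from the slides):
# MLE pi_hat = (2*n11 + n12) / (2*n11 + 2*n12 + n22), and estimated
# expected counts under H0: pi11 = pi^2, pi12 = pi*(1-pi), pi22 = 1-pi.
n11, n12, n22 = 30, 60, 66  # hypothetical counts summing to n = 156
n = n11 + n12 + n22
pi_hat = (2 * n11 + n12) / (2 * n11 + 2 * n12 + n22)
mu11 = n * pi_hat ** 2              # expected count, both infections
mu12 = n * pi_hat * (1 - pi_hat)    # expected count, primary only
mu22 = n * (1 - pi_hat)             # expected count, no primary infection
x2 = ((n11 - mu11) ** 2 / mu11 + (n12 - mu12) ** 2 / mu12
      + (n22 - mu22) ** 2 / mu22)
print(round(pi_hat, 3), round(x2, 2))
```

Note that the estimated expected counts always sum to $n$, since $\pi^2 + \pi(1-\pi) + (1-\pi) = 1$.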
Standard Error
Since
$$\frac{\partial^2 L}{\partial \pi^2} = -\frac{2n_{11} + n_{12}}{\pi^2} - \frac{n_{12} + n_{22}}{(1-\pi)^2},$$
the information is its negative expected value, which is
$$n\,\frac{2\pi^2 + \pi(1-\pi)}{\pi^2} + n\,\frac{\pi(1-\pi) + (1-\pi)}{(1-\pi)^2},$$
which simplifies to $n(1+\pi)/[\pi(1-\pi)]$.
The asymptotic standard error is the square root of the inverse information, or
$$\mathrm{SE} = \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n(1+\hat{\pi})}}.$$
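A quick numeric check of the simplification (again with hypothetical counts, not from the slides): the SE formula agrees with the square root of the inverse information.

```python
import math

# Sketch (hypothetical counts): SE = sqrt(pi_hat*(1-pi_hat) / (n*(1+pi_hat)))
# should equal the square root of the inverse expected information
# n*(1+pi) / (pi*(1-pi)).
n11, n12, n22 = 30, 60, 66  # hypothetical cell counts
n = n11 + n12 + n22
pi_hat = (2 * n11 + n12) / (2 * n11 + 2 * n12 + n22)
info = n * (1 + pi_hat) / (pi_hat * (1 - pi_hat))   # expected information
se = math.sqrt(pi_hat * (1 - pi_hat) / (n * (1 + pi_hat)))
assert math.isclose(se, math.sqrt(1 / info))        # SE = sqrt(1/information)
print(round(se, 4))
```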
How about confidence limits?
SAS code - MLE, test for binomial
proc IML;
y=842; n=1824;pi0=0.5; /*data*/
pihat=y/n; SE=sqrt(pihat*(1-pihat)/n); /*MLE*/
WaldStat=(pihat-pi0)**2/SE**2;
pWald=1-CDF('CHISQUARE', WaldStat, 1);
LR=2*(y*log(pihat/(pi0)) +(n-y)*log((1-pihat)/(1-pi0)));
pLR=1-CDF('CHISQUARE',LR, 1);
ScoreStat=(pihat-pi0)**2/(pi0*(1-pi0)/n);
pScore=1-CDF('CHISQUARE',ScoreStat, 1);
print WaldStat pWald;
print LR pLR;
print ScoreStat pScore;
quit;
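The IML computations can be mirrored in Python as a cross-check (a sketch, not from the slides), with the df $= 1$ chi-squared tails computed via $\operatorname{erfc}$.

```python
import math

# Sketch mirroring the IML code: Wald, likelihood-ratio, and score
# statistics for y = 842 successes in n = 1824 trials, H0: pi = 0.5,
# with df = 1 tails via P(X >= x) = erfc(sqrt(x/2)).
y, n, pi0 = 842, 1824, 0.5
pihat = y / n
se = math.sqrt(pihat * (1 - pihat) / n)             # SE at the MLE (Wald)
wald = (pihat - pi0) ** 2 / se ** 2
lr = 2 * (y * math.log(pihat / pi0)
          + (n - y) * math.log((1 - pihat) / (1 - pi0)))
score = (pihat - pi0) ** 2 / (pi0 * (1 - pi0) / n)  # SE evaluated at pi0
p_values = [math.erfc(math.sqrt(s / 2)) for s in (wald, lr, score)]
print([round(s, 2) for s in (wald, lr, score)],
      [round(p, 4) for p in p_values])
```

The three statistics differ only in how the variance is estimated (at $\hat{\pi}$ for Wald, at $\pi_0$ for the score), so their values are close but not identical.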
SAS code - MLE, test for binomial
data D;
input outcome $ w;
cards;
Yes 842
No 982
;
proc freq;
weight w;
table outcome/all CL BINOMIAL(P=0.5 LEVEL="Yes");
exact binomial;
run;
SAS code – multinomial
data D;
input outcome $ w;
cards;
yellow
green
6022
2001
;
proc freq; weight w;
table outcome/chisq TESTP=(0.25 0.75);
run;
23