STA 517 – Introduction: Distribution and Inference
1.5 STATISTICAL INFERENCE FOR MULTINOMIAL PARAMETERS
 Recall the multinomial distribution, multi$(n, \pi = (\pi_1, \pi_2, \ldots, \pi_c))$.
 Suppose that each of $n$ independent, identical trials can have outcome in any of $c$ categories. Let
$y_{ij} = 1$ if trial $i$ has outcome in category $j$, and $y_{ij} = 0$ otherwise.
Then $y_i = (y_{i1}, \ldots, y_{ic})$ represents a multinomial trial, with $\sum_j y_{ij} = 1$.
 Let $n_j = \sum_i y_{ij}$ denote the number of trials having outcome in category $j$.
 The counts $(n_1, n_2, \ldots, n_c)$ have the multinomial distribution.
Note: $n_1, \ldots, n_c$ are random variables.
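 As a quick illustration, a minimal SAS/IML sketch that simulates $n$ such trials and tallies the counts $n_j$ (the probabilities, sample size, and seed below are arbitrary choices for the example):
proc iml;
/* Simulate n independent trials with c = 3 categories and tally the counts n_j */
call randseed(12345);
pi = {0.5 0.3 0.2};                 /* illustrative pi_1, ..., pi_c */
n  = 1000;
y  = j(n, 1, .);
call randgen(y, "Table", pi);       /* y_i = category of trial i (1, ..., c) */
nj = j(1, ncol(pi), 0);
do k = 1 to ncol(pi);
   nj[k] = sum(y = k);              /* n_j = number of trials in category j */
end;
print nj (nj/n)[label="p_j"];
quit;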
Example: Mendel’s theory
 To test his theories of natural inheritance, Mendel crossed pea plants of a pure yellow strain with plants of a pure green strain.
 He predicted that second-generation hybrid seeds would be 75% yellow and 25% green, yellow being the dominant trait.
 One experiment produced $n = 8023$ seeds, with $n_1 = 6022$ yellow and $n_2 = 2001$ green observed.
 He wanted to test whether the outcome follows the hypothesized 3:1 ratio.
1.5.1 Estimation of Multinomial Parameters
 To obtain the MLE, note that the multinomial probability mass function is proportional to the kernel
$\prod_j \pi_j^{n_j}$   (1.14)
 The MLE are the $\{\pi_j\}$ that maximize (1.14).
 Log likelihood:
$L(\pi) = \sum_j n_j \log \pi_j = \sum_{j=1}^{c-1} n_j \log \pi_j + n_c \log\Big(1 - \sum_{j=1}^{c-1} \pi_j\Big)$
 Differentiating $L$ with respect to $\pi_j$ gives the likelihood equation
$\frac{\partial L(\pi)}{\partial \pi_j} = \frac{n_j}{\pi_j} - \frac{n_c}{\pi_c} = 0, \qquad j = 1, \ldots, c-1$
 The ML solution satisfies $\hat\pi_j / \hat\pi_c = n_j / n_c$.
MLE
 Now, summing $\hat\pi_j = \hat\pi_c\, n_j / n_c$ over all $j$ gives $1 = \hat\pi_c\, n / n_c$.
 Thus $\hat\pi_c = n_c / n$.
 MLE: $\hat\pi_j = n_j / n$ for $j = 1, \ldots, c$.
 The MLE are the sample proportions.
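 A minimal SAS/IML check of this result using Mendel’s counts from the earlier example:
proc iml;
/* MLEs of multinomial probabilities are the sample proportions p_j = n_j / n */
nj    = {6022 2001};        /* yellow, green */
pihat = nj / sum(nj);       /* elementwise division by n = 8023 */
print pihat;                /* 0.7506  0.2494 */
quit;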
1.5.2 Pearson Statistic for Testing a Specified Multinomial
 In 1900 the eminent British statistician Karl Pearson
introduced a hypothesis test that was one of the first
inferential methods.
 It had a revolutionary impact on categorical data
analysis, which had focused on describing associations.
 Pearson’s test evaluates whether multinomial
parameters equal certain specified values.
Pearson Statistic
 Consider $H_0: \pi_j = \pi_{j0}$, $j = 1, \ldots, c$, where the $\pi_{j0}$ are fixed values summing to 1.
 When $H_0$ is true, the expected values of $\{n_j\}$, called expected frequencies, are $\mu_j = n\pi_{j0}$.
 Pearson proposed the test statistic
$X^2 = \sum_j \frac{(n_j - \mu_j)^2}{\mu_j}$
 Greater differences $\{n_j - \mu_j\}$ produce greater $X^2$ values, for fixed $n$.
 Let $X^2_o$ denote the observed value of $X^2$. The P-value is the null probability $P(X^2 \ge X^2_o)$, which for large $n$ is approximated by the tail probability of a chi-squared distribution with df $= c - 1$.
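 A minimal SAS/IML sketch of this statistic (the module name PearsonX2 is just an illustrative choice; the counts are Mendel’s, used in the example that follows):
proc iml;
/* Pearson chi-squared statistic for H0: pi_j = pi_j0 (fixed probabilities) */
start PearsonX2(nj, pi0);
   mu = sum(nj) # pi0;                  /* expected frequencies mu_j = n * pi_j0 */
   return( sum( (nj - mu)##2 / mu ) );
finish;
nj  = {6022 2001};
pi0 = {0.75 0.25};
X2  = PearsonX2(nj, pi0);
p   = 1 - CDF('CHISQUARE', X2, ncol(nj) - 1);   /* P-value, df = c - 1 */
print X2 p;
quit;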
1.5.3 Example: Testing Mendel’s Theories
 $n_1 = 6022$ yellow, $n_2 = 2001$ green.
 MLE: $\hat\pi_1 = 6022/8023 = 0.7506$, $\hat\pi_2 = 2001/8023 = 0.2494$.
 Test whether the outcome follows the 3:1 ratio, i.e.
$H_0: \pi_1 = \pi_{10} = 0.75, \qquad \pi_2 = \pi_{20} = 0.25$
 Expected frequencies are $\mu_1 = 8023(0.75) = 6017.25$ and $\mu_2 = 8023(0.25) = 2005.75$, giving $X^2 = 0.015$ with df $= 1$ (P-value $\approx 0.90$).
 This does not contradict Mendel’s hypothesis.
SAS code
data D;
input outcome $ w;
cards;
yellow 6022
green 2001
;
proc freq; weight w;
/* TESTP values follow the table's level order (alphabetical here: green, yellow) */
table outcome/chisq TESTP=(0.25 0.75);
run;
Pearson statistic
 When $c = 2$, it can be shown that the Pearson chi-squared statistic equals the squared score statistic, $z_S^2 = (\hat p - \pi_0)^2 / [\pi_0(1-\pi_0)/n]$.
 PROOF: symbolically, using MATLAB's Symbolic Math Toolbox (originally built on a Maple engine):
syms y n pi0
f  = (y - n*pi0)^2/(n*pi0) + ((n-y) - n*(1-pi0))^2/(n*(1-pi0));  % Pearson X^2 with c = 2
f1 = simplify(f)
% result: (y - n*pi0)^2/(n*pi0*(1-pi0)), i.e., (y/n - pi0)^2/(pi0*(1-pi0)/n)
 How about $c > 2$?
1.5.5 Likelihood-Ratio Chi-Squared
 An alternative test for multinomial parameters uses the likelihood-ratio test.
 The kernel of the multinomial likelihood is $\prod_j \pi_j^{n_j}$.
 Under $H_0$ the likelihood is maximized when $\hat\pi_j = \pi_{j0}$.
 In the general case, it is maximized when $\hat\pi_j = n_j/n$.
 The ratio of the likelihoods equals
$\Lambda = \prod_j \pi_{j0}^{n_j} \Big/ \prod_j (n_j/n)^{n_j}$
 Thus, the likelihood-ratio statistic is
$G^2 = -2\log\Lambda = 2\sum_j n_j \log\big(n_j/(n\pi_{j0})\big)$
LR
 In the general case, the parameter space consists of $\{\pi_j\}$ subject to $\sum_j \pi_j = 1$, so the dimensionality is $c - 1$. Under $H_0$, the $\{\pi_j\}$ are specified completely, so the dimension is 0. The difference in these dimensions equals $c - 1$.
 For large $n$, $G^2$ has a chi-squared null distribution with df $= c - 1$.
 Both $X^2$ and $G^2$ have asymptotic chi-squared null distributions with df $= c - 1$.
 The two statistics are asymptotically equivalent; when $H_0$ holds, they take similar values for large $n$.
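 To see this numerically, a small SAS/IML sketch computes both statistics for Mendel’s counts (both come out near 0.015 here):
proc iml;
/* Compare Pearson X2 and likelihood-ratio G2 for the same data and H0 */
nj  = {6022 2001};
pi0 = {0.75 0.25};
n   = sum(nj);
mu  = n # pi0;                          /* expected frequencies */
X2  = sum( (nj - mu)##2 / mu );         /* Pearson statistic */
G2  = 2 * sum( nj # log(nj / mu) );     /* likelihood-ratio statistic */
df  = ncol(nj) - 1;
pX2 = 1 - CDF('CHISQUARE', X2, df);
pG2 = 1 - CDF('CHISQUARE', G2, df);
print X2 pX2 G2 pG2;
quit;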
Wu, Ma, George (2007)
1.5.6 Testing with Estimated Expected Frequencies
 Pearson’s chi-square was proposed for testing $H_0: \pi_j = \pi_{j0}$, where the $\pi_{j0}$ are fixed.
 In some applications, $\pi_{j0} = \pi_{j0}(\theta)$ are functions of a smaller set of unknown parameters $\theta$.
 ML estimates $\hat\theta$ of $\theta$ determine ML estimates $\{\pi_{j0}(\hat\theta)\}$ of $\{\pi_{j0}(\theta)\}$ and hence ML estimates $\hat\mu_j = n\pi_{j0}(\hat\theta)$ of the expected frequencies in $X^2$.
 Replacing the $\mu_j$ by estimates $\hat\mu_j$ affects the distribution of $X^2$.
 The true df $= (c - 1) - \dim(\theta)$.
Example
 A sample of 156 dairy calves born in Okeechobee County, Florida, was classified according to whether the calves caught pneumonia within 60 days of birth.
 Calves that got a pneumonia infection were also
classified according to whether they got a secondary
infection within 2 weeks after the first infection cleared
up.
 Hypothesis: the primary infection had an immunizing
effect that reduced the likelihood of a secondary
infection.
 How to test it?
Data structure
 Calves that did not get a primary infection could not get
a secondary infection, so no observations can fall in the
category for ‘‘no’’ primary infection and ‘‘yes’’
secondary infection.
 That combination is called a structural zero.
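 Schematically, with $n_{ab}$ denoting generic cell counts (the transcript does not reproduce the observed table), the data form a 2x2 layout with one empty cell:
                      Secondary infection
                      Yes        No
Primary    Yes        n11        n12
infection  No         --         n22      (structural zero)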
Test: whether the probability of primary infection was the same as the conditional probability of secondary infection, given that the calf got the primary infection.
 Let $\pi_{ab}$ denote the probability that a calf is classified in row $a$ and column $b$ of this table. The null hypothesis is
$H_0: \pi_{11} + \pi_{12} = \pi_{11}/(\pi_{11} + \pi_{12})$
 Let $\pi = \pi_{11} + \pi_{12}$ denote the probability of primary infection. Then the null-hypothesis cell probabilities are
$\pi_{11} = \pi^2, \qquad \pi_{12} = \pi(1 - \pi), \qquad \pi_{22} = 1 - \pi.$
MLE and chi-squared test
 Likelihood (kernel): $(\pi^2)^{n_{11}}\,[\pi(1-\pi)]^{n_{12}}\,(1-\pi)^{n_{22}}$
 Log likelihood: $L(\pi) = (2n_{11} + n_{12})\log\pi + (n_{12} + n_{22})\log(1-\pi)$
 Differentiation with respect to $\pi$: $\partial L/\partial\pi = (2n_{11} + n_{12})/\pi - (n_{12} + n_{22})/(1-\pi) = 0$
 Solution: $\hat\pi = (2n_{11} + n_{12})/(2n_{11} + 2n_{12} + n_{22})$
 For the example, $\hat\pi$ gives estimated expected counts $n\hat\pi^2$, $n\hat\pi(1-\hat\pi)$, and $n(1-\hat\pi)$ for the three cells, which are compared with the observed counts by $X^2$ with df $= (3 - 1) - \dim(\pi) = 1$.
Conclusion: the data are inconsistent with $H_0$; the primary infection had an immunizing effect that reduced the likelihood of a secondary infection.
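 A SAS/IML sketch of these steps; the cell counts below are placeholders summing to 156 (the transcript does not give the observed table), so the printed values are illustrative only:
proc iml;
/* Fit under H0: pi11 = pi^2, pi12 = pi(1-pi), pi22 = 1-pi */
n11 = 30;  n12 = 63;  n22 = 63;                       /* assumed counts, total 156 */
n     = n11 + n12 + n22;
pihat = (2*n11 + n12) / (2*n11 + 2*n12 + n22);        /* MLE of pi */
nj    = n11 || n12 || n22;                            /* observed counts */
mu    = n # (pihat##2 || pihat#(1-pihat) || 1-pihat); /* fitted expected counts */
X2    = sum( (nj - mu)##2 / mu );
p     = 1 - CDF('CHISQUARE', X2, 1);                  /* df = (3-1) - dim(pi) = 1 */
print pihat X2 p;
print nj mu[label="fitted"];
quit;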
Standard Error
 Since $\partial^2 L(\pi)/\partial\pi^2 = -(2n_{11} + n_{12})/\pi^2 - (n_{12} + n_{22})/(1-\pi)^2$,
 the information is the expected value of its negative, which is $n\pi(1+\pi)/\pi^2 + n(1-\pi)(1+\pi)/(1-\pi)^2$,
 which simplifies to $n(1+\pi)/[\pi(1-\pi)]$.
 The asymptotic standard error is the square root of the inverse information, or $SE(\hat\pi) = \sqrt{\hat\pi(1-\hat\pi)/[n(1+\hat\pi)]}$.
How about confidence limits?
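 One natural possibility is a large-sample Wald interval based on the standard error above,
$\hat\pi \pm z_{\alpha/2}\,\sqrt{\dfrac{\hat\pi(1-\hat\pi)}{n(1+\hat\pi)}}$
 Score and likelihood-ratio intervals can instead be formed by inverting the corresponding tests; the binomial SAS code that follows computes those test statistics for the simpler binomial case.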
SAS code - MLE, tests for binomial (PROC IML)
proc IML;
y=842; n=1824;pi0=0.5; /*data*/
pihat=y/n; SE=sqrt(pihat*(1-pihat)/n); /*MLE*/
WaldStat=(pihat-pi0)**2/SE**2;
pWald=1-CDF('CHISQUARE', WaldStat, 1);
LR=2*(y*log(pihat/(pi0)) +(n-y)*log((1-pihat)/(1-pi0)));
pLR=1-CDF('CHISQUARE',LR, 1);
ScoreStat=(pihat-pi0)**2/(pi0*(1-pi0)/n);
pScore=1-CDF('CHISQUARE',ScoreStat, 1);
print WaldStat pWald;
print LR pLR;
print ScoreStat pScore;
SAS code - test for binomial (PROC FREQ)
data D;
input outcome $ w;
cards;
Yes 842
No 982
;
proc freq;
weight w;
table outcome/all CL
BINOMIAL(P=0.5
LEVEL="Yes");
exact binomial;
run;
SAS code – multinomial
data D;
input outcome $ w;
cards;
yellow 6022
green 2001
;
proc freq; weight w;
table outcome/chisq TESTP=(0.25 0.75);
run;