Inferences for a Single Population Mean (μ)

100(1 − α)% Confidence Interval for μ (e.g. α = .05 gives 95% confidence)

The basic form of a confidence interval is as follows:

    (estimate) ± (table value) × SE(estimate)

For a single population mean, a 100(1 − α)% CI for μ is:

    X̄ ± t_(1−α/2),df · SE(X̄)   where SE(X̄) = s/√n

and t_(1−α/2),df = t-distribution quantile with df = n − 1.

    Confidence Level      1 − α/2
    95% (α = .05)         .975
    90% (α = .10)         .950
    99% (α = .01)         .995
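As a quick illustration of this interval, here is a minimal sketch in Python; the sample data, the SciPy calls, and the choice α = .05 are my own additions for illustration, not part of the notes.

```python
# Illustrative sketch: 95% CI for a single population mean.
# The data below are invented, not from the course.
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 4.7, 5.0])  # hypothetical sample
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)        # sample mean and sample std deviation
alpha = 0.05
t_quant = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t-quantile with df = n - 1
se = s / np.sqrt(n)                             # SE(xbar) = s / sqrt(n)
lo, hi = xbar - t_quant * se, xbar + t_quant * se
print(f"95% CI for mu: ({lo:.3f}, {hi:.3f})")
```

The same quantiles appear in the table above: for n = 8 and α = .05, `stats.t.ppf(.975, df=7)` is the tabled value t_(.975),7 ≈ 2.365.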
Hypothesis Testing for a Single Population Mean (μ)

    Null Hypothesis (H_o)   Alternative Hypothesis (H_a)   p-value area
    μ = μ_o                 μ > μ_o                        Upper-tail
    μ = μ_o                 μ < μ_o                        Lower-tail
    μ = μ_o                 μ ≠ μ_o                        Two-tailed (perform test
                                                           using CI for μ)
Test Statistic (in general)

In general the basic form of a test statistic is given by:

    t = (estimate − hypothesized value) / SE(estimate)

which measures the discrepancy between the estimate from our sample and the
hypothesized value under the null hypothesis.
Intuitively, if our sample-based estimate is “far away” from the hypothesized value
assuming the null hypothesis is true, we will reject the null hypothesis in favor of the
alternative or research hypothesis. Extreme test statistic values occur when our estimate
is a large number of standard errors away from the hypothesized value under the null.
The p-value is the probability that, by chance variation alone, we would get a test statistic
as extreme as or more extreme than the one observed, assuming the null hypothesis is true.
If this probability is “small,” then we have evidence against the null hypothesis; in other
words, we have evidence to support our research hypothesis.
Test Statistic for Testing a Single Population Mean (μ)

    t = (X̄ − μ_o) / SE(X̄) = (X̄ − μ_o) / (s/√n)   ~ t-distribution with df = n − 1.
Assumptions:
When making inferences about a single population mean we assume the following:
1. The sample constitutes a random sample from the population of interest.
2. The population distribution is normal. This assumption can be relaxed when
our sample size is sufficiently “large”. How large the sample size needs to be
depends upon how “non-normal” the population distribution is.
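To make the test statistic concrete, here is a hedged sketch of the one-sample t-test; the data and the hypothesized value μ_o = 5.0 are invented for illustration.

```python
# Illustrative sketch of the one-sample t-test (invented data, mu_o = 5.0).
import numpy as np
from scipy import stats

x = np.array([5.3, 6.1, 4.8, 5.9, 6.4, 5.1, 5.7, 6.0, 5.5, 5.8])
mu_o = 5.0
n = len(x)

# t = (xbar - mu_o) / (s / sqrt(n)), df = n - 1
t_stat = (x.mean() - mu_o) / (x.std(ddof=1) / np.sqrt(n))
p_upper = stats.t.sf(t_stat, df=n - 1)            # upper-tail p-value
p_two = 2 * stats.t.sf(abs(t_stat), df=n - 1)     # two-tailed p-value

# SciPy computes the same two-tailed test directly:
t_check, p_check = stats.ttest_1samp(x, popmean=mu_o)
```

The hand computation and `ttest_1samp` agree, which is a useful sanity check when first learning the formula.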
Comparing Two Population Means Using Independent Samples (μ₁ vs. μ₂)

Case 1 ~ Equal Population Variances/Standard Deviations (σ₁ = σ₂)
Assumptions:
For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are equal.
3. The populations are both normally distributed. This assumption can be relaxed
when the samples from both populations are “large”.
100(1 − α)% Confidence Interval for (μ₁ − μ₂)

    (X̄₁ − X̄₂) ± t_(1−α/2),df · SE(X̄₁ − X̄₂)

where

    SE(X̄₁ − X̄₂) = s_p · √(1/n₁ + 1/n₂)

and

    s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)

s_p² is called the “pooled estimate of the common variance”. The degrees of freedom for
the t-distribution is df = n₁ + n₂ − 2. The t-quantiles are the same as those for the single
population case described above.
Hypothesis Testing (μ₁ vs. μ₂)

The general null hypothesis says that the two population means are equal, or equivalently
that their difference is zero. The alternative or research hypothesis can be any one of the
three usual choices (upper-tail, lower-tail, or two-tailed). For the two-tailed case we can
perform the test by using a confidence interval for the difference in the population means
discussed above.

    H_o: μ₁ = μ₂   or equivalently   (μ₁ − μ₂) = 0
    H_a: μ₁ > μ₂   or equivalently   (μ₁ − μ₂) > 0   (upper-tail)
    etc....
Test Statistic

    t = [(X̄₁ − X̄₂) − 0] / SE(X̄₁ − X̄₂)   ~ t-distribution with df = n₁ + n₂ − 2

where SE(X̄₁ − X̄₂) is as defined in the confidence interval section above.
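A minimal sketch of the Case 1 (pooled) procedure in Python; the two samples are invented, and SciPy's `ttest_ind` with `equal_var=True` implements the same pooled test as a cross-check.

```python
# Illustrative sketch of the pooled two-sample t procedure (invented data).
import numpy as np
from scipy import stats

x1 = np.array([12.1, 13.4, 11.8, 12.9, 13.1, 12.5])
x2 = np.array([11.2, 12.0, 10.9, 11.5, 11.8, 12.2, 11.4])
n1, n2 = len(x1), len(x2)
s1, s2 = x1.std(ddof=1), x2.std(ddof=1)

# pooled estimate of the common variance
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = np.sqrt(sp2) * np.sqrt(1 / n1 + 1 / n2)   # SE(xbar1 - xbar2)
df = n1 + n2 - 2

t_stat = (x1.mean() - x2.mean()) / se
p_two = 2 * stats.t.sf(abs(t_stat), df)

# 95% CI for mu1 - mu2
t_q = stats.t.ppf(0.975, df)
diff = x1.mean() - x2.mean()
ci = (diff - t_q * se, diff + t_q * se)

# matches SciPy's equal-variance two-sample test
t_check, p_check = stats.ttest_ind(x1, x2, equal_var=True)
```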
Case 2 ~ Unequal Population Variances/Standard Deviations (σ₁ ≠ σ₂)

Assumptions:
For this case we make the following assumptions:
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal.
   (This can be formally tested, or use a rule of thumb.)
3. The populations are both normally distributed. This assumption can be relaxed
   when the samples from both populations are “large”.
100(1 − α)% Confidence Interval for (μ₁ − μ₂)

    (X̄₁ − X̄₂) ± t_(1−α/2),df · SE(X̄₁ − X̄₂)

where

    SE(X̄₁ − X̄₂) = √(s₁²/n₁ + s₂²/n₂)

and

    df = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]

rounded down to the nearest integer. The t-quantiles are the same as those we have
seen previously.
Hypothesis Testing

Test Statistic

    t = [(X̄₁ − X̄₂) − 0] / SE(X̄₁ − X̄₂)   ~ t-distribution with the df given above

where SE(X̄₁ − X̄₂) is as defined in the confidence interval section above. Note that
in this unequal-variance case the degrees of freedom come from the approximation
formula above, not from n₁ + n₂ − 2.
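A sketch of the Case 2 (unequal-variance) procedure with invented data, including the degrees-of-freedom formula rounded down to the nearest integer; SciPy's `ttest_ind` with `equal_var=False` uses the same SE (with an unrounded df) and serves as a cross-check on the t statistic.

```python
# Illustrative sketch of the unequal-variance two-sample t procedure
# (invented data).
import math
import numpy as np
from scipy import stats

x1 = np.array([20.5, 22.1, 19.8, 23.4, 21.0, 22.7, 20.2])
x2 = np.array([18.0, 25.3, 15.9, 24.1, 17.5, 26.2, 19.8, 23.0])
n1, n2 = len(x1), len(x2)
v1, v2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2   # s_i^2 / n_i

se = np.sqrt(v1 + v2)                               # SE(xbar1 - xbar2)
# df formula from the notes, rounded down to the nearest integer
df = math.floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))

t_stat = (x1.mean() - x2.mean()) / se
p_two = 2 * stats.t.sf(abs(t_stat), df)

# SciPy's unequal-variance test: same t statistic, unrounded df
t_check, p_check = stats.ttest_ind(x1, x2, equal_var=False)
```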
Comparing Two Population Means Using Dependent
Samples
When using dependent samples each observation from population 1 has a one-to-one
correspondence with an observation from population 2. One of the most common cases
where this arises is when we measure the response on the same subjects before and after
treatment. This is commonly called a “pre-test/post-test” situation. However, sometimes
we have pairs of subjects in the two populations meaningfully matched on some prespecified criteria. For example, we might match individuals who are the same race,
gender, socio-economic status, height, weight, etc... to control for the influence these
characteristics might have on the response of interest. When this is done we say that we
are “controlling for the effects of race, gender, etc...”. By using matched-pairs of subjects
we are in effect removing the effect of potential confounding factors, thus giving us a
clearer picture of the difference between the two populations being studied.
DATA FORMAT

    Matched Pair    X_1i    X_2i    d_i = X_1i − X_2i
    1               X_11    X_21    d_1
    2               X_12    X_22    d_2
    3               X_13    X_23    d_3
    ...             ...     ...     ...
    n               X_1n    X_2n    d_n

For the sample paired differences (the d_i's), find the sample mean (d̄)
and standard deviation (s_d).
The hypotheses are

    H_o: μ_d = 0
    H_a: μ_d > 0   or   H_a: μ_d < 0   or   H_a: μ_d ≠ 0
In the Captopril blood pressure example in class the paired differences were given
by d_i = syspre_i − syspost_i. Thus, positive values for the paired difference corresponded
to a reduction in systolic blood pressure after taking Captopril and measuring the
patient's blood pressure ½ hour later. Given that we wished to determine if there was
significant evidence of a decrease in blood pressure, we wish to test the following:

    H_o: μ_(syspre − syspost) = 0
    H_a: μ_(syspre − syspost) > 0

Here μ_d = the true mean decrease in blood pressure ½ hour after a patient takes
Captopril.
Test Statistic for a Paired t-test

    t = (d̄ − d_o) / (s_d/√n)   ~ t-distribution with df = n − 1

Note: d_o = the hypothesized value for the mean paired difference (usually taken to be 0,
as is the case with the Captopril study).
100(1 − α)% CI for μ_d

    d̄ ± t_(1−α/2),df · s_d/√n
This interval has a 100(1-  )% chance of covering the true mean paired difference.
See the JMP tutorial for the results of this test for the Captopril study.
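A hedged sketch of the paired t-test in Python. These are NOT the Captopril data from class; the before/after values below are invented purely to show the mechanics.

```python
# Illustrative sketch of a paired t-test (invented before/after data,
# not the Captopril measurements from class).
import numpy as np
from scipy import stats

syspre  = np.array([175, 179, 165, 170, 182, 177, 168, 172])  # before
syspost = np.array([168, 170, 163, 161, 175, 178, 160, 167])  # after
d = syspre - syspost                  # paired differences d_i
n = len(d)

# t = dbar / (s_d / sqrt(n)), df = n - 1, with d_o = 0
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
p_upper = stats.t.sf(t_stat, df=n - 1)   # H_a: mu_d > 0 (a decrease)

# SciPy's paired test (two-tailed by default) gives the same t statistic
t_check, p_check = stats.ttest_rel(syspre, syspost)
```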
Making Inferences About
Proportions/Percentages/Binomial Prob. of “Success”
Inferences about a Single Population Proportion (p)
Confidence Interval for p

100(1 − α)% CI for p:

    p̂ ± z · SE(p̂) = p̂ ± z · √(p̂q̂/n)

Here p̂ = sample proportion, which is the number of “successes” in our sample divided by
the sample size, q̂ = 1 − p̂, and z equals the standard normal table value that corresponds
to our desired confidence level.
    Confidence Level      z
    95% (α = .05)         1.96
    90% (α = .10)         1.645
    99% (α = .01)         2.576
Hypothesis Tests for p

    H_o: p = p_o
    H_a: p > p_o   or   p < p_o   or   p ≠ p_o

Test Statistic

    z = (p̂ − p_o) / √(p_o q_o / n)   ~ standard normal distribution N(0,1),
    provided np_o ≥ 5 and nq_o ≥ 5, where q_o = 1 − p_o.
When our sample size is small we use the binomial distribution to calculate the p-value.
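The large-sample z procedure above can be sketched as follows; the counts (42 successes out of n = 100) and the null value p_o = 0.5 are made up for illustration.

```python
# Illustrative sketch of the large-sample z-test and CI for a single
# proportion (invented counts: 42 successes out of n = 100, p_o = 0.5).
import math
from scipy import stats

successes, n, p_o = 42, 100, 0.5
p_hat = successes / n
q_o = 1 - p_o
assert n * p_o >= 5 and n * q_o >= 5     # large-sample condition from the notes

# z = (p_hat - p_o) / sqrt(p_o * q_o / n)
z = (p_hat - p_o) / math.sqrt(p_o * q_o / n)
p_two = 2 * stats.norm.sf(abs(z))        # two-tailed p-value

# 95% CI for p: p_hat +/- z* sqrt(p_hat * q_hat / n)
q_hat = 1 - p_hat
z_star = stats.norm.ppf(0.975)           # about 1.96, as in the table above
half = z_star * math.sqrt(p_hat * q_hat / n)
ci = (p_hat - half, p_hat + half)
```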
Example: