Inferences for a Single Population Mean ($\mu$)

$100(1-\alpha)\%$ Confidence Interval for $\mu$ (e.g. $\alpha = .05$ gives 95% confidence)

The basic form of a confidence interval is as follows:

(estimate) ± (table value) × SE(estimate)

For a single population mean, a $100(1-\alpha)\%$ CI for $\mu$ is:

$$\bar{X} \pm t_{(1-\alpha/2),\,df}\; SE(\bar{X}) \quad \text{where} \quad SE(\bar{X}) = \frac{s}{\sqrt{n}}$$

and $t_{(1-\alpha/2),\,df}$ = t-distribution quantile with $df = n - 1$.

Confidence Level          $1 - \alpha/2$
95% ($\alpha = .05$)      .975
90% ($\alpha = .10$)      .950
99% ($\alpha = .01$)      .995

Hypothesis Testing for a Single Population Mean ($\mu$)

Null Hypothesis ($H_o$)    Alternative Hypothesis ($H_a$)    p-value area
$\mu = \mu_o$              $\mu > \mu_o$                     Upper-tail
$\mu = \mu_o$              $\mu < \mu_o$                     Lower-tail
$\mu = \mu_o$              $\mu \neq \mu_o$                  Two-tailed (perform test using CI for $\mu$)

Test Statistic (in general)

In general the basic form of a test statistic is given by:

$$t = \frac{(\text{estimate}) - (\text{hypothesized value})}{SE(\text{estimate})}$$

which measures the discrepancy between the estimate from our sample and the hypothesized value under the null hypothesis. Intuitively, if our sample-based estimate is “far away” from the hypothesized value, assuming the null hypothesis is true, we will reject the null hypothesis in favor of the alternative or research hypothesis. Extreme test statistic values occur when our estimate is a large number of standard errors away from the hypothesized value under the null.

The p-value is the probability that, by chance variation alone, we would get a test statistic as extreme or more extreme than the one observed, assuming the null hypothesis is true. If this probability is “small” then we have evidence against the null hypothesis; in other words, we have evidence to support our research hypothesis.

Test Statistic for Testing a Single Population Mean ($\mu$)

$$t = \frac{\bar{X} - \mu_o}{SE(\bar{X})} \quad \text{or} \quad t = \frac{\bar{X} - \mu_o}{s/\sqrt{n}} \;\sim\; \text{t-distribution with } df = n - 1.$$

Assumptions: When making inferences about a single population mean we assume the following:
1. The sample constitutes a random sample from the population of interest.
2. The population distribution is normal. This assumption can be relaxed when our sample size is sufficiently “large”.
How large the sample size needs to be depends on how “non-normal” the population distribution is.

Comparing Two Population Means using Independent Samples ($\mu_1$ vs. $\mu_2$)

Case 1 ~ Equal Population Variances/Standard Deviations ($\sigma_1 = \sigma_2$)

Assumptions: For this case we make the following assumptions:
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are equal.
3. The populations are both normally distributed. This assumption can be relaxed when the samples from both populations are “large”.

$100(1-\alpha)\%$ Confidence Interval for ($\mu_1 - \mu_2$)

$$(\bar{X}_1 - \bar{X}_2) \pm t_{(1-\alpha/2),\,df}\; SE(\bar{X}_1 - \bar{X}_2)$$

where

$$SE(\bar{X}_1 - \bar{X}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad \text{and} \quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$s_p^2$ is called the “pooled estimate of the common variance”. The degrees of freedom for the t-distribution are $df = n_1 + n_2 - 2$. The t-quantiles are the same as those for the single population case described above.

Hypothesis Testing ($\mu_1$ vs. $\mu_2$)

The general null hypothesis says that the two population means are equal, or equivalently that their difference is zero. The alternative or research hypothesis can be any one of the three usual choices (upper-tail, lower-tail, or two-tailed). For the two-tailed case we can perform the test by using the confidence interval for the difference in the population means discussed above.

$H_o: \mu_1 = \mu_2$ or equivalently $(\mu_1 - \mu_2) = 0$
$H_a: \mu_1 > \mu_2$ or equivalently $(\mu_1 - \mu_2) > 0$ (upper-tail), etc.

Test Statistic

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)} \;\sim\; \text{t-distribution with } df = n_1 + n_2 - 2$$

where $SE(\bar{X}_1 - \bar{X}_2)$ is as defined in the confidence interval section above.

Case 2 ~ Unequal Population Variances/Standard Deviations ($\sigma_1 \neq \sigma_2$)

Assumptions: For this case we make the following assumptions:
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal. (This can be formally tested, or a rule of thumb can be used.)
3. The populations are both normally distributed.
This assumption can be relaxed when the samples from both populations are “large”.

$100(1-\alpha)\%$ Confidence Interval for ($\mu_1 - \mu_2$)

$$(\bar{X}_1 - \bar{X}_2) \pm t_{(1-\alpha/2),\,df}\; SE(\bar{X}_1 - \bar{X}_2)$$

where

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

and

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \quad \text{rounded down to the nearest integer.}$$

The t-quantiles are the same as those we have seen previously.

Hypothesis Testing

Test Statistic

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)} \;\sim\; \text{t-distribution with the approximate } df \text{ given above}$$

where $SE(\bar{X}_1 - \bar{X}_2)$ and $df$ are as defined in the confidence interval section above. (The pooled value $df = n_1 + n_2 - 2$ applies only to the equal-variance case.)

Comparing Two Population Means Using Dependent Samples

When using dependent samples, each observation from population 1 has a one-to-one correspondence with an observation from population 2. One of the most common cases where this arises is when we measure the response on the same subjects before and after treatment. This is commonly called a “pre-test/post-test” situation. However, sometimes we have pairs of subjects in the two populations meaningfully matched on some prespecified criteria. For example, we might match individuals who are the same race, gender, socio-economic status, height, weight, etc. to control for the influence these characteristics might have on the response of interest. When this is done we say that we are “controlling for the effects of race, gender, etc.”. By using matched pairs of subjects we are in effect removing the effect of potential confounding factors, thus giving us a clearer picture of the difference between the two populations being studied.

DATA FORMAT

Matched Pair    $X_{1i}$    $X_{2i}$    $d_i = X_{1i} - X_{2i}$
1               $X_{11}$    $X_{21}$    $d_1$
2               $X_{12}$    $X_{22}$    $d_2$
3               $X_{13}$    $X_{23}$    $d_3$
...             ...         ...         ...
n               $X_{1n}$    $X_{2n}$    $d_n$

For the sample paired differences (the $d_i$'s) find the sample mean ($\bar{d}$) and standard deviation ($s_d$).

The hypotheses are
$H_o: \mu_d = 0$
$H_a: \mu_d > 0$ or $H_a: \mu_d < 0$ or $H_a: \mu_d \neq 0$

In the Captopril blood pressure example in class the paired differences were given by $d_i = syspre_i - syspost_i$.
Thus, positive values of the paired difference corresponded to a reduction in systolic blood pressure, measured ½ hour after the patient took Captopril. Given that we wished to determine if there is significant evidence of a decrease in blood pressure, we wish to test the following:

$H_o: \mu_{syspre - syspost} = 0$
$H_a: \mu_{syspre - syspost} > 0$

Here $\mu_d$ = true mean decrease in blood pressure ½ hour after a patient takes Captopril.

Test Statistic for a Paired t-test

$$t = \frac{\bar{d} - d_o}{s_d/\sqrt{n}} \;\sim\; \text{t-distribution with } df = n - 1$$

Note: $d_o$ = the hypothesized value for the mean paired difference (usually taken to be 0, as is the case with the Captopril study).

$100(1-\alpha)\%$ CI for $\mu_d$

$$\bar{d} \pm t_{(1-\alpha/2),\,df}\; \frac{s_d}{\sqrt{n}}$$

This interval has a $100(1-\alpha)\%$ chance of covering the true mean paired difference. See the JMP tutorial for the results of this test for the Captopril study.

Making Inferences About Proportions/Percentages/Binomial Prob. of “Success”

Inferences about a Single Population Proportion ($p$)

Confidence Interval for $p$

$100(1-\alpha)\%$ CI for $p$:

$$\hat{p} \pm z \cdot SE(\hat{p}) = \hat{p} \pm z\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Here $\hat{p}$ = sample proportion, which is the number of “successes” in our sample divided by the sample size, $\hat{q} = 1 - \hat{p}$, and $z$ is a standard normal table value that corresponds to our desired confidence level.

Confidence Level          $z$
95% ($\alpha = .05$)      1.96
90% ($\alpha = .10$)      1.645
99% ($\alpha = .01$)      2.576

Hypothesis Tests for $p$

$H_o: p = p_o$
$H_a: p > p_o$ or $p < p_o$ or $p \neq p_o$

Test Statistic

$$z = \frac{\hat{p} - p_o}{\sqrt{\dfrac{p_o q_o}{n}}} \;\sim\; \text{standard normal distribution } N(0,1), \quad \text{provided } n p_o \geq 5 \text{ and } n q_o \geq 5, \text{ where } q_o = 1 - p_o.$$

When our sample size is small we use the binomial distribution to calculate the p-value.

Example:
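The single-proportion formulas above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the counts (60 successes in 100 trials) and the null value $p_o = 0.5$ are hypothetical and are not data from these notes, and the function names are made up for this sketch. It uses only the Python standard library (`statistics.NormalDist` for normal quantiles and tail areas, `math.comb` for the exact binomial tail).

```python
from math import comb, sqrt
from statistics import NormalDist


def proportion_ci(successes, n, conf=0.95):
    """Large-sample CI for p: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Standard normal quantile for 1 - alpha/2 (about 1.96 for 95% confidence)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


def proportion_z_test(successes, n, p0, tail="upper"):
    """z statistic and p-value for H_o: p = p0 using the normal approximation."""
    q0 = 1 - p0
    # Rule from the notes: the approximation needs n*p0 >= 5 and n*q0 >= 5
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("sample too small for the normal approximation")
    z = (successes / n - p0) / sqrt(p0 * q0 / n)
    upper = 1 - NormalDist().cdf(z)  # area to the right of z
    if tail == "upper":
        p_value = upper
    elif tail == "lower":
        p_value = 1 - upper
    else:  # two-tailed
        p_value = 2 * min(upper, 1 - upper)
    return z, p_value


def binomial_upper_p_value(successes, n, p0):
    """Exact upper-tail p-value P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical data: 60 successes in 100 trials, H_o: p = 0.5 vs. H_a: p > 0.5
low, high = proportion_ci(60, 100)
z, p_value = proportion_z_test(60, 100, 0.5, tail="upper")
print(f"95% CI: ({low:.3f}, {high:.3f}), z = {z:.2f}, p-value = {p_value:.4f}")
```

For a small sample, where the conditions $n p_o \geq 5$ and $n q_o \geq 5$ fail, `binomial_upper_p_value` gives the exact upper-tail p-value from the binomial distribution in place of the normal approximation.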