Chapter 9: Inferences Based on Two Samples
Dr. Sharabati
Purdue University
April 10, 2014

Dr. Sharabati (Purdue University) Inferences Spring 2014

Tests for Two Different Population Distributions
The z Tests
Two-independent Sample t Test
Paired-Sample t Test

Motivation
We are often interested in comparing two populations (or groups) based on a continuous measurement.
For instance, to evaluate the impact of light on the growth of plants, one group of seedlings grows in dark conditions and a second group gets the standard amount of light. The heights of the plants are compared after a specified time period.
Each group has different individuals who may receive different treatments.
Responses from each sample are independent of each other.

Goal: compare the population means of the two groups.

Notation:

Quantity                         Group 1        Group 2
Population mean                  $\mu_1$        $\mu_2$
Population standard deviation    $\sigma_1$     $\sigma_2$
Sample size                      $n_1$          $n_2$
Sample mean                      $\bar{x}_1$    $\bar{x}_2$
Sample standard deviation        $s_1$          $s_2$

Assumptions:
Sample 1 is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$.
Sample 2 is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$.
Sample 1 and sample 2 are independent of one another.

z Test for Normal Populations with Known Variance
The z test concerns hypotheses about $\mu_1 - \mu_2$, the difference between two population means. Both population distributions are normal and the values of $\sigma_1^2$ and $\sigma_2^2$ are known.

Hypotheses:
$$H_0: \mu_1 - \mu_2 = \Delta_0$$
against one of the alternatives
$$H_a: \mu_1 - \mu_2 > \Delta_0, \qquad H_a: \mu_1 - \mu_2 < \Delta_0, \qquad \text{or} \qquad H_a: \mu_1 - \mu_2 \neq \Delta_0$$

The test statistic is:
$$z = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Large-Sample Tests
When both $n_1 > 40$ and $n_2 > 40$, the z test can be applied to hypotheses about $\mu_1 - \mu_2$, the difference between two population means.
Hypotheses:
$$H_0: \mu_1 - \mu_2 = \Delta_0$$
against one of the alternatives
$$H_a: \mu_1 - \mu_2 > \Delta_0, \qquad H_a: \mu_1 - \mu_2 < \Delta_0, \qquad \text{or} \qquad H_a: \mu_1 - \mu_2 \neq \Delta_0$$

The test statistic is:
$$z = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Confidence Interval/Bound When Sample Size is Large
Provided that $n_1$ and $n_2$ are both large, a confidence interval for $\mu_1 - \mu_2$ with a confidence level of approximately $100(1-\alpha)\%$ is:
$$\bar{x}_1 - \bar{x}_2 \pm z^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$
where $z^* = z_{\alpha/2}$; the $-$ sign gives the lower limit and the $+$ sign gives the upper limit of the interval.
An upper or lower confidence bound can also be calculated by retaining the appropriate sign ($+$ or $-$) and replacing $z_{\alpha/2}$ with $z_\alpha$.

Two-independent Sample t Test
The two-sample t test concerns hypotheses about $\mu_1 - \mu_2$, the difference between two population means. Both population distributions are normal and the values of $\sigma_1^2$ and $\sigma_2^2$ are unknown.

Hypotheses:
$$H_0: \mu_1 - \mu_2 = \Delta_0$$
against one of the alternatives
$$H_a: \mu_1 - \mu_2 > \Delta_0, \qquad H_a: \mu_1 - \mu_2 < \Delta_0, \qquad \text{or} \qquad H_a: \mu_1 - \mu_2 \neq \Delta_0$$

The (unpooled) two-sample t test statistic is:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Assume that both samples were randomly selected from the populations and both populations are normally distributed.
The test statistic has a t distribution with $k$ degrees of freedom if $H_0$ is true, where
$$k = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Round $k$ down to the nearest integer.

The (pooled) two-sample t test statistic is:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{s_{\text{pooled}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where the pooled standard deviation is
$$s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
Assume that both samples were randomly selected from the populations and both populations are normally distributed with a common population standard deviation.
The pooled test statistic has a t distribution with $k = n_1 + n_2 - 2$ degrees of freedom if $H_0$ is true.

Note:
If the sample sizes $n_1$ and $n_2$ are both large ($n_1 > 40$ and $n_2 > 40$), we no longer require that the samples come from normal distributions, because the CLT ensures that the sample means are approximately normal. The test statistic can then be denoted as $z$, and the rejection region and p-values can be computed using the standard normal distribution. In this case, the unpooled procedure should be used.
Like the pooled t confidence intervals, the pooled t test is not robust to violations of the equal standard deviation assumption. We therefore recommend the unpooled t test unless there is really compelling evidence for doing otherwise.

Two-independent Sample t Confidence Intervals
The level $1 - \alpha$ confidence interval for $\mu_1 - \mu_2$ is:
$$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $t^* = t_{\alpha/2,k}$ is the value for the $t(k)$ density curve with area $1 - \alpha$ between $-t^*$ and $t^*$.
An upper confidence bound for $\mu_1 - \mu_2$ is:
$$\bar{x}_1 - \bar{x}_2 + t_{\alpha,k} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
A lower confidence bound for $\mu_1 - \mu_2$ is:
$$\bar{x}_1 - \bar{x}_2 - t_{\alpha,k} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Two-independent Sample t Test when $\Delta_0 = 0$
1. Write the hypotheses in terms of the difference between means:
$$H_0: \mu_1 = \mu_2$$
against one of the alternatives
$$H_a: \mu_1 > \mu_2, \qquad H_a: \mu_1 < \mu_2, \qquad \text{or} \qquad H_a: \mu_1 \neq \mu_2$$
2. Calculate the test statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
3. Calculate the p-value, where $T \sim t(k)$:
   For $H_a: \mu_1 < \mu_2$, p-value $= P(T < t)$.
   For $H_a: \mu_1 > \mu_2$, p-value $= P(T > t)$.
   For $H_a: \mu_1 \neq \mu_2$, p-value $= 2P(T > |t|)$.
4. State conclusions in terms of the problem: choose a significance level $\alpha$ and compare the p-value to $\alpha$.
   If p-value $\leq \alpha$, then reject $H_0$ (significant results).
   If p-value $> \alpha$, then fail to reject $H_0$ (nonsignificant results).

Exercise
A group of 15 college seniors is selected to participate in a manual dexterity skill test against a group of 20 industrial workers. Skills are assessed by scores obtained on a test taken by both groups. Conduct a hypothesis test at the 0.05 significance level to determine whether the industrial workers had better manual dexterity skills than the students. Descriptive statistics are listed below. Also construct a 95% confidence interval for this problem.

Group       $n$    $\bar{x}$    $s$
Students    15     35.12        4.31
Workers     20     37.32        3.83

Matched Pairs t Test Procedures
Observations occur in pairs, such as:
identical twins
two observations on the same individual (two days, pre- and post-tests, before and after measurements)
a matched pair design

Confidence intervals and hypothesis tests are based on the differences between the two measurements.
Find the difference (post-test minus pre-test, before minus after, etc.) for each pair of measurements.
Find the sample mean $\bar{d}$ and sample standard deviation $s_D$ of these differences.
Perform one-sample t procedures for these differences.
Confidence interval for the population mean difference, where $t^*$ is the critical value of the $t(n-1)$ distribution:
$$\bar{d} \pm t^* \frac{s_D}{\sqrt{n}}$$
Hypothesis testing ($H_0: \mu_d = \Delta_0$; most often $\Delta_0 = 0$, i.e., the population mean difference is zero):
$$t = \frac{\bar{d} - \Delta_0}{s_D / \sqrt{n}}$$

Example
Researchers are interested in whether Vitamin C is lost when wheat soy blend (CSB) is cooked as gruel. Samples of gruel were collected, and the vitamin C content was measured (in mg per 100 grams of gruel) before and after cooking. Here are the results:

Sample      Before    After    Before - After
1           73        20       53
2           79        27       52
3           86        29       57
4           88        36       52
5           78        17       61
Mean        80.8      25.8     55
St. Dev.    6.14      7.53     3.94

a. Set up an appropriate hypothesis test for the population mean difference and carry it out for these data. State your conclusions in a sentence.
b. Find a 90% confidence interval for the mean vitamin C content loss.

Paired-Sample Test
The paired-sample t test concerns hypotheses about $\mu_d = \mu_1 - \mu_2$, the mean difference between a pair of observations.

Hypotheses:
$$H_0: \mu_d = \Delta_0$$
against one of the alternatives
$$H_a: \mu_d > \Delta_0, \qquad H_a: \mu_d < \Delta_0, \qquad \text{or} \qquad H_a: \mu_d \neq \Delta_0$$

Find the difference for each of the $n$ pairs, that is, $d_i = x_{i1} - x_{i2}$ ($i = 1, 2, \ldots, n$).
Find the sample mean $\bar{d}$ and sample standard deviation $s_D$ of these differences.
Perform a one-sample t test for these differences. That is, the paired-sample t test statistic is:
$$t = \frac{\bar{d} - \Delta_0}{s_D / \sqrt{n}}$$
Assume that the differences $d_1, d_2, \ldots, d_n$ were selected randomly from a normal population.
The test statistic has a t distribution with $n - 1$ degrees of freedom if $H_0$ is true.

Note:
If the sample size is large ($n > 40$), we no longer require that the $d_i$'s come from a normal distribution, because the CLT ensures that $\bar{d}$ is approximately normal. The test statistic can then be denoted as $z$, and the rejection region and the p-value can be computed using the standard normal distribution.

Exercise
In an effort to determine whether sensitivity training for nurses would improve the quality of nursing provided at an area hospital, the following study was conducted. Eight different nurses were selected and their nursing skills were given a score from 1 to 10. After this initial screening, a training program was administered, and then the same nurses were rated again. Below is a table of their pre- and post-training scores, along with the difference in the score.
Conduct a test to determine whether the training could, on average, improve the quality of nursing provided in the population.

Individual    Pre-training score    Post-training score
1             2.56                  4.54
2             3.22                  5.33
3             3.45                  4.32
4             5.55                  7.45
5             5.63                  7.00
6             7.89                  9.80
7             7.66                  5.33
8             6.20                  6.80

a. What are the hypotheses?
b. What is the test statistic?
c. What is the p-value or rejection region?
d. What is your conclusion in terms of the story?
e. What is the 95% confidence interval of the population mean difference in nursing scores?

Summary
Matched pairs vs. two-independent sample comparison of means?
Matched pairs if all units are measured twice and/or receive both treatments over time. Before vs. after is the most common example.
Two-independent sample comparison of means if you have two separate groups, but each unit is only measured once. Men vs. women is the most common example.

Confidence Interval and Hypothesis Tests
Suppose that the 95% two-tailed confidence interval for a population mean $\mu$ based on a particular sample is (20.5, 27.2). If the sample is used to test $H_0: \mu = \mu_0$ against $H_a: \mu \neq \mu_0$ at the 0.05 significance level, the test will fail to reject $H_0$ if $\mu_0$ falls in the 95% confidence interval and will reject $H_0$ if $\mu_0$ is not in the 95% confidence interval.
In general, a $100(1-\alpha)\%$ confidence interval can be used equivalently with a two-tailed hypothesis test at the $\alpha$ level.

Summary of t Tests for Two Different Population Distributions
Hypotheses:
$$H_0: \mu_1 - \mu_2 = \Delta_0$$
against one of the alternatives
$$H_a: \mu_1 - \mu_2 > \Delta_0, \qquad H_a: \mu_1 - \mu_2 < \Delta_0, \qquad \text{or} \qquad H_a: \mu_1 - \mu_2 \neq \Delta_0$$

Two-independent sample t test
The (unpooled) two-sample t test statistic is:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
The (pooled) two-sample t test statistic is:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{s_{\text{pooled}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
The paired-sample t test statistic is:
$$t = \frac{\bar{d} - \Delta_0}{s_D / \sqrt{n}}$$
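
The three test statistics summarized above can be sketched in Python as a quick check on hand calculations. This is a minimal illustration rather than part of the original slides: the function names `unpooled_t`, `pooled_t`, and `paired_t` are hypothetical, only the standard library is used, and the code returns the statistic and its degrees of freedom only. A p-value would additionally need a tail area of the t(k) distribution from a statistics library or a t table.

```python
import math
from statistics import mean, stdev


def unpooled_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Unpooled two-sample t statistic and its degrees of freedom k."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # s_i^2 / n_i for each sample
    t = (xbar1 - xbar2 - delta0) / math.sqrt(v1 + v2)
    k = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, math.floor(k)                   # round k down, as in the slides


def pooled_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Pooled two-sample t statistic with k = n1 + n2 - 2."""
    k = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / k)
    t = (xbar1 - xbar2 - delta0) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))
    return t, k


def paired_t(first, second, delta0=0.0):
    """Paired-sample t statistic on d_i = x_i1 - x_i2, with n - 1 df."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    t = (mean(d) - delta0) / (stdev(d) / math.sqrt(n))  # stdev is the sample s_D
    return t, n - 1


# Vitamin C example: content before and after cooking.
before = [73, 79, 86, 88, 78]
after = [20, 27, 29, 36, 17]
t_paired, df_paired = paired_t(before, after)   # t is about 31.24 with 4 df

# Dexterity exercise summary statistics, taken as students minus workers.
t_unpooled, k_unpooled = unpooled_t(35.12, 4.31, 15, 37.32, 3.83, 20)
t_pooled, k_pooled = pooled_t(35.12, 4.31, 15, 37.32, 3.83, 20)
```

With the groups ordered as students minus workers, the alternative "workers score higher" corresponds to $H_a: \mu_1 < \mu_2$, so the relevant p-value is the lower tail $P(T < t)$.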