Basic Practice of Statistics, 7th Edition
Lecture PowerPoint Slides

In Chapter 21, We Cover …

Two-sample problems
Comparing two population means
Two-sample t procedures
Robustness again
Avoid the pooled two-sample t procedures*
Avoid inference about standard deviations*

Two-Sample Problems

Suppose we want to compare the mean of some quantitative variable for the individuals in two populations, Population 1 and Population 2. Our parameters of interest are the population means $\mu_1$ and $\mu_2$. The best approach is to take separate random samples from each population and to compare the sample means. We use the mean response in the two groups to make the comparison.

Here’s a table that summarizes these two situations:

Comparing Two Population Means

CONDITIONS FOR INFERENCE COMPARING TWO MEANS

We have two SRSs from two distinct populations. The samples are independent. That is, one sample has no influence on the other. We measure the same response variable for both samples.

Both populations are Normally distributed. The means and standard deviations of the populations are unknown. In practice, it is enough that the distributions have similar shapes and that the data have no strong outliers.

Call the variable $x_1$ in the first population and $x_2$ in the second because the variable may have different distributions in the two populations. Here is how we describe the two populations:

Population   Variable   Mean      Standard deviation
1            $x_1$      $\mu_1$   $\sigma_1$
2            $x_2$      $\mu_2$   $\sigma_2$

Here is how we describe the two samples:

Population   Sample size   Sample mean   Sample standard deviation
1            $n_1$         $\bar{x}_1$   $s_1$
2            $n_2$         $\bar{x}_2$   $s_2$

To do inference about the difference $\mu_1 - \mu_2$ between the means of two populations, we start from the difference $\bar{x}_1 - \bar{x}_2$ between the means of the two samples.

Two-Sample t Procedures

To take variation into account, we would like to standardize the observed difference $\bar{x}_1 - \bar{x}_2$ by subtracting its mean, $\mu_1 - \mu_2$, and dividing the result by its standard deviation.
When the Independent condition is met, the standard deviation of the statistic $\bar{x}_1 - \bar{x}_2$ is

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Because we don't know the population standard deviations, we estimate them by the sample standard deviations from our two samples. The result is the standard error, or estimated standard deviation, of the difference in sample means:

$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

When we standardize the estimate by subtracting its mean, $\mu_1 - \mu_2$, and dividing the result by its standard error, the result is the two-sample t statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

The two-sample t statistic has approximately a t distribution. It does not have exactly a t distribution, even if the populations are both exactly Normal. In practice, however, the approximation is very accurate.

There are two practical options for using the two-sample t procedures:

Option 1: Use technology to determine the degrees of freedom.
Option 2: Use the smaller of $n_1 - 1$ and $n_2 - 1$ for the degrees of freedom.

Two-Sample t Procedures: Confidence Interval for $\mu_1 - \mu_2$

THE TWO-SAMPLE t PROCEDURES

Draw an SRS of size $n_1$ from a large Normal population with unknown mean $\mu_1$, and draw an independent SRS of size $n_2$ from another large Normal population with unknown mean $\mu_2$. A level C confidence interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Here, $t^*$ is the critical value for confidence level C for the t distribution with degrees of freedom from either Option 1 (software) or Option 2 (the smaller of $n_1 - 1$ and $n_2 - 1$).

Example

STATE: People gain weight when they take in more energy from food than they expend. James Levine and his collaborators at the Mayo Clinic investigated the link between obesity and energy spent on daily activity with data from a study with $n_1 = n_2 = 10$ healthy volunteers: 10 who were lean and 10 who were mildly obese but still healthy.
They wanted to address the question: Do lean and obese people differ in the average time they spend standing and walking?

PLAN: Give a 90% confidence interval for $\mu_1 - \mu_2$, the difference in average daily minutes spent standing and walking between lean and mildly obese adults.

SOLVE: Examination of the data reveals that all conditions for inference can be (at least reasonably) assumed; the distributions are a bit irregular, but with only 10 observations this is to be expected.

The descriptive statistics:

Group       n    Mean, $\bar{x}$   Std. Dev., s
1 (lean)    10   525.751           107.121
2 (obese)   10   373.269           67.498

Using Option 2 (conservative degrees of freedom in the absence of technology), $n_1 - 1 = n_2 - 1 = 9$ and $t^* = 1.833$, giving:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (525.751 - 373.269) \pm 1.833 \sqrt{\frac{107.121^2}{10} + \frac{67.498^2}{10}} = 152.482 \pm 73.390$$

That is, 79.09 to 225.87 minutes.

Software using Option 1 gives df = 15.174 and $t^* = 1.752$, for a confidence interval of 82.35 to 222.62 minutes. This interval is narrower because Option 2 is conservative.

CONCLUDE: Whichever interval we report, we are (at least) 90% confident that the difference in mean daily minutes spent standing and walking between lean and mildly obese adults lies in this interval.

Two-Sample t Procedures: Two-Sample t Test

THE TWO-SAMPLE t PROCEDURES

To test the hypothesis $H_0: \mu_1 = \mu_2$, calculate the two-sample t statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Find P-values from the t distribution with degrees of freedom from either Option 1 (software) or Option 2 (the smaller of $n_1 - 1$ and $n_2 - 1$).

Two-Sample t Test for the Difference Between Two Means

Suppose the Random, Normal, and Independent conditions are met. To test the hypothesis $H_0: \mu_1 - \mu_2 = \text{hypothesized value}$, compute the t statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

When the hypothesized value is 0, this is the statistic for testing $H_0: \mu_1 = \mu_2$. Find the P-value by calculating the probability of getting a t statistic this large or larger in the direction specified by the alternative hypothesis $H_a$.
Use the t distribution with degrees of freedom approximated by technology or the smaller of $n_1 - 1$ and $n_2 - 1$.

Example

COMMUNITY SERVICE AND ATTACHMENT TO FRIENDS

STATE: Do college students who have volunteered for community service work differ from those who have not? A study obtained data from 57 students who had done service work and 17 who had not. One of the response variables was a measure of attachment to friends. Here are the results:

Group   Condition    n    $\bar{x}$   s
1       Service      57   105.32      14.68
2       No service   17   96.82       14.26

PLAN: The investigator had no specific direction for the difference in mind before looking at the data, so the alternative is two-sided. We will test the following hypotheses:

$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$

SOLVE: The two-sample t statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{105.32 - 96.82}{\sqrt{\frac{14.68^2}{57} + \frac{14.26^2}{17}}} = \frac{8.5}{3.9677} = 2.142$$

Software (Option 1) says that the two-sided P-value is 0.0414. Using Option 2, $n_1 - 1 = 56$ and $n_2 - 1 = 16$, so we compare our test statistic of 2.142 to two-sided critical values of the t(16) distribution. Table C shows that the P-value is between 0.04 and 0.05.

CONCLUDE: The data give moderately strong evidence (P < 0.05) that students who have engaged in community service are, on the average, more attached to their friends.

Robustness Again

The two-sample t procedures are more robust than the one-sample t methods, particularly when the distributions are not symmetric.

When the sizes of the two samples are equal and the two populations being compared have distributions with similar shapes, probability values from the t table are quite accurate for a broad range of distributions when the sample sizes are as small as $n_1 = n_2 = 5$.

When the two population distributions have different shapes, larger samples are needed.

As a guide to practice, adapt the guidelines for one-sample t procedures to two-sample procedures by replacing “sample size” with the “sum of the sample sizes,” $n_1 + n_2$.
Caution: In planning a two-sample study, choose equal sample sizes whenever possible. The two-sample t procedures are most robust against non-Normality in this case, and the conservative Option 2 probability values are most accurate.

Avoid the Pooled Two-Sample t Procedures*

Many calculators and software packages offer a choice of two-sample t statistics. One is often labeled for “unequal” variances; the other for “equal” variances. The “unequal” variance procedure is our two-sample t. Never use the pooled t procedures if you have software or technology that will implement the “unequal” variance procedure.

Avoid Inference About Standard Deviations*

There are methods for inference about the standard deviations of Normal populations. The most common such method is the “F test” for comparing the standard deviations of two Normal populations. Unlike the t procedures for means, the F test for standard deviations is extremely sensitive to non-Normal distributions. We do not recommend trying to do inference about population standard deviations in basic statistical practice.
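The arithmetic in the two worked examples can be checked from the summary statistics alone. The following is a minimal sketch using only the Python standard library, not the software used in the slides. The function names are hypothetical, and the degrees-of-freedom formula is an assumption: the slides do not say how software obtains df under Option 1, so the sketch uses the Welch-Satterthwaite approximation, which does reproduce the reported df = 15.174. The critical value 1.833 is taken from the example rather than computed.

```python
import math


def two_sample_se(s1, n1, s2, n2):
    """Standard error of x1_bar - x2_bar for the unpooled two-sample t."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom.

    Assumption: this is the formula behind "Option 1 (software)";
    the slides do not state it.
    """
    a = s1**2 / n1
    b = s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


def conservative_df(n1, n2):
    """Option 2: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def confidence_interval(mean1, mean2, se, t_star):
    """(x1_bar - x2_bar) +/- t* x SE, with t* supplied by the caller."""
    diff = mean1 - mean2
    margin = t_star * se
    return diff - margin, diff + margin


def t_statistic(mean1, mean2, se, hypothesized_diff=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = hypothesized_diff."""
    return (mean1 - mean2 - hypothesized_diff) / se


# Example 1: lean vs. mildly obese, 90% interval with Option 2 (t* = 1.833).
se_activity = two_sample_se(107.121, 10, 67.498, 10)
ci_low, ci_high = confidence_interval(525.751, 373.269, se_activity, 1.833)
df_activity = welch_df(107.121, 10, 67.498, 10)

# Example 2: community service vs. no service, two-sided test.
se_service = two_sample_se(14.68, 57, 14.26, 17)
t_service = t_statistic(105.32, 96.82, se_service)
df_service = conservative_df(57, 17)

print(f"Example 1: {ci_low:.2f} to {ci_high:.2f} minutes, Welch df = {df_activity:.3f}")
print(f"Example 2: t = {t_service:.3f}, conservative df = {df_service}")
```

Turning the t statistic into a P-value needs the t distribution's tail probability, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_ind_from_stats` with `equal_var=False` runs the “unequal” variance test directly from summary statistics.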