Comparing Two Means
• Comparing two means is not very different from comparing two proportions.
• This time the parameter of interest is the difference between the two means, $\mu_1 - \mu_2$.

Comparing Two Means (cont.)
• Remember that, for independent random quantities, variances add.
• So, the standard deviation of the difference between two sample means is
$$SD(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
• We still don’t know the true standard deviations of the two groups, so we need to estimate them and use the standard error
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Comparing Two Means (cont.)
• Because we are working with means and estimating the standard error of their difference using the data, we shouldn’t be surprised that the sampling model is a Student’s t.
– The confidence interval we build is called a two-sample t-interval (for the difference in means).

Assumptions and Conditions
• Independence Assumption (Each condition needs to be checked for both groups.):
– Randomization Condition: Were the data collected with suitable randomization (representative random samples or a randomized experiment)?
– 10% Condition: We don’t usually check this condition for differences of means. We will check it for means only if we have a very small population or an extremely large sample.
• Normal Population Assumption:
– Nearly Normal Condition: This must be checked for both groups. A violation by either one violates the condition.

Assumptions and Conditions (cont.)
• Independent Groups Assumption: The two groups we are comparing must be independent of each other.

Two-Sample t-Interval
When the conditions are met, we are ready to find the confidence interval for the difference between means of two independent groups. The confidence interval is
$$(\bar{x}_1 - \bar{x}_2) \pm t^*_{df} \times SE(\bar{x}_1 - \bar{x}_2)$$
where the standard error of the difference of the means is
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
The critical value $t^*_{df}$ depends on the particular confidence level, C, that you specify and on the number of degrees of freedom, which we get from the sample sizes and a special formula.
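The interval above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the critical value `t_star` is assumed to come from a t-table or software for the chosen degrees of freedom, and the data are the City X and City Y gallons from the worked example later in this section. The degrees-of-freedom helper uses the Welch–Satterthwaite approximation, the usual form of the special formula mentioned above.

```python
import math
import statistics


def standard_error_diff(x1, x2):
    """Estimated standard error of mean(x1) - mean(x2) for independent samples."""
    v1 = statistics.variance(x1)  # sample variance s1^2 (n - 1 denominator)
    v2 = statistics.variance(x2)  # sample variance s2^2
    return math.sqrt(v1 / len(x1) + v2 / len(x2))


def welch_df(x1, x2):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    a = statistics.variance(x1) / len(x1)
    b = statistics.variance(x2) / len(x2)
    return (a + b) ** 2 / (a ** 2 / (len(x1) - 1) + b ** 2 / (len(x2) - 1))


def two_sample_t_interval(x1, x2, t_star):
    """Interval (mean1 - mean2) +/- t_star * SE; t_star is supplied by the caller."""
    diff = statistics.mean(x1) - statistics.mean(x2)
    margin = t_star * standard_error_diff(x1, x2)
    return diff - margin, diff + margin


# Gallons used by ten families in each city (from the worked example).
city_x = [26, 27, 31, 29, 36, 19, 20, 34, 26, 27]
city_y = [28, 26, 30, 32, 32, 35, 28, 21, 33, 34]

# 2.262 is the 95% critical value for df = 9, the conservative choice
# (smaller sample size minus one) used in the worked example.
low, high = two_sample_t_interval(city_x, city_y, t_star=2.262)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"Welch df: {welch_df(city_x, city_y):.2f}")
```

With unrounded standard deviations the interval comes out at roughly (-7.318, 2.518), which agrees with the hand calculation up to rounding. The Welch formula gives about 17 degrees of freedom for these data, so the conservative choice of 9 yields a somewhat wider interval than software would report.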
Degrees of Freedom
• The special formula for the degrees of freedom for our t critical value is a bear:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
• Because of this, we will let technology calculate degrees of freedom for us!
• Or, as a conservative alternative, use the smaller of $n_1 - 1$ and $n_2 - 1$.

Example: The table below gives the number of gallons used by ten different families in each of two cities. Find a 95% confidence interval for the difference in the means.

City X   26  27  31  29  36  19  20  34  26  27
City Y   28  26  30  32  32  35  28  21  33  34

$\mu_1$ = population mean number of gallons for City X
$\mu_2$ = population mean number of gallons for City Y

Step 1: Check Assumptions/Conditions
• We will assume an SRS.
• The $\sigma$’s are unknown, so we will use a t-distribution.
• The populations are independent.
• Based on the linearity of the Normal probability plots for City X and City Y, both distributions are approximately Normal.

Step 2: Calculate Confidence Interval
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (27.5 - 29.9) \pm 2.262\sqrt{\frac{5.4^2}{10} + \frac{4.25^2}{10}} = (-7.315,\ 2.515)$$
Here $t^* = 2.262$ is the 95% critical value for $df = 10 - 1 = 9$.

Paired Data
• Data are paired when the observations are collected in pairs or the observations in one group are naturally related to observations in the other group.
• Paired data arise in a number of ways. Perhaps the most common is to compare subjects with themselves before and after a treatment.
– When pairs arise from an experiment, the pairing is a type of blocking.
– When they arise from an observational study, it is a form of matching.

Paired Data (cont.)
• If you know the data are paired, you can (and must!) take advantage of it.
– To decide if the data are paired, consider how they were collected and what they mean (check the W’s).
– There is no test to determine whether the data are paired.
• Once we know the data are paired, we can examine the pairwise differences.
– Because it is the differences we care about, we treat them as if they were the data and ignore the original two sets of data.

Blocking
• Consider estimating the mean difference in age between husbands and wives.
• The following display is worthless. It does no good to compare all the wives as a group with all the husbands; we care about the paired differences.

Blocking (cont.)
• In this case, we have paired data: each husband is paired with his respective wife. The display we are interested in is the difference in ages:
[Insert histogram from page 579 of the text.]

Blocking (cont.)
• Pairing allows us to concentrate on the variation associated with the difference in age for each pair.
• A paired design is an example of blocking.

Paired Data (cont.)
• Now that we have only one set of data to consider, we can return to the simple one-sample confidence interval for means.
• Mechanically, a paired t-interval is just a one-sample confidence interval for the mean of the pairwise differences.
– The sample size is the number of pairs.

Assumptions and Conditions
• Paired Data Assumption: The data must be paired.
• Independence Assumption: The differences must be independent of each other. Check the:
– Randomization Condition
• Normal Population Assumption: We need to assume that the population of differences follows a Normal model.
– Nearly Normal Condition: Check this with a histogram or Normal probability plot of the differences.

Confidence Intervals for Matched Pairs
• When the conditions are met, we are ready to find the confidence interval for the mean of the paired differences.
• The confidence interval is
$$\bar{x}_d \pm t^*_{n-1} \times SE(\bar{x}_d)$$
where the standard error of the mean difference is
$$SE(\bar{x}_d) = \frac{s_d}{\sqrt{n}}$$
The critical value $t^*$ depends on the particular confidence level, C, that you specify and on the degrees of freedom, $n - 1$, which is based on the number of pairs, $n$.
Confidence Intervals for Matched Pairs
Example: Ten different families are tested for the number of gallons of water a day they use before and after viewing a conservation video. Construct a 90% confidence interval for the mean of the "before" minus "after" differences in water use.

Before           33  33  38  33  35  35  40  40  40  31
After            34  28  25  28  35  33  31  28  35  33
Before – After   -1   5  13   5   0   2   9  12   5  -2

$\mu_d$ = population mean difference in number of gallons used before and after viewing the video

Step 1: Check Assumptions/Conditions
• We will assume an SRS.
• $\sigma$ is unknown, so we will use a t-distribution.
• The two samples are dependent (the same families are measured before and after).
• Based on the linearity of the Normal probability plot of the differences, we have an approximately Normal distribution.

Step 2: Calculate Confidence Interval
$$\bar{x}_d \pm t^* \frac{s_d}{\sqrt{n}} = 4.8 \pm 1.833\left(\frac{5.245}{\sqrt{10}}\right) = (1.760,\ 7.840)$$

Step 3: Conclusion
We are 90% confident that the population mean difference in the number of gallons of water used “before” versus “after” viewing the conservation video is between 1.760 and 7.840.
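The arithmetic in this example can be checked with a short Python sketch. It treats the pairwise differences as a single sample, as described above. The function name `paired_t_interval` is hypothetical, and the critical value 1.833 (90% confidence, 9 degrees of freedom) is taken from the example rather than computed.

```python
import math
import statistics


def paired_t_interval(before, after, t_star):
    """One-sample t-interval on the pairwise differences (before - after)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)                                # sample size = number of pairs
    mean_d = statistics.mean(diffs)               # mean of the differences
    se = statistics.stdev(diffs) / math.sqrt(n)   # s_d / sqrt(n)
    margin = t_star * se
    return mean_d - margin, mean_d + margin


# Gallons of water per day for the ten families in the example.
before = [33, 33, 38, 33, 35, 35, 40, 40, 40, 31]
after = [34, 28, 25, 28, 35, 33, 31, 28, 35, 33]

# t_star = 1.833 is the tabled 90% critical value for df = 10 - 1 = 9.
low, high = paired_t_interval(before, after, t_star=1.833)
print(f"90% CI: ({low:.3f}, {high:.3f})")  # prints 90% CI: (1.760, 7.840)
```

The result matches the interval found by hand in Step 2.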