Chapter 19: Two-Sample Problems
STAT 1450

19.0 Two-Sample Problems
Connecting Chapter 18 to our Current Knowledge of Statistics

| Population Parameter | Point Estimate | Confidence Interval | Test Statistic |
|---|---|---|---|
| $\mu$ ($\sigma$ known) | $\bar{x}$ | $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ | $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ |
| $\mu$ ($\sigma$ unknown) | $\bar{x}$ | $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ | $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ |

▸ Remember that these formulas are only valid when appropriate simple conditions apply!
▸ Matched pairs were covered at the end of Chapter 18. A common situation requiring matched pairs is when before-and-after measurements are taken on individual subjects.
▸ Example: Prices for a random sample of tickets to a 2008 Katy Perry concert were compared with the ticket prices (for the same seats) to her 2013 concert.
  ▪ The data could be consolidated into 1 column of differences in ticket prices.
  ▪ A test of significance or a confidence interval would then be carried out for "1 sample of data."

19.1 The Two-Sample Problem
The Two-Sample Problems
▸ Two-sample problems require us to compare:
  ▪ the response to two treatments, or
  ▪ the characteristics of two populations.
▸ We have a separate sample from each treatment or population.
▸ Example: Suppose a random sample of ticket prices for concerts by the Rolling Stones was obtained. For comparison purposes, another random sample of Coldplay ticket prices was obtained. Note that these are not necessarily the same seats or even the same venues.
▸ Question: Are these samples more likely to be independent or dependent?
  a) Independent
  b) Dependent
  c) Not sure
19.1 The Two-Sample Problem
Two-Sample Problems
▸ The end of Chapter 18 described inference procedures for the mean difference in two measurements on one group of subjects (e.g., pulse rates for 12 students before and after listening to music).
▸ Given our answer to the independence question, and the likelihood that each sample has a different sample size, variance, etc., Chapter 19 focuses on the difference in means for 2 different groups.

| Population Parameter | Point Estimate | Confidence Interval | Test Statistic |
|---|---|---|---|
| $\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | | |

19.2 Comparing Two Population Means
Sampling Distribution of Two Sample Means
▸ Recall that for a single sample mean $\bar{x}$:
  ▪ When the standard deviation of a statistic is estimated from data, the result is called the standard error of the statistic.
  ▪ The standard error of $\bar{x}$ is $\frac{s}{\sqrt{n}}$.
▸ Inference in the two-sample problem will require the standard error of the difference of two sample means, $\bar{x}_1 - \bar{x}_2$.
▸ The following table stems from the above comment on standard error and statistical theory.

| Variable | Parameter | Point Estimate | Population Standard Deviation | Standard Error |
|---|---|---|---|---|
| $x_1$ | $\mu_1$ | $\bar{x}_1$ | $\sigma_1$ | $\frac{s_1}{\sqrt{n_1}}$ |
| $x_2$ | $\mu_2$ | $\bar{x}_2$ | $\sigma_2$ | $\frac{s_2}{\sqrt{n_2}}$ |
| Diff $= x_1 - x_2$ | $\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ | $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ |

Example: SSHA Scores
▸ The Survey of Study Habits and Attitudes (SSHA) is a psychological test designed to measure various academic behaviors (motivation, study habits, attitudes, etc.) of college students. Scores on the SSHA range from 0 to 200. The data for random samples of 17 women (**the outlier from the original data set was removed**) and 20 men yielded the following summary statistics.
▸ Is there a difference in SSHA performance based upon gender?
19.2 Comparing Two Population Means
Example: SSHA Scores
▸ Summary statistics for the two groups are below:

| Group | Sample Mean | Sample Standard Deviation | Sample Size |
|---|---|---|---|
| Women** | 139.588 | 20.363 | 17 |
| Men | 122.5 | 32.132 | 20 |

  ▪ There is a difference in these two groups. The women's average was 17.088 points higher than the men's average.
  ▪ Yet, the standard deviations are larger than this sample difference, and the sample sizes are about the same.
  ▪ Is this difference significant enough to conclude that $\mu_{\text{women}}$ is larger than $\mu_{\text{men}}$? Let's learn more!
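To make the comparison above concrete, the observed difference can be set against its standard error, $\sqrt{s_1^2/n_1 + s_2^2/n_2}$. The following is a minimal sketch using the SSHA summary statistics; the function and variable names are illustrative and not part of the course materials.

```python
import math


def standard_error_of_difference(s1, n1, s2, n2):
    """Standard error of (x̄1 - x̄2): sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


# SSHA summary statistics from the table (women first, then men).
mean_w, s_w, n_w = 139.588, 20.363, 17
mean_m, s_m, n_m = 122.5, 32.132, 20

diff = mean_w - mean_m
se = standard_error_of_difference(s_w, n_w, s_m, n_m)

print(f"difference = {diff:.3f}")      # difference = 17.088
print(f"standard error = {se:.3f}")    # standard error = 8.719
```

The difference is roughly twice its standard error, which is why a formal procedure is needed before drawing a conclusion.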
19.3 Two-Sample t Procedures
The Two-sample t Procedures: Derived
▸ Now that we have a point estimate and a formula for the standard error, we can conduct statistical inference for the difference in two population means.

Confidence intervals have the general form: pt. estimate ± t*(standard error).

| Chapter | Parameter of Interest | Point Estimate | Standard Error | Confidence Interval |
|---|---|---|---|---|
| 18 | $\mu$ ($\sigma$ unknown; 1-sample) | $\bar{x}$ | $\frac{s}{\sqrt{n}}$ | $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ |
| 19 | $\mu_1 - \mu_2$ ($\sigma_1, \sigma_2$ unknown; 2-samples) | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ | $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ |

Test statistics have the general form: (pt. estimate − $\mu_0$) / standard error.

| Chapter | Parameter of Interest | Point Estimate | Standard Error | Test Statistic |
|---|---|---|---|---|
| 18 | $\mu$ ($\sigma$ unknown; 1-sample) | $\bar{x}$ | $\frac{s}{\sqrt{n}}$ | $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ |
| 19 | $\mu_1 - \mu_2$ ($\sigma_1, \sigma_2$ unknown; 2-samples) | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ | $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ |

Note: $H_0$ for our purposes will be that $\mu_1 = \mu_2$, which is equivalent to there being a mean difference of "0."

The Two-sample t Procedures
▸ Now we can complete the table from earlier:

| Population Parameter | Point Estimate | Confidence Interval | Test Statistic |
|---|---|---|---|
| $\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ | $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ |

  ▪ t* is the critical value for confidence level C for the t distribution with df = smaller of $(n_1 - 1)$ and $(n_2 - 1)$.
  ▪ Find P-values from the t distribution with df = smaller of $(n_1 - 1)$ and $(n_2 - 1)$.

The Two-sample t Procedures: Confidence Intervals
▸ Draw an SRS of size $n_1$ from a large Normal population with unknown mean $\mu_1$, and draw an independent SRS of size $n_2$ from another large Normal population with unknown mean $\mu_2$. A level C confidence interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

▸ Here t* is the critical value for confidence level C for the t distribution with degrees of freedom from either Option 1 (computer generated) or Option 2 (the smaller of $n_1 - 1$ and $n_2 - 1$).
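The two-sample formulas above translate directly into a few small functions. This is a sketch under stated assumptions: the function names are illustrative, the critical value `t_star` is supplied by the caller from a t table, and `welch_df` uses the Welch-Satterthwaite approximation, which is assumed here to be what software reports for Option 1 (the slides do not name the formula).

```python
import math


def conservative_df(n1, n2):
    """Option 2: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite df (ASSUMPTION: stands in for 'Option 1')."""
    a, b = s1**2 / n1, s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """(x̄1 - x̄2) ± t* sqrt(s1^2/n1 + s2^2/n2); t_star comes from a t table."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean1 - mean2
    return diff - t_star * se, diff + t_star * se


def two_sample_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """t = ((x̄1 - x̄2) - 0) / sqrt(s1^2/n1 + s2^2/n2) for H0: mu1 = mu2."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / se


# Degrees of freedom for the SSHA samples (17 women, 20 men).
print(conservative_df(17, 20))                           # 16
print(round(welch_df(20.363, 17, 32.132, 20), 1))
```

The Welch-Satterthwaite value is never smaller than the Option 2 value, so Option 2 gives wider intervals and larger P-values; that is the sense in which it is the conservative choice.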
19.3 Two-Sample t Procedures
The Two-sample t Procedures: Significance Tests
▸ To test the hypothesis $H_0: \mu_1 = \mu_2$, calculate the two-sample t statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

▸ Find P-values from the t distribution with df = smaller of $(n_1 - 1)$ and $(n_2 - 1)$.

19.0 Two-Sample Problems
Conditions for Inference Comparing Two-Sample Means and Robustness of t Procedures
▸ The general structure of our necessary conditions is an extension of the one-sample cases.
❖ Simple Random Samples:
  ▪ Do we have 2 simple random samples?
❖ Population : Sample Ratio:
  ▪ The samples must be independent and from two large populations of interest.
❖ Large enough sample: Both populations will be assumed to be from a Normal distribution, and
  ▪ when the sum of the sample sizes is less than 15, t procedures can be used if the data are close to Normal (roughly symmetric, single peak, no outliers). If there is clear skewness or there are outliers, do not use t.
  ▪ when the sum of the sample sizes is between 15 and 40, t procedures can be used except in the presence of outliers or strong skewness.
  ▪ when the sum of the sample sizes is at least 40, the t procedures can be used even for clearly skewed distributions.
▸ Note: In practice it is enough that the two distributions have similar shape with no strong outliers. The two-sample t procedures are even more robust against non-Normality than the one-sample procedures.
▸ Now that we have a point estimate and a formula for the standard error, we can conduct statistical inference for the difference in two population means.

19.3 Two-Sample t Procedures
Poll: SSHA Scores
▸ Suppose we have a goal of measuring the mean difference in SSHA between women and men. Which seems more plausible?
a. $\mu_{\text{Women}} - \mu_{\text{Men}} = 0$ (There is no difference.)
b. $\mu_{\text{Women}} - \mu_{\text{Men}} \neq 0$ (There is some difference.)

19.3 Two-Sample t Procedures
Example: SSHA Scores
▸ The summary statistics for the SSHA scores for random samples of men and women are below. Use this information to construct a 90% confidence interval for the mean difference.

| Group | Sample Mean | Sample Standard Deviation | Sample Size |
|---|---|---|---|
| Women | 139.588 | 20.363 | 17 |
| Men | 122.5 | 32.132 | 20 |

Example: 90% CI for SSHA Scores
Steps for Success: Constructing Confidence Intervals for $\mu_1 - \mu_2$
1. Confirm that the 3 key conditions are satisfied (SRS?, N:n?, t-distribution?).
2. Identify the 3 key components of the confidence interval (means, s.d.s, $n_1$, $n_2$).
3. Select t*.
4. Construct the confidence interval.
5. *Interpret* the interval.

1. Conditions.
  ✓ Do we have two simple random samples? Yes. It was stated.
  ✓ Large enough population : sample ratio? Yes. $N_W > 20 \times 17 = 340$ and $N_M > 20 \times 20 = 400$.
  ✓ Large enough sample? Yes. $n_W + n_M = 37 < 40$, but the outlier has been removed and there is no skewness.
2. Components.
  $\bar{x}_w = 139.588$, $s_w = 20.363$, $n_w = 17$
  $\bar{x}_m = 122.5$, $s_m = 32.132$, $n_m = 20$
3. Select t*.
  df $= \min\{(n_w - 1), (n_m - 1)\} = 16$
  t*(90%, 16) = 1.746
4. Interval.
  $$(139.588 - 122.5) \pm 1.746 \sqrt{\frac{20.363^2}{17} + \frac{32.132^2}{20}} = 17.088 \pm 15.222 = 1.866 \text{ to } 32.31$$
5. Interpret.
  We are 90% confident that the mean women's SSHA score is between 1.866 and 32.31 points higher than the men's.

Example: SSHA Scores
▸ Let's continue with this example by now conducting a test of significance for the mean difference in SSHA by gender at $\alpha = 0.10$. Does our decision align with the results from the earlier poll?

State: Is there a difference in the mean SSHA scores between men and women? (i.e., $\mu_{\text{Diff}} \neq 0$, $\mu_{\text{Women}} - \mu_{\text{Men}} \neq 0$, $\mu_{\text{Women}} \neq \mu_{\text{Men}}$)

Plan:
a) Identify the parameter.
  $\mu_{\text{Diff}} = \mu_{\text{Women}} - \mu_{\text{Men}}$
b) List all given information from the data collected.
  $\bar{x}_w = 139.588$, $s_w = 20.363$, $n_w = 17$
  $\bar{x}_m = 122.5$, $s_m = 32.132$, $n_m = 20$
c) State the null ($H_0$) and alternative ($H_a$) hypotheses.
  $H_0: \mu_{\text{Diff}} = 0$
  $H_a: \mu_{\text{Diff}} \neq 0$
d) Specify the level of significance.
  $\alpha = .10$
e) Determine the type of test.
  Left-tailed, Right-tailed, or Two-tailed (two-tailed here, because $H_a$ uses $\neq$)
f) Sketch the region(s) of "extremely unlikely" test statistics.
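For the sketch in Plan step f), the "extremely unlikely" region of a two-tailed test at $\alpha = 0.10$ with df = 16 lies beyond the same critical value, t* = 1.746, that was used for the 90% confidence interval. The following is a minimal sketch of that decision rule; the function name, the `tail` parameter, and the sample t values in the loop are hypothetical and only illustrate the idea.

```python
def in_rejection_region(t, t_star, tail="two"):
    """Return True when t falls in the 'extremely unlikely' region.

    t_star is the positive critical value looked up for the chosen
    alpha, df, and tail; tail is "left", "right", or "two".
    """
    if tail == "two":
        return abs(t) > t_star
    if tail == "right":
        return t > t_star
    if tail == "left":
        return t < -t_star
    raise ValueError("tail must be 'left', 'right', or 'two'")


T_STAR = 1.746  # two-tailed, alpha = 0.10, df = 16 (from the t table)

# Hypothetical test statistics, chosen only to show both outcomes.
for t in (-2.0, 0.5, 2.0):
    print(t, in_rejection_region(t, T_STAR))
```

A one-tailed test at the same $\alpha$ would need a different critical value, which is why `t_star` is passed in rather than fixed inside the function.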
19.3 Two-Sample t Procedures
Example: SSHA Scores
Solve:
a) Check the conditions for the test you plan to use.
  ▪ Two Simple Random Samples? Yes. Stated as a random sample.
  ▪ Large enough population : sample ratios? Yes. Both populations are arbitrarily large, much greater than the required $N_W > 20 \times 17 = 340$ and $N_M > 20 \times 20 = 400$.
  ▪ Large enough samples? Yes. $n_W + n_M = 37 < 40$, but the outlier has been removed and there is no skewness.
b) Calculate the test statistic.
  $$t = \frac{\bar{x}_w - \bar{x}_m}{\sqrt{\frac{s_w^2}{n_w} + \frac{s_m^2}{n_m}}} = \frac{139.588 - 122.5}{\sqrt{\frac{20.363^2}{17} + \frac{32.132^2}{20}}} = \frac{17.088}{8.719} = 1.96$$
c) Determine (or approximate) the P-value.
  ▪ df = 17 − 1 = 16
  ▪ 1.746 < 1.96 < 2.12
  ▪ .05 < P-value < .10

Conclude:
a) Make a decision about the null hypothesis (Reject $H_0$ or Fail to reject $H_0$).
  Because the approximate P-value is smaller than 0.10, we reject the null hypothesis.
b) Interpret the decision in the context of the original claim.
  There is enough evidence (at $\alpha = .10$) that there is a difference in the mean SSHA score between men and women.
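The table-based bracket for the P-value in Solve step c) can be checked numerically. The sketch below integrates the t density with a composite Simpson's rule using only the standard library. The integration method is an illustrative choice, not the method used in the course, where a t table or statistical software would normally be used.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=10_000):
    """Two-sided P-value: 1 minus the area under the density on [-|t|, |t|].

    Uses composite Simpson's rule on [0, |t|] (steps must be even) and
    doubles the result, since the t density is symmetric about 0.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total
    return 1 - central_area


# SSHA test statistic with the conservative df = 16.
p = two_sided_p_value(1.96, 16)
print(f"two-sided P-value = {p:.3f}")
print(0.05 < p < 0.10)  # True, matching the bracket from the t table
```

Evaluating the same function at the table values 1.746 and 2.12 returns approximately 0.10 and 0.05, which is a quick way to confirm that the integration agrees with the t table.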
19.3 Two-Sample t Procedures
Example: SSHA Scores
▸ Let's continue with this example by now conducting a test of significance for the mean difference in SSHA by gender at $\alpha = 0.10$. Does our decision align with the results from the earlier poll? ________

| Group | Sample Mean | Sample Standard Deviation | Sample Size |
|---|---|---|---|
| Women | 139.588 | 20.363 | 17 |
| Men | 122.5 | 32.132 | 20 |