Chapter 10 – Two-Sample Inference

Independent Samples and Dependent Samples
o Two samples are independent when the subjects selected for the first sample do not determine the subjects in the second sample. Two samples are dependent when the subjects in the first sample determine the subjects in the second sample. The data from dependent samples are called matched-pair or paired samples.

Confidence Interval for Population Mean Difference (Dependent Samples)
o Suppose we have a set of matched-pair data obtained by taking dependent random samples of two populations and finding the differences to produce a random sample of the differences between the populations. A 100(1 – α)% confidence interval for μd, the population mean of the differences, is given by

  lower bound = $\bar{x}_d - t_{\alpha/2}\left(\dfrac{s_d}{\sqrt{n}}\right)$
  upper bound = $\bar{x}_d + t_{\alpha/2}\left(\dfrac{s_d}{\sqrt{n}}\right)$

where $\bar{x}_d$ and $s_d$ represent the sample mean and sample standard deviation, respectively, of the set of n paired differences d1, d2, d3, …, dn, and where $t_{\alpha/2}$ is based on n – 1 degrees of freedom. This t interval applies whenever either of the following conditions is met:
  Case 1: the population of differences is normal, or
  Case 2: the sample size of differences is large (n ≥ 30).
The 100(1 – α)% confidence interval for μd may also be expressed in the form

  $\bar{x}_d \pm t_{\alpha/2}\left(\dfrac{s_d}{\sqrt{n}}\right)$

Example 10.1
  Student     After (sample 1)   Before (sample 2)
  Ashley      66                 50
  Brittany    68                 55
  Chris       74                 60
  Dave        88                 70
  Emily       89                 75
  Fran        91                 80
  Greg        100                88

Q1. Construct a 95% confidence interval for the mean of the differences in the statistics quiz scores. Is there evidence that the Math Center tutoring leads to a mean improvement in the quiz scores?
A1. We ignore the original raw data and concentrate only on the set of sample differences (after minus before): {16, 13, 14, 18, 14, 11, 12}. The sample mean of the differences is $\bar{x}_d = 98/7 = 14$, and the sample standard deviation is

  $s_d = \sqrt{\dfrac{\sum (d_i - \bar{x}_d)^2}{n-1}} = \sqrt{\dfrac{(16-14)^2 + (13-14)^2 + (14-14)^2 + (18-14)^2 + (14-14)^2 + (11-14)^2 + (12-14)^2}{7-1}} \approx 2.3805$

For 95% confidence with n – 1 = 6 degrees of freedom, $t_{\alpha/2} = 2.447$.

  lower bound = $\bar{x}_d - t_{\alpha/2}\left(\dfrac{s_d}{\sqrt{n}}\right)$ = 14 – (2.447)(2.3805/√7) ≈ 11.7983
  upper bound = $\bar{x}_d + t_{\alpha/2}\left(\dfrac{s_d}{\sqrt{n}}\right)$ = 14 + (2.447)(2.3805/√7) ≈ 16.2017

We are 95% confident that the population mean of the differences between quiz scores before and after visiting the Math Center lies between 11.7983 points and 16.2017 points. Because the entire interval lies above 0, there is evidence of a mean improvement in the quiz scores.

Paired Sample t Test for the Population Mean of the Difference μd: p-Value Method
o Suppose we have a set of matched-pair data obtained by taking dependent random samples of two populations and finding the differences to produce a random sample of the differences between the populations. We can use the t test whenever either of the following conditions is met:
  Case 1: the population of differences is normal, or
  Case 2: the sample size of differences is large (n ≥ 30).

Step 1 State the hypotheses and the rejection rule. Use one of the hypothesis test forms from Table 10.1.

  Type of test        Null hypothesis   Alternative hypothesis
  Right-tailed test   H0: μd ≤ 0        Ha: μd > 0
  Left-tailed test    H0: μd ≥ 0        Ha: μd < 0
  Two-tailed test     H0: μd = 0        Ha: μd ≠ 0
  Table 10.1 – Forms of the hypothesis test

Step 2 Find tdata.

  $t_{data} = \dfrac{\bar{x}_d}{s_d/\sqrt{n}}$

Step 3 Find the p-value. The p-value is the tail area associated with tdata.
  Right-tailed test (H0: μd ≤ 0 versus Ha: μd > 0): p-value = P(t > tdata), the area to the right of tdata.
  Left-tailed test (H0: μd ≥ 0 versus Ha: μd < 0): p-value = P(t < tdata), the area to the left of tdata.
  Two-tailed test (H0: μd = 0 versus Ha: μd ≠ 0): p-value = P(t > |tdata|) + P(t < –|tdata|) = 2 · P(t > |tdata|), the sum of the two tail areas.

Step 4 State the conclusion and interpretation. Compare the p-value with α: reject H0 if the p-value ≤ α; otherwise, do not reject H0.

Example 10.2
Q1. Paired-sample t test for μd: the p-value method
A1.
pg. 549

Paired Sample t Test for the Population Mean of the Difference μd: Critical Value Method
o Suppose we have a set of matched-pair data obtained by taking dependent random samples of two populations and finding the differences to produce a random sample of the differences between the populations. You can use the t test whenever either of the following conditions is met:
  Case 1: the population of differences is normal, or
  Case 2: the sample size of differences is large (n ≥ 30).

Step 1 State the hypotheses. Use one of the hypothesis test forms from Table 10.1. State clearly the meaning of μd.

Step 2 Find tcrit, and state the rejection rule. To find tcrit, use the t table with n – 1 degrees of freedom. To find the rejection rule, use Table 10.2.

  Form of test                                  Rejection rule: "Reject H0 if…"
  Right-tailed  H0: μd ≤ 0 vs. Ha: μd > 0       tdata > tcrit
  Left-tailed   H0: μd ≥ 0 vs. Ha: μd < 0       tdata < –tcrit
  Two-tailed    H0: μd = 0 vs. Ha: μd ≠ 0       tdata > tcrit or tdata < –tcrit
  Table 10.2 – Rejection rules for the t test for μd

Step 3 Find tdata.

  $t_{data} = \dfrac{\bar{x}_d}{s_d/\sqrt{n}}$

Step 4 State the conclusion and interpretation. Compare tdata with tcrit.

Example 10.3
Q1. Paired-sample t test for μd: the critical value method
A1. pg. 551

Sampling Distribution of x̄1 – x̄2
o When random samples are drawn independently from two populations with population means μ1 and μ2, and either
  Case 1: the two populations are normally distributed, or
  Case 2: the two sample sizes are large (at least 30),
then the quantity

  $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

approximately follows a t distribution with degrees of freedom equal to the smaller of n1 – 1 and n2 – 1, where x̄1 and s1 represent the mean and standard deviation of the sample taken from population 1, and x̄2 and s2 represent the mean and standard deviation of the sample taken from population 2.

Standard Error of x̄1 – x̄2
o The standard error $s_{\bar{x}_1 - \bar{x}_2}$ of the statistic x̄1 – x̄2 is

  $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

It measures the size of the typical error in using x̄1 – x̄2 to estimate μ1 – μ2.
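The quantities defined so far (the paired-difference summary, the paired t interval, tdata, and the standard error of x̄1 – x̄2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the critical value 2.447 is taken from the t table for 6 degrees of freedom rather than computed, and the inputs are the quiz scores from Example 10.1 and the standard deviations and sample sizes that appear in Example 10.4.

```python
import math
import statistics


def paired_summary(after, before):
    """Return (mean, sample std dev, n) of the differences after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs), statistics.stdev(diffs), len(diffs)


def paired_t_interval(mean_d, sd_d, n, t_crit):
    """Interval mean_d +/- t_crit * sd_d / sqrt(n); t_crit comes from a t table."""
    margin = t_crit * sd_d / math.sqrt(n)
    return mean_d - margin, mean_d + margin


def paired_t_data(mean_d, sd_d, n):
    """Test statistic for hypotheses about mu_d stated relative to 0."""
    return mean_d / (sd_d / math.sqrt(n))


def std_error_diff_means(s1, n1, s2, n2):
    """Standard error of x1bar - x2bar for two independent samples."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def conservative_df(n1, n2):
    """Degrees of freedom: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


# Quiz scores from Example 10.1 (after and before tutoring).
after = [66, 68, 74, 88, 89, 91, 100]
before = [50, 55, 60, 70, 75, 80, 88]

mean_d, sd_d, n = paired_summary(after, before)
lower, upper = paired_t_interval(mean_d, sd_d, n, t_crit=2.447)  # table value, df = 6
t_data = paired_t_data(mean_d, sd_d, n)

# Sample standard deviations and sizes from Example 10.4.
se_means = std_error_diff_means(0.743, 65, 0.699, 65)
df_means = conservative_df(65, 65)

print(f"mean_d = {mean_d}, s_d = {sd_d:.4f}")
print(f"95% CI for mu_d: ({lower:.4f}, {upper:.4f})")
print(f"t_data = {t_data:.2f}")
print(f"standard error = {se_means:.4f}, df = {df_means}")
```

Passing the critical value in as a parameter mirrors the table lookup used in the notes; a statistics library could supply it instead, but that would be a substitution for the table-based method described here.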
Confidence Interval for μ1 – μ2
o For two independent random samples taken from two populations with population means μ1 and μ2, a 100(1 – α)% confidence interval for μ1 – μ2 is given by

  $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

The t interval applies whenever either of the following conditions is met:
  Case 1: both populations are normally distributed, or
  Case 2: both sample sizes are large.

Margin of Error E
o The margin of error for a 100(1 – α)% confidence interval for μ1 – μ2 is given by

  $E = t_{\alpha/2} \cdot (\text{standard error}) = t_{\alpha/2} \cdot s_{\bar{x}_1 - \bar{x}_2} = t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

Example 10.4
  Gender               Sample size   Sample mean body temperature   Sample standard deviation   Population mean body temperature
  Females (sample 1)   n1 = 65       x̄1 = 98.394                    s1 = 0.743                  μ1 = ?
  Males (sample 2)     n2 = 65       x̄2 = 98.105                    s2 = 0.699                  μ2 = ?
  Summary statistics for female versus male body temperatures in °F

Q1. Calculate the standard error $s_{\bar{x}_1 - \bar{x}_2}$ for estimating the difference in population mean body temperature between women and men.
A1.
  $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{0.743^2}{65} + \dfrac{0.699^2}{65}} \approx 0.1265$

Q2. Find a 95% confidence interval for the difference in women's and men's population mean body temperatures.
A2. Both sample sizes are large, so the sampling distribution of x̄1 – x̄2 approximately follows a t distribution. We know the standard error $s_{\bar{x}_1 - \bar{x}_2} \approx 0.1265$, but we need $t_{\alpha/2}$ to use the formula for E. The required degrees of freedom is the smaller of n1 – 1 and n2 – 1, which are both equal to 65 – 1 = 64, so the degrees of freedom for $t_{\alpha/2}$ is 64. Because df = 64 is not listed in the t table, we choose the next lowest df listed, 60. For 95% confidence, then, $t_{\alpha/2} = 2.00$, and the margin of error is

  $E = t_{\alpha/2} \cdot s_{\bar{x}_1 - \bar{x}_2} \approx (2.00)(0.1265) = 0.253$

The 95% confidence interval is then

  $(\bar{x}_1 - \bar{x}_2) \pm E = (98.394 - 98.105) \pm 0.253 = 0.289 \pm 0.253 = (0.036, 0.542)$

Sampling Distribution of p̂1 – p̂2
o When two random samples are drawn independently from two populations, then the quantity

  $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{s_{\hat{p}_1 - \hat{p}_2}} = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}}$

has an approximately standard normal distribution when the following conditions are satisfied: x1 ≥ 5, (n1 – x1) ≥ 5, x2 ≥ 5, (n2 – x2) ≥ 5.

Standard Error of p̂1 – p̂2
o The standard error $s_{\hat{p}_1 - \hat{p}_2}$ of the statistic p̂1 – p̂2 is

  $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$

where $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$. The standard error measures the size of the typical error in using p̂1 – p̂2 to estimate p1 – p2.

Confidence Interval for p1 – p2
o For two independent random samples taken from two populations with population proportions p1 and p2, a 100(1 – α)% confidence interval for p1 – p2 is given by

  $(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$

Margin of Error E
o The margin of error for a 100(1 – α)% confidence interval for p1 – p2 is given by

  $E = Z_{\alpha/2} \cdot (\text{standard error}) = Z_{\alpha/2} \cdot s_{\hat{p}_1 - \hat{p}_2} = Z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$

Example 10.5
                            Boys (sample 1)                   Girls (sample 2)
  Number responding "Yes"   x1 = 195                          x2 = 93
  Sample size               n1 = 487                          n2 = 487
  Sample proportion         p̂1 = x1/n1 = 195/487 ≈ 0.4004    p̂2 = x2/n2 = 93/487 ≈ 0.1910
  Proportions of teenage boys and girls who post their last names in online profiles

Q1. Find the point estimate p̂1 – p̂2 for the difference in population proportions p1 – p2.
A1.
  $\hat{p}_1 - \hat{p}_2 \approx 0.4004 - 0.1910 = 0.2094$

Q2. Calculate the standard error $s_{\hat{p}_1 - \hat{p}_2}$.
A2. $\hat{q}_1 = 1 - \hat{p}_1 = 1 - 0.4004 = 0.5996$ and $\hat{q}_2 = 1 - \hat{p}_2 = 1 - 0.1910 = 0.8090$, so

  $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}} = \sqrt{\dfrac{(0.4004)(0.5996)}{487} + \dfrac{(0.1910)(0.8090)}{487}} \approx 0.0285$

Q3. For a 95% confidence level, calculate the margin of error.
A3.
  $E = Z_{\alpha/2} \cdot s_{\hat{p}_1 - \hat{p}_2} = 1.96(0.0285) \approx 0.0559$

Q4. Construct and interpret a 95% confidence interval for the difference in population proportions of boys and girls whose last name is posted to their online profile.
A4.
  $(\hat{p}_1 - \hat{p}_2) \pm E = (0.4004 - 0.1910) \pm 0.0559 = 0.2094 \pm 0.0559 = (0.1535, 0.2653)$

We are 95% confident that the difference in population proportions (boys minus girls) lies between 0.1535 and 0.2653.
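As a rough check on the two interval procedures for independent samples, the following Python sketch recomputes the intervals from Examples 10.4 and 10.5 using only the standard library. The function names are hypothetical. For the means, the critical value 2.00 is passed in as read from the t table (df = 60); for the proportions, the Z critical value is computed with `statistics.NormalDist`, which is an assumption of this sketch rather than the table lookup used in the notes. Because the code carries unrounded intermediate values, its results may differ from the hand calculations in the fourth decimal place.

```python
import math
from statistics import NormalDist


def two_mean_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """CI for mu1 - mu2: (xbar1 - xbar2) +/- t_crit * sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = xbar1 - xbar2
    margin = t_crit * se
    return diff - margin, diff + margin


def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """CI for p1 - p2 based on the standard normal approximation."""
    # Conditions from the notes: at least 5 successes and 5 failures per sample.
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("need x >= 5 and n - x >= 5 in both samples")
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    margin = z * se
    return diff - margin, diff + margin


# Example 10.4: body temperatures, t critical value from the table (df = 60).
mean_lo, mean_hi = two_mean_interval(98.394, 0.743, 65, 98.105, 0.699, 65, t_crit=2.00)

# Example 10.5: boys (sample 1) versus girls (sample 2).
prop_lo, prop_hi = two_proportion_interval(195, 487, 93, 487)

print(f"95% CI for mu1 - mu2: ({mean_lo:.3f}, {mean_hi:.3f})")
print(f"95% CI for p1 - p2:   ({prop_lo:.4f}, {prop_hi:.4f})")
```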