Download Lecture 3 (May 8)

Ch7 Inference concerning means II
Dr. Deshi Ye

Review
Point estimation: calculate the estimated standard error $s/\sqrt{n}$ to accompany the point estimate $\bar{x}$ of a population mean.
Interval estimation: whatever the population, when the sample size is large, the $100(1-\alpha)\%$ confidence interval for the mean is
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}},$$
with $s$ in place of $\sigma$ when $\sigma$ is unknown.
When the population is normal, the $100(1-\alpha)\%$ confidence interval for the mean is
$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}},$$
where $t_{\alpha/2}$ is obtained from the t distribution with $n-1$ degrees of freedom.

Review (cont.)
Tests of hypotheses: five steps in total. Formulate the assertion that the experiment seeks to confirm as the alternative hypothesis.
P-value: the smallest fixed level at which the null hypothesis can be rejected.

Outline
7.8 Inference concerning two means
7.9 Design issues: randomization and pairing

7.8 Inference concerning two means
In many statistical problems we must decide about the relative sizes of the means of two or more populations.
Tests concerning the difference between two means: consider two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. We want to test the null hypothesis $\mu_1 - \mu_2 = \delta$, where $\delta$ is a given constant, using random samples of sizes $n_1$ and $n_2$.

Testing two means: independent sampling and paired difference experiments
Two-population tests:
- Means, independent samples: Z test (large samples) or t test (small samples)
- Means, paired samples: t test (paired samples)
- Proportions: Z test
- Variances: F test
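The selection chart above can be encoded as a small lookup. This is a minimal Python sketch; the function name `choose_two_population_test` and its string labels are hypothetical, chosen only to mirror the chart.

```python
def choose_two_population_test(parameter: str, paired: bool = False,
                               large_samples: bool = True) -> str:
    """Return the test suggested by the two-population chart (hypothetical helper)."""
    if parameter == "proportion":
        return "Z test"
    if parameter == "variance":
        return "F test"
    if parameter != "mean":
        raise ValueError(f"unknown parameter: {parameter!r}")
    # Means: paired data use the differences D_i, independent data use X1bar - X2bar
    if paired:
        return "paired t test"
    return "two-sample Z test" if large_samples else "two-sample t test"


print(choose_two_population_test("mean", paired=True))
print(choose_two_population_test("mean", large_samples=False))
```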
Independent and related populations
Independent:
1. Different data sources: unrelated, independent.
2. Use the difference between the two sample means, $\bar{X}_1 - \bar{X}_2$.
Related:
1. Same data source: paired or matched, or repeated measures (before/after).
2. Use the difference between each pair of observations, $D_i = X_{1i} - X_{2i}$.

Two independent populations: examples
1. An economist wishes to determine whether there is a difference in mean family income for households in two socioeconomic groups.
2. An admissions officer of a small liberal arts college wants to compare the mean SAT scores of applicants educated in rural high schools and in urban high schools.

Two related populations: examples
1. Nike wants to see whether there is a difference in the durability of two sole materials. One type is placed on one shoe, the other type on the other shoe of the same pair.
2. An analyst for Educational Testing Service wants to compare the mean GMAT scores of students before and after taking a GMAT review course.

Thinking challenge: are they independent or paired?
1. Miles-per-gallon ratings of cars before and after mounting radial tires
2. The life expectancy of light bulbs made in two different factories
3. Difference in hardness between two metals: one contains an alloy, one doesn't
4. Tread life of two different motorcycle tires: one on the front, the other on the back

Testing two independent means
The test depends on the difference between the sample means, $\bar{X}_1 - \bar{X}_2$. If both samples come from normal populations with known variances, it can be based on the statistic
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sigma_{\bar{X}_1 - \bar{X}_2}}.$$

Theorem
If two independent random variables have means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, then their sum (or difference) has mean $\mu_1 + \mu_2$ (or $\mu_1 - \mu_2$) and variance $\sigma_1^2 + \sigma_2^2$.
For two independent samples of sizes $n_1$ and $n_2$, $\sigma_{\bar{X}_1}^2 = \sigma_1^2/n_1$ and $\sigma_{\bar{X}_2}^2 = \sigma_2^2/n_2$, so $\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \sigma_1^2/n_1 + \sigma_2^2/n_2$.

Statistic for tests concerning the difference between two means
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
is a random variable having the standard normal distribution. For large samples, the sample variances may replace the population variances:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}.$$

Criterion regions for testing $\mu_1 - \mu_2 = \delta$
- Alternative $\mu_1 - \mu_2 < \delta$: reject the null hypothesis if $Z < -z_\alpha$
- Alternative $\mu_1 - \mu_2 > \delta$: reject the null hypothesis if $Z > z_\alpha$
- Alternative $\mu_1 - \mu_2 \neq \delta$: reject the null hypothesis if $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$

EX. To test the claim that alloying can reduce the resistance of electric wire by more than 0.05 ohm, 32 values obtained for standard wire yielded $\bar{x}_1 = 0.136$ ohm and $s_1 = 0.004$ ohm, and 32 values obtained for alloyed wire yielded $\bar{x}_2 = 0.083$ ohm and $s_2 = 0.005$ ohm.
Question: at the 0.05 level of significance, does this support the claim?

Solution
1. Null hypothesis: $\mu_1 - \mu_2 = 0.05$. Alternative hypothesis: $\mu_1 - \mu_2 > 0.05$.
2. Level of significance: $\alpha = 0.05$.
3. Criterion: reject the null hypothesis if $Z > 1.645$.
4. Calculation:
$$z = \frac{0.136 - 0.083 - 0.05}{\sqrt{\dfrac{(0.004)^2}{32} + \dfrac{(0.005)^2}{32}}} = 2.65.$$
5. Since $2.65 > 1.645$, the null hypothesis must be rejected: the data support the claim.
6. P-value: $1 - 0.9960 = 0.0040$, which is less than the level of significance 0.05.

Critical values
- $\alpha = 0.05$: one-sided alternatives $-1.645$ (lower tail) or $1.645$ (upper tail); two-sided alternatives $-1.96$ and $1.96$
- $\alpha = 0.01$: one-sided alternatives $-2.33$ (lower tail) or $2.33$ (upper tail); two-sided alternatives $-2.575$ and $2.575$

Type II errors
Type II error probabilities are used to judge the strength of support for the null hypothesis when it is not rejected.
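Returning to the wire-resistance example, its calculation can be reproduced with Python's standard library. This is a minimal sketch, assuming the large-sample statistic with $s_1, s_2$ in place of $\sigma_1, \sigma_2$; `two_sample_z` is a hypothetical helper name, and `statistics.NormalDist` supplies both $z_{0.05}$ and the upper-tail P-value.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, s1, n1, x2, s2, n2, delta=0.0):
    """Large-sample Z for H0: mu1 - mu2 = delta (sample SDs stand in for sigmas)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # estimated SD of X1bar - X2bar
    return (x1 - x2 - delta) / se


# Standard wire (1) vs alloyed wire (2); H1: mu1 - mu2 > 0.05
z = two_sample_z(0.136, 0.004, 32, 0.083, 0.005, 32, delta=0.05)
z_alpha = NormalDist().inv_cdf(0.95)  # one-sided critical value z_{0.05}
p_value = 1 - NormalDist().cdf(z)     # upper-tail P-value

print(f"z = {z:.2f}, critical value = {z_alpha:.3f}, P-value = {p_value:.4f}")
print("reject H0" if z > z_alpha else "do not reject H0")
```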
Type II error probabilities can be checked using Table 8 at the end of the textbook. When the two sample sizes are not equal, use Table 8 with
$$n = \frac{\sigma_1^2 + \sigma_2^2}{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}.$$

Small samples: two-sample t test
When the samples are small and both populations are normal with equal variances, the test is based on
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - \delta}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{where } S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2},$$
which has the t distribution with $n_1 + n_2 - 2$ degrees of freedom.

Criterion regions for testing $\mu_1 - \mu_2 = \delta$ (small-sample statistic)
- Alternative $\mu_1 - \mu_2 < \delta$: reject the null hypothesis if $t < -t_\alpha$
- Alternative $\mu_1 - \mu_2 > \delta$: reject the null hypothesis if $t > t_\alpha$
- Alternative $\mu_1 - \mu_2 \neq \delta$: reject the null hypothesis if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

EX. The following random samples are measurements of the heat-producing capacity of specimens of coal from two mines.
Mine 1: 8260, 8130, 8350, 8070, 8340
Mine 2: 7950, 7890, 7900, 8140, 7920, 7840
Question: use the 0.01 level of significance to test whether the difference between the means of these two samples is significant.

Solution
1. Null hypothesis: $\mu_1 - \mu_2 = 0$. Alternative hypothesis: $\mu_1 - \mu_2 \neq 0$.
2. Level of significance: $\alpha = 0.01$.
3. Criterion: reject the null hypothesis if $t > 3.25$ or $t < -3.25$, where 3.25 is $t_{0.005}$ with $5 + 6 - 2 = 9$ degrees of freedom.
4. Calculation: $\bar{x}_1 = 8230$, $\bar{x}_2 = 7940$,
$$s_1^2 = \frac{63000}{4} = 15750, \quad s_2^2 = \frac{54600}{5} = 10920, \quad s_p^2 = \frac{63000 + 54600}{4 + 5} = 13066.7,$$
$$s_p = 114.31, \quad t = \frac{8230 - 7940}{114.31\sqrt{\dfrac{1}{5} + \dfrac{1}{6}}} = 4.19.$$
5. Since $4.19 > 3.25$, the null hypothesis must be rejected.
6. P-value: about 0.002, which is less than the level of significance 0.01.

Calculate it in Minitab
Output:
Two-sample T for Mine 1 vs Mine 2
          N   Mean   StDev   SE Mean
Mine 1    5   8230     125        56
Mine 2    6   7940     104        43
Difference = mu (Mine 1) - mu (Mine 2)
Estimate for difference: 290.000
95% CI for difference: (133.418, 446.582)
T-Test of difference = 0 (vs not =): T-Value = 4.19  P-Value = 0.002  DF = 9

SE Mean (standard error of the mean) is the standard deviation divided by the square root of n. StDev is the sample standard deviation, e.g., $s_1$ for Mine 1.

Confidence interval
A $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where $t_{\alpha/2}$ is based on $n_1 + n_2 - 2$ degrees of freedom.
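The pooled two-sample t test on the coal data, plus a 95% interval for comparison with the Minitab output, can be sketched with the standard library alone. Because the standard library has no t distribution, the hypothetical helpers `t_pdf`, `t_cdf` and `t_quantile` approximate it by Simpson integration and bisection; if SciPy is available, `scipy.stats.t` would be the usual choice instead.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): 0.5 plus Simpson's-rule integral of the density from 0 to t."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p: float, df: int) -> float:
    """Invert t_cdf by bisection (valid for 0.5 < p < 1)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


mine1 = [8260, 8130, 8350, 8070, 8340]
mine2 = [7950, 7890, 7900, 8140, 7920, 7840]
n1, n2 = len(mine1), len(mine2)
df = n1 + n2 - 2

# Pooled variance and t statistic for H0: mu1 - mu2 = 0
sp2 = ((n1 - 1) * variance(mine1) + (n2 - 1) * variance(mine2)) / df
se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
diff = mean(mine1) - mean(mine2)
t = diff / se

p_two_sided = 2 * (1 - t_cdf(abs(t), df))
t975 = t_quantile(0.975, df)  # t_{0.025} with 9 df, for a 95% interval
ci = (diff - t975 * se, diff + t975 * se)

print(f"sp^2 = {sp2:.1f}, t = {t:.2f}, df = {df}, P = {p_two_sided:.4f}")
print(f"95% CI for mu1 - mu2: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Swapping `t_quantile(0.975, df)` for `t_quantile(0.995, df)` gives the 99% interval instead.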
34 CI for large sample x1  x2  z / 2 s12 s22  n1 n2 35 Matched pairs comparisons Question: Are the samples independent in the application of the two sample t test? For instance, the test cannot be used when we deal with “before and after” data, where the data are naturally paired. EX: A manufacturer is concerned about the loss of weight of ceramic parts during a baking step. Let the pair of random variables ( X i , Yi ) denote the weight before and weight after baking for the i-th specimen. 36 Statistical analysis Considering the difference Di  X i  Yi This collection of differences is treated as random sample of size n from a population having mean  D  0 : indicates the means of the two responses are the same Null hypothesis: H 0 :  D   D ,0 n D   D ,0 SD / n D , where D   Di i 1 n n , S D2  2 ( D  D )  i i 1 n 1 37 EX The following are the average weekly losses of worker-hours due to accidents in 10-industrial plants before and after a certain safety program was put into operation: Before 45 73 46 124 33 57 83 34 26 17 After 36 60 44 119 35 51 77 29 24 11 Question: Use the 0.05 level of significance to test whether the safety program is effective. 38 Solution D  0 Alternative hypothesis  D  0 1. Null hypothesis: 2. Level of significance: 0.05 3. Criterion: Reject the null hypothesis if t > 1.833 4. Calculation: t 5.2  0  4.03 4.08 / 10 5. The null hypothesis must be rejected at level 0.05. 39 6. P-value: 1-0.9985=0.0015 < level of significance Confidence interval A 90% confidence interval for the mean of a paired difference. 
Solution: the $n = 10$ differences have mean $\bar{d} = 5.2$ and standard deviation $s_D = 4.08$. Using
$$\bar{d} - t_{\alpha/2}\frac{s_D}{\sqrt{n}} < \mu_D < \bar{d} + t_{\alpha/2}\frac{s_D}{\sqrt{n}}$$
with $t_{0.05} = 1.833$ (9 degrees of freedom),
$$5.2 - 1.833\cdot\frac{4.08}{\sqrt{10}} < \mu_D < 5.2 + 1.833\cdot\frac{4.08}{\sqrt{10}},$$
or $2.84 < \mu_D < 7.56$.

7.9 Design issues: randomization and pairing
Randomization of treatments prevents uncontrolled sources of variation from exerting a systematic influence on the response.
Pairing according to some variable(s) thought to influence the response removes the effect of that variable from the analysis.
Randomizing the assignment of treatments within a pair helps prevent other uncontrolled variables from influencing the responses in a systematic manner.
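Randomizing treatment assignment within pairs can be sketched as an independent shuffle per pair. This is a minimal illustration, set in the lecture's shoe-sole example as a hypothetical scenario; the function name `randomize_within_pairs` and the left/right labelling are assumptions, not part of the lecture.

```python
import random


def randomize_within_pairs(pair_ids, treatments=("A", "B"), seed=None):
    """For each pair, randomly decide which unit receives which treatment.

    Returns {pair_id: (treatment_for_unit_1, treatment_for_unit_2)}.
    """
    rng = random.Random(seed)  # seeded generator so the plan can be reproduced
    plan = {}
    for pid in pair_ids:
        order = list(treatments)
        rng.shuffle(order)  # with two treatments this is a fair coin flip
        plan[pid] = tuple(order)
    return plan


# Hypothetical sole-material study: unit 1 = left shoe, unit 2 = right shoe
plan = randomize_within_pairs(range(1, 6), treatments=("material A", "material B"), seed=1)
for pid, (left, right) in plan.items():
    print(f"pair {pid}: left = {left}, right = {right}")
```

Shuffling separately within each pair keeps the pairing intact while preventing a systematic left/right effect from lining up with one material.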
