Chapter 7. Inference for the Mean of a Population

7.1 Inference for the Mean of a Population

Both confidence intervals and tests of significance for the mean $\mu$ of a normal population are based on the sample mean $\bar{x}$, which estimates the unknown $\mu$. The sampling distribution of $\bar{x}$ depends on $\sigma$. This fact causes no difficulty when $\sigma$ is known. When $\sigma$ is unknown, however, we must estimate $\sigma$ even though we are primarily interested in $\mu$. The sample standard deviation $s$ is used to estimate the population standard deviation $\sigma$.

The t distributions

Suppose that a random sample of size $n$ is drawn from an $N(\mu, \sigma^2)$ population. Then the one-sample t statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has the t distribution with $n-1$ degrees of freedom.

1. Generally, we use the t distribution instead of the standard normal distribution to calculate confidence intervals and carry out tests of hypotheses about the unknown mean $\mu$ when the standard deviation $\sigma$ (or variance $\sigma^2$) of the population is unknown.
2. In a one-sample problem (confidence interval or test of hypothesis), the degrees of freedom are $n-1$.

7.2 Comparing Two Means

Notation:
Population 1: mean $\mu_1$, variance $\sigma_1^2$
Population 2: mean $\mu_2$, variance $\sigma_2^2$
Sample from population 1: sample size $n_1$, mean $\bar{x}_1$, variance $s_1^2$
Sample from population 2: sample size $n_2$, mean $\bar{x}_2$, variance $s_2^2$

Properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$:
1) $E(\bar{X}_1 - \bar{X}_2) \equiv \mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
2) $\mathrm{Var}(\bar{X}_1 - \bar{X}_2) \equiv \sigma^2_{\bar{X}_1 - \bar{X}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
3) When $n_1$ and $n_2$ are large enough, the distribution of $\bar{X}_1 - \bar{X}_2$ is approximately normal.
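Properties 1) and 2) can be checked by simulation. The sketch below is only an illustration, assuming Python's standard library and two independently drawn normal samples; the population values, the number of replications, and the function name `simulate_mean_difference` are made up for this example and do not come from the notes.

```python
import random
import statistics


def simulate_mean_difference(mu1, sigma1, n1, mu2, sigma2, n2, reps=20000, seed=0):
    """Return `reps` simulated values of (sample mean 1 - sample mean 2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean([rng.gauss(mu1, sigma1) for _ in range(n1)])
        xbar2 = statistics.fmean([rng.gauss(mu2, sigma2) for _ in range(n2)])
        diffs.append(xbar1 - xbar2)
    return diffs


# Hypothetical population values, chosen only for illustration.
mu1, sigma1, n1 = 10.0, 2.0, 8
mu2, sigma2, n2 = 7.0, 3.0, 12

diffs = simulate_mean_difference(mu1, sigma1, n1, mu2, sigma2, n2)

empirical_mean = statistics.fmean(diffs)
empirical_var = statistics.pvariance(diffs)

# Values predicted by properties 1) and 2).
theoretical_mean = mu1 - mu2
theoretical_var = sigma1**2 / n1 + sigma2**2 / n2

print("mean:", empirical_mean, "vs", theoretical_mean)
print("variance:", empirical_var, "vs", theoretical_var)
```

With enough replications, the empirical mean and variance of the simulated differences should land close to $\mu_1 - \mu_2$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2$.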
Case 1: Test and confidence interval for a difference between two normal population means with known $\sigma_1$ and $\sigma_2$

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$

Test statistic:

$$z = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

Alternative hypothesis and rejection region for a level $\alpha$ test:
$H_a: \mu_1 - \mu_2 > \Delta_0$ : reject $H_0$ if $z \ge z_\alpha$
$H_a: \mu_1 - \mu_2 < \Delta_0$ : reject $H_0$ if $z \le -z_\alpha$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$ : reject $H_0$ if either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$

Example 1: Analysis of a random sample consisting of $n_1 = 20$ specimens of cold-rolled steel to determine yield strengths resulted in a sample average strength of $\bar{x} = 29.8$ ksi. A second random sample of $n_2 = 25$ two-sided galvanized steel specimens gave a sample average strength of $\bar{y} = 34.7$ ksi. Assuming that the two yield-strength distributions are normal with $\sigma_1 = 4.0$ and $\sigma_2 = 5.0$, do the data indicate that the corresponding true average yield strengths $\mu_1$ and $\mu_2$ are different? Let's carry out a test at significance level $\alpha = 0.01$.

1. Parameter of interest: $\mu_1 - \mu_2$, the difference between the true average strengths for the two types of steel.
2. Null hypothesis: $H_0: \mu_1 - \mu_2 = 0$
3. Alternative hypothesis: $H_a: \mu_1 - \mu_2 \ne 0$
4. Test statistic: With $\Delta_0 = 0$, $z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
5. Rejection region: The form of $H_a$ implies use of a two-tailed test with rejection region either $z \ge z_{0.005} = 2.58$ or $z \le -z_{0.005} = -2.58$.
6. The computed value is $z = \dfrac{29.8 - 34.7}{\sqrt{16.0/20 + 25.0/25}} \approx -3.65$. Since $-3.65 < -2.58$, the computed $z$ falls in the lower tail of the rejection region. $H_0$ is therefore rejected at level 0.01 in favor of the conclusion that $\mu_1 \ne \mu_2$. The sample data strongly suggest that the true average yield strength for cold-rolled steel differs from that for galvanized steel.

Confidence Interval: When both population distributions are normal, standardizing $\bar{X}_1 - \bar{X}_2$ gives a random variable $Z$ with a standard normal distribution.
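As a numerical check of this standardization, the computation in Example 1 can be sketched in Python. This is a minimal sketch assuming only the standard library; the helper name `two_sample_z` is hypothetical, and the critical value is taken from `statistics.NormalDist` rather than from a printed table, so it carries more decimals than the rounded 2.58.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 when sigma1 and sigma2 are known."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (xbar1 - xbar2 - delta0) / standard_error


# Data from Example 1 (cold-rolled vs. galvanized steel).
z = two_sample_z(xbar1=29.8, xbar2=34.7, sigma1=4.0, sigma2=5.0, n1=20, n2=25)

alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 critical value
reject = abs(z) >= z_crit                      # two-tailed rejection rule
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided P-value

print("z =", round(z, 3))
print("critical value =", round(z_crit, 3))
print("reject H0:", reject)
print("P-value =", p_value)
```

The two-sided P-value is an extra quantity not used in the example itself; it leads to the same decision as the rejection region.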
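Since the area under the z curve between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is $1 - \alpha$, it follows that

$$P\left(-z_{\alpha/2} < \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} < z_{\alpha/2}\right) = 1 - \alpha,$$

equivalently

$$P\left(\bar{X}_1 - \bar{X}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < \bar{X}_1 - \bar{X}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) = 1 - \alpha.$$

This implies that a $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ is

$$\left(\bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ \ \bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right).$$

Case 2: Two-Sample t Test and Confidence Interval

1) Assume $\sigma_1 \ne \sigma_2$.

The natural estimator of $\mu_1 - \mu_2$ is still $\bar{X}_1 - \bar{X}_2$, and the variance of this estimator can be expressed as

$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

$\sigma_1$ and $\sigma_2$, however, are unknown, so we need to estimate both of them. Thus

$$\widehat{\mathrm{Var}}(\bar{X}_1 - \bar{X}_2) = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \equiv V_1 + V_2.$$

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$

Test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

with

$$\text{d.f.} = \text{truncated } \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}}$$

Alternative hypothesis and rejection region for a level $\alpha$ test:
$H_a: \mu_1 - \mu_2 > \Delta_0$ : reject $H_0$ if $t \ge t_{\alpha,\,df}$
$H_a: \mu_1 - \mu_2 < \Delta_0$ : reject $H_0$ if $t \le -t_{\alpha,\,df}$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$ : reject $H_0$ if either $t \ge t_{\alpha/2,\,df}$ or $t \le -t_{\alpha/2,\,df}$

Confidence Interval:

$$\left(\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\,df}\sqrt{s_1^2/n_1 + s_2^2/n_2},\ \ \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\,df}\sqrt{s_1^2/n_1 + s_2^2/n_2}\right)$$

Example 2: Suppose that $\mu_1$ and $\mu_2$ are true mean stopping distances (ft) at 50 mph for cars of a certain type equipped with two different types of braking systems. Use the t test at significance level 0.01 to test $H_0: \mu_1 - \mu_2 = -10$ vs $H_a: \mu_1 - \mu_2 < -10$ for the following data: $n_1 = 6$, $\bar{x}_1 = 115.7$, $s_1 = 5.03$, $n_2 = 6$, $\bar{x}_2 = 129.3$, $s_2 = 5.38$.

2) Assume $\sigma_1 = \sigma_2$.

Since $\sigma^2$ is the variance of both the $X_1$ distribution and the $X_2$ distribution, the best estimator should depend on both the $X_{i1}$'s and the $X_{j2}$'s. Furthermore, more weight should be given to the sample corresponding to the larger of the two sample sizes.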
Both $S_1^2$ and $S_2^2$, the two sample variances, are estimators of $\sigma^2$; a better estimator than either one individually is the weighted average of the two, which is called the pooled estimator of the common variance $\sigma^2$, denoted by

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$

Pooled test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

with d.f. $= n_1 + n_2 - 2$.

Alternative hypothesis and rejection region for a level $\alpha$ test:
$H_a: \mu_1 - \mu_2 > \Delta_0$ : reject $H_0$ if $t \ge t_{\alpha,\,n_1+n_2-2}$
$H_a: \mu_1 - \mu_2 < \Delta_0$ : reject $H_0$ if $t \le -t_{\alpha,\,n_1+n_2-2}$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$ : reject $H_0$ if either $t \ge t_{\alpha/2,\,n_1+n_2-2}$ or $t \le -t_{\alpha/2,\,n_1+n_2-2}$

Pooled Confidence Interval:

$$\left(\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ \ \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right)$$

Case 3: Paired t Test

The data consist of $n$ independently selected pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. Let $D_1 = X_1 - Y_1$, $D_2 = X_2 - Y_2$, ..., $D_n = X_n - Y_n$, so the $D_i$'s are the differences within pairs. The $D_i$'s are assumed to be normally distributed with mean $\mu_D$ and variance $\sigma_D^2$.

Null hypothesis: $H_0: \mu_D \equiv E(X - Y) = \mu_1 - \mu_2 = \Delta_0$

Test statistic:

$$t = \frac{\bar{d} - \Delta_0}{s_D/\sqrt{n}},$$

where $\bar{d}$ and $s_D$ are the sample mean and sample standard deviation of the $d_i$'s; d.f. $= n - 1$.

Alternative hypothesis and rejection region for a level $\alpha$ test:
$H_a: \mu_D > \Delta_0$ : reject $H_0$ if $t \ge t_{\alpha,\,n-1}$
$H_a: \mu_D < \Delta_0$ : reject $H_0$ if $t \le -t_{\alpha,\,n-1}$
$H_a: \mu_D \ne \Delta_0$ : reject $H_0$ if either $t \ge t_{\alpha/2,\,n-1}$ or $t \le -t_{\alpha/2,\,n-1}$

Confidence Interval:

$$\left(\bar{d} - t_{\alpha/2,\,n-1}\, s_D/\sqrt{n},\ \ \bar{d} + t_{\alpha/2,\,n-1}\, s_D/\sqrt{n}\right)$$

Case 4: Population Proportion

Having presented methods for comparing the means of two different populations, we now turn to the comparison of two population proportions. The notation for this problem is an extension of the notation used in the corresponding one-population problem. Let $X_1 \sim \mathrm{Bin}(n_1, \pi_1)$ and $X_2 \sim \mathrm{Bin}(n_2, \pi_2)$, with $X_1$ and $X_2$ independent variables, and let $P_i = X_i/n_i$ denote the sample proportions. Then

$$E(P_1 - P_2) = \pi_1 - \pi_2 \quad\text{and}\quad \mathrm{Var}(P_1 - P_2) = \frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}.$$

$\pi_1$ and $\pi_2$, however, are unknown, so we need to estimate them.
Assuming that $\pi_1 = \pi_2 = \pi_c$, instead of separate samples of size $n_1$ and $n_2$ from two different populations, we really have a single sample of size $n_1 + n_2$ from one population with proportion $\pi_c$. Since the total number of individuals in this combined sample having the characteristic of interest is $X_1 + X_2$, the estimator of $\pi_c$ is

$$\hat{\pi}_c \equiv p_c = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1}{n_1 + n_2}\,p_1 + \frac{n_2}{n_1 + n_2}\,p_2.$$

Null hypothesis: $H_0: \pi_1 - \pi_2 = 0$

Test statistic:

$$z = \frac{p_1 - p_2}{\sqrt{p_c(1 - p_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

when $n_1$ and $n_2$ are large enough.

Alternative hypothesis and rejection region for a level $\alpha$ test:
$H_a: \pi_1 - \pi_2 > 0$ : reject $H_0$ if $z \ge z_\alpha$
$H_a: \pi_1 - \pi_2 < 0$ : reject $H_0$ if $z \le -z_\alpha$
$H_a: \pi_1 \ne \pi_2$ : reject $H_0$ if either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$

Confidence Interval:

$$\left(p_1 - p_2 - z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}},\ \ p_1 - p_2 + z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}\right)$$

This CI is valid provided that $n_i\pi_i \ge 5$ and $n_i(1 - \pi_i) \ge 5$, $i = 1, 2$.
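The pooled test statistic and the unpooled confidence interval for two proportions can be sketched as follows. This is an illustrative sketch only, assuming Python's standard library; the counts are made up for the example and do not come from the notes, and the function names `two_proportion_z` and `two_proportion_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi1 - pi2 = 0, using the pooled estimate p_c."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    standard_error = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


def two_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """100(1 - alpha)% CI for pi1 - pi2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - half_width, p1 - p2 + half_width)


# Hypothetical counts: 60 successes out of 100 vs. 45 successes out of 100.
x1, n1, x2, n2 = 60, 100, 45, 100

z = two_proportion_z(x1, n1, x2, n2)
lo, hi = two_proportion_ci(x1, n1, x2, n2, alpha=0.05)

print("z =", round(z, 3))
print("95% CI for pi1 - pi2:", (round(lo, 4), round(hi, 4)))
```

Note that the test uses the pooled estimate $p_c$ because it is computed under $H_0: \pi_1 = \pi_2$, while the confidence interval uses $p_1$ and $p_2$ separately because it does not assume the proportions are equal.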