Basic Ideas for CI and Hypothesis Testing:

Suppose that $x_1, x_2, \dots, x_n$ is an SRS from a normal distribution $N(\mu_1, \sigma_1)$. Let $\bar{x}$ be the sample mean and $s_1$ be the sample standard deviation.

(1) Because the population is normal (and, by the central limit theorem, approximately for large $n$ otherwise), we have
\[ \frac{\bar{x} - \mu_1}{\sigma_1/\sqrt{n}} \sim N(0, 1). \]

(2) In an advanced probability course we can prove that
\[ \frac{\bar{x} - \mu_1}{s_1/\sqrt{n}} \sim t(n - 1). \]

Suppose that $y_1, y_2, \dots, y_m$ is an SRS from a normal distribution $N(\mu_2, \sigma_2)$, independent of the first sample. Let $\bar{y}$ be the sample mean and $s_2$ be the sample standard deviation.

(3) As in (1), we have
\[ \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}} \sim N(0, 1). \]

(4) When $\sigma_1$ and $\sigma_2$ are replaced by $s_1$ and $s_2$, we use the conservative approximation
\[ \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n + s_2^2/m}} \sim t((n - 1) \wedge (m - 1)), \]
where $a \wedge b$ denotes the smaller of $a$ and $b$.

(5) A random experiment that has two possible outcomes, success (S) and failure (F), with $P(S) = p$ and $P(F) = 1 - p$, is called a Bernoulli trial. Let $\xi$ be the number of successes observed in $n$ independent trials. Then $\xi$ has the binomial distribution $b(p, n)$. Thus, $\xi$ has mean $np$ and standard deviation $\sqrt{np(1 - p)}$. According to the CLT, if $np \ge 10$ and $n(1 - p) \ge 10$, we have approximately
\[ \frac{\xi - np}{\sqrt{np(1 - p)}} \sim N(0, 1). \]

(6) Suppose that $\hat{p}_1$ is the proportion of successes observed in a sequence of $n_1$ identical Bernoulli trials with unknown $p_1$. Suppose that $\hat{p}_2$ is the proportion of successes observed in a sequence of another $n_2$ identical Bernoulli trials with unknown $p_2$. Define the pooled proportion
\[ \hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}}. \]
Then, when $p_1 = p_2$ and both samples are large, we have approximately
\[ \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1). \]

Chapter Eighteen: Population mean inference with unknown σ

CI for the mean of a normal population: Draw an SRS of size $n$ from a normal population having unknown mean $\mu$ and unknown standard deviation $\sigma$. A level $C$ confidence interval for $\mu$ is
\[ \left[\bar{x} - t^* \frac{s}{\sqrt{n}},\ \bar{x} + t^* \frac{s}{\sqrt{n}}\right], \]
where $t^*$ is the level $C$ critical point from the t distribution $t(n - 1)$, i.e.
\[ P(|t| \le t^*) = P(-t^* \le t \le t^*) = C. \]
For example, if $C = 95\%$ and $n = 3$, then $t^* = 4.303$ from Table C.
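The Table C lookup can also be reproduced numerically. The following is a minimal sketch, not part of the original notes: the helper names `t_pdf`, `central_prob`, and `t_critical` are hypothetical, and the approach (Simpson's rule for the integral of the t density, then bisection on $P(|t| \le t^*) = C$) is just one simple way to do it with the Python standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def central_prob(t, df, steps=2000):
    """P(-t <= T <= t), using composite Simpson's rule on [0, t] and symmetry."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(conf, df):
    """Find t* with P(|T| <= t*) = conf by bisection (assumes 0 < conf < 1)."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, df) < conf:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# C = 95% and n = 3, so df = n - 1 = 2
print(round(t_critical(0.95, 2), 3))  # 4.303
```

With `df = n - 1`, the same function gives the other critical points used in these notes, for instance `t_critical(0.95, 9)` for a sample of size 10.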
In the above CI, $t^* \frac{s}{\sqrt{n}}$ is called the margin of error with $C$ confidence.

Example: A random sample of 10 high school students gained an average of $\bar{x} = 22$ points on their second attempt at the SAT mathematics exam. The change in score has a normal distribution with unknown standard deviation. The sample standard deviation is $s = 20$. (1) Find the 95% CI for $\mu$. (2) Find the margin of error for 99% confidence.

Solution: (1) With $n - 1 = 9$ degrees of freedom, $t^* = 2.262$, so the 95% CI for $\mu$ is
\[ \left[\bar{x} - t^* \frac{s}{\sqrt{n}},\ \bar{x} + t^* \frac{s}{\sqrt{n}}\right] = \left[22 - 2.262 \cdot \frac{20}{\sqrt{10}},\ 22 + 2.262 \cdot \frac{20}{\sqrt{10}}\right] = [7.6939,\ 36.3061]. \]
(2) With $t^* = 3.25$, the margin of error for 99% confidence is
\[ t^* \frac{s}{\sqrt{n}} = 3.25 \cdot \frac{20}{\sqrt{10}} = 20.5548. \]

Test procedure: Let $x_1, x_2, \dots, x_n$ be an SRS of size $n$ from a normal distribution with unknown mean $\mu$ and unknown standard deviation $\sigma$. Define
\[ t := \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \]
which has the $t(n - 1)$ distribution when $H_0 : \mu = \mu_0$ is true.

(a) To test $H_0 : \mu = \mu_0$ versus $H_a : \mu > \mu_0$ at the $\alpha$ level of significance, reject $H_0$ if $t \ge t_\alpha$, where $P(t \ge t_\alpha) = \alpha$.

(b) To test $H_0 : \mu = \mu_0$ versus $H_a : \mu < \mu_0$ at the $\alpha$ level of significance, reject $H_0$ if $t \le -t_\alpha$, where $P(t \le -t_\alpha) = \alpha$.

(c) To test $H_0 : \mu = \mu_0$ versus $H_a : \mu \ne \mu_0$ at the $\alpha$ level of significance, reject $H_0$ if $t \ge t_{\alpha/2}$ or $t \le -t_{\alpha/2}$, where $P(|t| \ge t_{\alpha/2}) = \alpha$.

The p-value is the smallest $\alpha$ at which we can reject $H_0$. More precisely, we have

p-value $= P(t \ge |t_0|) + P(t \le -|t_0|)$ for the two-sided test $H_a : \mu \ne \mu_0$,
p-value $= P(t \ge t_0)$ for the one-sided test $H_a : \mu > \mu_0$,
p-value $= P(t \le t_0)$ for the one-sided test $H_a : \mu < \mu_0$,

where $t \sim t(n - 1)$ and $t_0$ is the observed value of the test statistic.

Example: By past experience, we know that the daily yield of a chemical manufactured in a chemical plant has a $N(\mu, \sigma)$ distribution with unknown mean and standard deviation. Over 20 days, the observed sample mean of the daily yields is $\bar{x} = 871$ tons and the sample standard deviation is $s = 21$ tons. Test the hypothesis that the average daily yield of the chemical is $\mu = 880$ tons per day against the alternative $\mu \ne 880$ using $\alpha = 0.05$.
Solution:

(1) $H_0 : \mu = 880$ versus $H_a : \mu \ne 880$.

(2) Test statistic:
\[ t := \frac{\bar{x} - 880}{21/\sqrt{20}} \sim t(19). \]

(3) Significance level $\alpha = 0.05$.

(4) If $H_0$ is true, then
\[ P\left(-2.093 \le \frac{\bar{x} - 880}{21/\sqrt{20}} \le 2.093\right) = 0.95. \]
Thus, the rejection region is $R = (-\infty, -2.093) \cup (2.093, \infty)$. Since
\[ \frac{871 - 880}{21/\sqrt{20}} = -1.9166 \notin R, \]
we do not reject the null hypothesis $\mu = 880$ tons.

Chapter Nineteen: Two-sample problems: Comparing two population means

CI for Difference of Two Population Means

When two independent SRSs of sizes $n_1$ and $n_2$ are selected from two different normal populations with unknown means $\mu_1$ and $\mu_2$ and unknown variances $\sigma_1^2$ and $\sigma_2^2$, respectively, a level $C$ CI for $(\mu_1 - \mu_2)$ is given by
\[ \left[\bar{x}_1 - \bar{x}_2 - t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ \bar{x}_1 - \bar{x}_2 + t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right], \]
where $\alpha = 1 - C$, $\bar{x}_i$ and $s_i^2$, $i = 1, 2$, are the sample mean and the sample variance from the $i$th population, and $t_{\alpha/2}$ is the t critical point, with degrees of freedom equal to the smaller of $n_1 - 1$ and $n_2 - 1$, such that $P(|t| \ge t_{\alpha/2}) = \alpha$. The quantity
\[ SE := \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
is called the standard error.

Example: A small amount of the trace element selenium, 50 to 200 micrograms (µg) per day, is considered essential to good health. Suppose that independent random samples of $n_1 = n_2 = 30$ adults were selected from two regions of the United States and that a day's intake of selenium, from both liquids and solids, was recorded for each person. The mean and standard deviation of the selenium daily intakes for the 30 adults from region 1 were $\bar{x}_1 = 167.1$ µg and $s_1 = 24.3$ µg, respectively. The corresponding statistics for the 30 adults from region 2 were $\bar{x}_2 = 140.9$ µg and $s_2 = 17.6$ µg. Suppose that the population of selenium daily intakes in each region has a normal distribution. Find a 95% CI for the difference in the mean selenium intakes for the two regions. Interpret this interval.

Solution: From the information, $n_1 = n_2 = 30$, $\bar{x}_1 = 167.1$, $\bar{x}_2 = 140.9$, $s_1 = 24.3$, $s_2 = 17.6$.
Thus, the 95% CI for $\mu_1 - \mu_2$ is approximately
\[ \left[\bar{x}_1 - \bar{x}_2 - t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ \bar{x}_1 - \bar{x}_2 + t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right] = [14.99752,\ 37.40248], \]
where $t_{\alpha/2} = 2.045$ with 29 degrees of freedom. In repeated sampling, 95% of all intervals constructed in this manner will enclose $\mu_1 - \mu_2$. We are fairly certain that this particular interval encloses $\mu_1 - \mu_2$.

Test Hypothesis for Difference of Two Population Means

When two independent SRSs of sizes $n_1$ and $n_2$ are selected from two different normal populations with unknown means $\mu_1$ and $\mu_2$ and unknown variances $\sigma_1^2$ and $\sigma_2^2$, respectively, we have the following test procedure. Let
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t((n_1 - 1) \wedge (n_2 - 1)), \]
where $\bar{x}_1$ and $\bar{x}_2$ are the observed sample means, $s_1$ and $s_2$ are the observed sample standard deviations from the two populations, and $(n_1 - 1) \wedge (n_2 - 1)$ is the smaller of $n_1 - 1$ and $n_2 - 1$.

(a) To test $H_0 : \mu_1 - \mu_2 = D_0$ versus $H_1 : \mu_1 - \mu_2 > D_0$ at the $\alpha$ level of significance, reject $H_0$ if $t \ge t_\alpha$, where $P(t \ge t_\alpha) = \alpha$.

(b) To test $H_0 : \mu_1 - \mu_2 = D_0$ versus $H_1 : \mu_1 - \mu_2 < D_0$ at the $\alpha$ level of significance, reject $H_0$ if $t \le -t_\alpha$, where $P(t \le -t_\alpha) = \alpha$.

(c) To test $H_0 : \mu_1 - \mu_2 = D_0$ versus $H_1 : \mu_1 - \mu_2 \ne D_0$ at the $\alpha$ level of significance, reject $H_0$ if either $t \ge t_{\alpha/2}$ or $t \le -t_{\alpha/2}$, where $P(|t| \ge t_{\alpha/2}) = \alpha$.

Example: Allstate and Roadkill specialize in writing insurance policies for high-risk drivers. Last year, Allstate processed 100 claims. Settlements averaged $2000 and had a sample standard deviation of $600. A smaller firm, Roadkill, resolved only 50 claims, but the payouts averaged $2500 with a sample standard deviation of $700. Suppose that claims for each company have a normal distribution. Can we conclude from last year's experience that the average awards paid by the two companies tend not to be the same? Set up and carry out an appropriate analysis.

Solution: Suppose that $\mu_1$ is the true mean of payouts at Allstate and $\mu_2$ is the true mean of payouts at Roadkill.
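Let $\bar{x}_1$ represent the sample mean of payouts at Allstate and $\bar{x}_2$ represent the sample mean of payouts at Roadkill.

(1) Let $\alpha = 0.1$.

(2) Set $H_0 : \mu_1 - \mu_2 = D_0 = 0$ versus $H_1 : \mu_1 \ne \mu_2$, where $n_1 = 100$ and $n_2 = 50$.

(3) The observed test statistic is
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{2000 - 2500}{\sqrt{\frac{(600)^2}{100} + \frac{(700)^2}{50}}} = -4.3193. \]

Since the p-value $P(|t| > 4.319) \le 0.1\%$, with $t$ having $(n_1 - 1) \wedge (n_2 - 1) = 49$ degrees of freedom, is smaller than $\alpha$, we reject $H_0$. This means that we may conclude from last year's experience that the average awards paid by the two companies tend not to be the same.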
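The arithmetic in this example can be checked with a short script. This is a minimal sketch under stated assumptions, not part of the original notes: the function names `t_pdf`, `two_sided_p`, and `two_sample_t` are hypothetical, the degrees of freedom follow the conservative rule used above (the smaller of $n_1 - 1$ and $n_2 - 1$), and the p-value is obtained by integrating the t density with Simpson's rule using only the Python standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t0, df, steps=2000):
    """P(|T| >= |t0|) = 1 - P(-|t0| <= T <= |t0|), via Simpson's rule on [0, |t0|]."""
    a = abs(t0)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

def two_sample_t(x1, s1, n1, x2, s2, n2, d0=0.0):
    """Two-sample t statistic with unpooled standard error and conservative df."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error SE
    t0 = ((x1 - x2) - d0) / se
    df = min(n1 - 1, n2 - 1)
    return t0, df

# Allstate: n1 = 100, mean 2000, sd 600; Roadkill: n2 = 50, mean 2500, sd 700
t0, df = two_sample_t(2000, 600, 100, 2500, 700, 50)
p = two_sided_p(t0, df)
print(round(t0, 4), df)  # -4.3193 49
print(p < 0.001)         # True
```

The script reproduces the test statistic $-4.3193$ and confirms that the two-sided p-value is below 0.1%, so the decision to reject $H_0$ at $\alpha = 0.1$ is unchanged.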