Math/Stat 370: Engineering Statistics, Washington State University
Haijun Li, [email protected]
Department of Mathematics, Washington State University
Week 6

Relation Between Populations and Samples

[Figure 6-3: Relationship between a population and a sample. The population has mean $\mu$ and standard deviation $\sigma$; the sample $(x_1, x_2, x_3, \ldots, x_n)$ has sample average $\bar{x}$ and sample standard deviation $s$, and is summarized by a histogram.]

Point Estimation

Point estimator $\hat{\theta}$: A statistic that can be used to estimate the unknown parameter $\theta$.
Point estimate: A single value of $\hat{\theta}$.
Bias: $E(\hat{\theta}) - \theta$.
Unbiased estimator: $E(\hat{\theta}) = \theta$.
Example:
1. $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is unbiased for the mean $\mu$:
   $E(\bar{X}) = E\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{\sum_{i=1}^{n} E(X_i)}{n} = \frac{n\mu}{n} = \mu.$
2. $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$ is unbiased for the variance $\sigma^2$.

Minimum Variance Unbiased Estimator

Mean square error (MSE): $\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2$.
$\mathrm{MSE}(\hat{\theta}) = V(\hat{\theta}) + (\text{bias})^2$.
For an unbiased estimator $\hat{\theta}$: $\mathrm{MSE}(\hat{\theta}) = V(\hat{\theta})$.
Minimum variance unbiased estimator: Among all unbiased estimators, the one with the smallest variance.

[Figure 7-1: The sampling distributions of two unbiased estimators $\hat{\Theta}_1$ and $\hat{\Theta}_2$, both centered at $\theta$.]

Examples

Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$.
Example 1: $\bar{X}$ is the minimum variance unbiased estimator for the mean $\mu$.
Example 2: Compare the following two estimators:
   $\bar{X} = \frac{X_1 + X_2 + X_3}{3}, \qquad \tilde{X} = \frac{3X_1 - X_2}{2}.$
Solution: $\bar{X}$ is unbiased, and
   $E(\tilde{X}) = \frac{3E(X_1) - E(X_2)}{2} = \frac{2\mu}{2} = \mu,$
so $\tilde{X}$ is also unbiased. Compare their variances:
   $V(\bar{X}) = \frac{\sigma^2}{3}, \qquad V(\tilde{X}) = \frac{3^2 V(X_1) + (-1)^2 V(X_2)}{2^2} = \frac{10\sigma^2}{4}.$
Thus, $\bar{X}$ is better.

Relative Efficiency

Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two estimators of $\theta$ (unbiased or not). If $\mathrm{MSE}(\hat{\theta}_1) < \mathrm{MSE}(\hat{\theta}_2)$, then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$.
Example: Let $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$ be three estimators of $\theta$ with
   $E(\hat{\theta}_1) = E(\hat{\theta}_2) = \theta, \quad E(\hat{\theta}_3) \neq \theta,$
   $V(\hat{\theta}_1) = 16, \quad V(\hat{\theta}_2) = 11, \quad \mathrm{MSE}(\hat{\theta}_3) = 6.$
Compare these estimators.
Solution: Compare their MSEs: $\mathrm{MSE}(\hat{\theta}_1) = 16$, $\mathrm{MSE}(\hat{\theta}_2) = 11$, $\mathrm{MSE}(\hat{\theta}_3) = 6$. Thus, $\hat{\theta}_3$ has the smallest mean square error, even though it is a biased estimator.

Hypothesis Testing

Statistical hypothesis: A statement about the parameters of one or more populations.
Null hypothesis $H_0$: The hypothesis we wish to test.
Alternative hypothesis $H_1$: One-sided or two-sided.
Statistical test:
1. Form the null and alternative hypotheses.
2. Take a sample from the population.
3. Compute the test statistic from the sample.
4. Based on the value of the test statistic, make a decision about the null hypothesis (reject or not).

Statistical Tests

Type I error: Reject $H_0$ when it is true.
Type II error: Fail to reject $H_0$ when it is false.
Significance level (size) of the test: $\alpha = P(\text{type I error})$.
Type II probability: $\beta = P(\text{type II error})$.
Power of the test: $1 - \beta = P(\text{rejecting } H_0 \text{ when it is false})$.

Table 9-1 Decisions in Hypothesis Testing
Decision               $H_0$ is true    $H_0$ is false
Fail to reject $H_0$   no error         type II error
Reject $H_0$           type I error     no error

A Motivating Example

The burning rate of solid propellant used to power aircrew escape systems is a random variable with normal distribution with SD of 2.5 cm/s. A sample of $n = 10$ items results in $\bar{x} = 51.7$ cm/s. Construct a size $\alpha = 0.05$ test for
   $H_0: \mu = 50$ vs. $H_1: \mu \neq 50$.
If $\bar{x}$ is "close" to 50, we should not reject $H_0$.
If $\bar{x}$ is "far" from 50, say $\bar{x} < a$ or $\bar{x} > b$ for some $a < 50 < b$, we should reject $H_0$.
Type I: $\alpha = P(\bar{X} < a \text{ or } \bar{X} > b \text{ when } H_0 \text{ is true}) = 0.05$.
Type II: $\beta = P(a \leq \bar{X} \leq b \text{ when } H_1 \text{ is true})$.
Goal: Find the boundaries $a$ and $b$ based on $\alpha$.

Boundaries = Percentage Points

1. Rejection (or critical) region: $z_0 > z_{\alpha/2}$ or $z_0 < -z_{\alpha/2}$.
2. $z_{\alpha/2}$ is the upper $100\alpha/2$ percentage point of the standard normal distribution.
3. If $\alpha = 0.05$, then $z_{0.05/2} = z_{0.025} = 1.96$.
   If $\alpha = 0.1$, then $z_{0.1/2} = z_{0.05} = 1.65$.
   If $\alpha = 0.01$, then $z_{0.01/2} = z_{0.005} = 2.58$.

[Figure (a): $N(0,1)$ density of $Z_0$, with critical regions of area $\alpha/2$ below $-z_{\alpha/2}$ and above $z_{\alpha/2}$, and the acceptance region in between.]

Inference on Mean (known $\sigma^2$)

Hypotheses: $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$.
Test the hypotheses with significance level $\alpha$.
Take a sample $X_1, X_2, \ldots, X_n$.
Test statistic:
   $Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
When $H_0$ is true, $Z_0$ has (approximately) the standard normal distribution.
Rejection (or critical) region: $z_0 > z_{\alpha/2}$ or $z_0 < -z_{\alpha/2}$.
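The test statistic and rejection region above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `z_test_two_sided` is hypothetical, and the standard-library `statistics.NormalDist` is assumed as the source of normal percentage points. The inputs are the numbers from the motivating example ($\sigma = 2.5$, $n = 10$, $\bar{x} = 51.7$, $\alpha = 0.05$).

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(xbar, mu0, sigma, n, alpha):
    """Two-sided z-test for a mean with known sigma (hypothetical helper).

    Returns the observed statistic z0, the critical value z_{alpha/2},
    and whether H0: mu = mu0 is rejected at level alpha.
    """
    std_normal = NormalDist()  # N(0, 1)
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # upper alpha/2 percentage point
    return z0, z_crit, abs(z0) > z_crit


# Motivating example: sigma = 2.5, n = 10, xbar = 51.7, alpha = 0.05
z0, z_crit, reject = z_test_two_sided(51.7, 50, 2.5, 10, 0.05)
print(f"z0 = {z0:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

With these inputs $z_0$ is about 2.15, which exceeds $z_{0.025} = 1.96$, so the sketch rejects $H_0$ at level 0.05.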
[Figure: $N(0,1)$ density of $Z_0$, with critical regions of area $\alpha/2$ below $-z_{\alpha/2}$ and above $z_{\alpha/2}$, and the acceptance region in between.]

Example

The burning rate of solid propellant used to power aircrew escape systems is a random variable with normal distribution with SD of 2 cm/s. A sample of $n = 25$ items results in $\bar{x} = 51.3$ cm/s. Construct a size $\alpha = 0.01$ test for
   $H_0: \mu = 50$ vs. $H_1: \mu \neq 50$.
Solution: Since $\alpha/2 = 0.005$, $z_{0.005} = 2.58$. Since
   $Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1.3}{2/5} = 3.25 > 2.58,$
we reject $H_0$ at level $\alpha = 0.01$.

One-Sided Inference

Based on data $X_1, X_2, \ldots, X_n$.
Test statistic:
   $Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
When $H_0$ is true, $Z_0$ has (approximately) the standard normal distribution.
$z_{\alpha}$ is the upper $100\alpha$ percentage point.
1. Hypotheses: $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$ (upper). Rejection region: $z_0 > z_{\alpha}$.
2. Hypotheses: $H_0: \mu = \mu_0$ vs. $H_1: \mu < \mu_0$ (lower). Rejection region: $z_0 < -z_{\alpha}$.

[Figure (b), (c): $N(0,1)$ density of $Z_0$ with a critical region of area $\alpha$ above $z_{\alpha}$ (upper-tailed test) and below $-z_{\alpha}$ (lower-tailed test).]

p-Value

p-value $p$: The smallest level of significance that leads to rejection of $H_0$ based on the data.
If $\alpha \geq p$, we reject $H_0$ at level $\alpha$.
For the normal distribution,
   $p = \begin{cases} 2(1 - \Phi(|z_0|)) & \text{for a two-tailed test} \\ 1 - \Phi(z_0) & \text{for an upper-tailed test} \\ \Phi(z_0) & \text{for a lower-tailed test} \end{cases}$
Example: For the burning rate problem, $z_0 = 3.25$, and
   $p = 2(1 - \Phi(3.25)) = 0.0012.$
Since $\alpha = 0.01 > p$, we reject $H_0$ at level 0.01. But if $\alpha = 0.001$, then we fail to reject $H_0$ at level 0.001 because $\alpha = 0.001 < 0.0012 = p$.

Type I error

Consider a two-sided test with null value $\mu_0 = 50$ and a specific alternative value $\mu_1 = 52$:
   $H_0: \mu = 50$ vs. $H_1: \mu = 52$.
$n = 10$, $\sigma = 2.5$, and so $\sigma/\sqrt{n} = 0.79$.
Suppose that we reject $H_0$ if $\bar{X} \leq 48.5$ or $\bar{X} \geq 51.5$ (rejection region). Then
   $\alpha = P(\bar{X} \leq 48.5 \mid \mu = 50) + P(\bar{X} \geq 51.5 \mid \mu = 50)$
   $= P\left(Z \leq \frac{48.5 - 50}{0.79}\right) + P\left(Z \geq \frac{51.5 - 50}{0.79}\right)$
   $= P(Z \leq -1.90) + P(Z \geq 1.90)$
   $= 0.028717 + 0.028717 = 0.057434.$
That is, about 5.7% of all random samples would lead to rejection of $H_0: \mu = 50$ cm/s when the true mean burning rate is really 50 cm/s.

[Figure: The critical region for $H_0: \mu = 50$, with area $\alpha/2 = 0.0287$ in each tail, below 48.5 and above 51.5.]

Type II error

Consider the same hypotheses:
   $H_0: \mu = 50$ vs. $H_1: \mu = 52$, so $\delta = 52 - 50 = 2$.
Under $H_1$ the sample mean is standardized with $\mu = 52$:
   $\beta = P(48.5 \leq \bar{X} \leq 51.5 \text{ when } \mu = 52)$
   $= P\left(\frac{48.5 - 52}{0.79} \leq Z \leq \frac{51.5 - 52}{0.79}\right)$
   $= P(Z \leq -0.63) - P(Z \leq -4.43) = 0.2643.$

[Figure 9-3: The probability of type II error. Probability density of $\bar{x}$ under $H_0: \mu = 50$ and under $H_1: \mu = 52$.]
[Figure 9-4: The probability of type II error. Probability density of $\bar{x}$ under $H_0: \mu = 50$ and under $H_1: \mu = 50.5$.]

Type II error and Sample Size

Consider the two-sided hypotheses
   $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$.
Let $\delta = \mu - \mu_0$, where $\mu$ is the true mean. For the two-sided alternative,
   $\beta = \Phi\left(z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right).$
The sample size needed to achieve a given $\beta$ is approximately
   $n \approx \frac{(z_{\alpha/2} + z_{\beta})^2 \sigma^2}{\delta^2}.$

Examples

Example: The burning rate of solid propellant used to power aircrew escape systems is a random variable with normal distribution with SD of 2 cm/s. Let $\alpha = 0.05$. Consider
   $H_0: \mu = 50$ vs. $H_1: \mu \neq 50$.
1. Suppose that the sample size is $n = 25$ and $\mu = 52$. Find $\beta$.
   Solution: Since $z_{0.025} = 1.96$ and $\delta\sqrt{n}/\sigma = 2 \cdot 5 / 2 = 5$,
   $\beta = \Phi(1.96 - 5) - \Phi(-1.96 - 5) \approx \Phi(-3.04) = 0.0012.$
2. Assume $\alpha = 0.05$ and $\beta = 0.10$. Find the sample size $n$ required to detect the difference $\delta = 52 - 50 = 2$.
   Solution: Since $z_{0.025} = 1.96$ and $z_{0.1} = 1.28$,
   $n \approx \frac{(1.96 + 1.28)^2 \cdot 2^2}{2^2} = 10.50,$
   so rounding up gives $n = 11$.
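The two formulas for $\beta$ and $n$ can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` for $\Phi$ and its inverse; the function names `type_ii_error` and `required_sample_size` are hypothetical and are not from the slides. It reuses the numbers from the two examples above ($\sigma = 2$, $\delta = 2$, $\alpha = 0.05$).

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1); cdf is Phi, inv_cdf gives percentage points


def type_ii_error(delta, sigma, n, alpha):
    """beta for the two-sided z-test when the true mean is mu0 + delta."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    shift = delta * sqrt(n) / sigma
    return STD_NORMAL.cdf(z - shift) - STD_NORMAL.cdf(-z - shift)


def required_sample_size(delta, sigma, alpha, beta):
    """Approximate n from (z_{alpha/2} + z_beta)^2 sigma^2 / delta^2 (not rounded)."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    z_b = STD_NORMAL.inv_cdf(1 - beta)       # z_beta
    return (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2


beta = type_ii_error(delta=2, sigma=2, n=25, alpha=0.05)
n_raw = required_sample_size(delta=2, sigma=2, alpha=0.05, beta=0.10)
print(f"beta = {beta:.4f}")
print(f"n = {n_raw:.2f}, rounded up to {ceil(n_raw)}")
```

Because the sketch uses unrounded percentage points rather than the table values 1.96 and 1.28, its results can differ from the hand calculation in the last digits; the rounded-up sample size is the same.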
Confidence Interval

$100(1 - \alpha)\%$ confidence interval (CI): A random interval $(L, U)$ such that
   $P(L \leq \mu \leq U) = 1 - \alpha.$
Observe that
   $P\left(-z_{\alpha/2} \leq \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \leq z_{\alpha/2}\right) = 1 - \alpha,$
where $z_{\alpha/2}$ is the upper $100\alpha/2$ percentage point. Rewrite this as
   $P\left(\bar{X} - \frac{z_{\alpha/2}\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + \frac{z_{\alpha/2}\sigma}{\sqrt{n}}\right) = 1 - \alpha.$
Thus, a $100(1 - \alpha)\%$ CI for $\mu$ when $\sigma^2$ is known is
   $\bar{x} - \frac{z_{\alpha/2}\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + \frac{z_{\alpha/2}\sigma}{\sqrt{n}}.$

Confidence Bounds

[Figure 8-2: Error in estimating $\mu$ with $\bar{x}$. The error is $E = |\bar{x} - \mu|$, and the interval runs from $l = \bar{x} - z_{\alpha/2}\sigma/\sqrt{n}$ to $u = \bar{x} + z_{\alpha/2}\sigma/\sqrt{n}$.]

A $100(1 - \alpha)\%$ upper confidence bound for $\mu$ when $\sigma^2$ is known:
   $\mu \leq \bar{x} + \frac{z_{\alpha}\sigma}{\sqrt{n}}.$
A $100(1 - \alpha)\%$ lower confidence bound for $\mu$ when $\sigma^2$ is known:
   $\bar{x} - \frac{z_{\alpha}\sigma}{\sqrt{n}} \leq \mu.$

Example

Consider the propellant problem with SD of 2 cm/s. A sample of $n = 25$ items results in $\bar{x} = 51.3$ cm/s.
1. Find a 95% CI for the mean $\mu$.
   Solution: Since $z_{0.025} = 1.96$ and $1.96 \times 2/\sqrt{25} = 0.78$,
   95% CI $= (51.3 - 0.78,\ 51.3 + 0.78) = (50.52,\ 52.08)$.
2. Find a 95% upper confidence bound for the mean $\mu$.
   Solution: Since $z_{0.05} = 1.65$ and $1.65 \times 2/\sqrt{25} = 0.66$,
   95% upper CB $= 51.3 + 0.66 = 51.96$.
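The interval and bound in this example can be reproduced with a short Python sketch. It assumes the standard-library `statistics.NormalDist` for the percentage points; the helper names `confidence_interval` and `upper_confidence_bound` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def confidence_interval(xbar, sigma, n, conf):
    """Two-sided CI for the mean with known sigma, at confidence level conf."""
    alpha = 1 - conf
    half_width = STD_NORMAL.inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def upper_confidence_bound(xbar, sigma, n, conf):
    """One-sided upper bound: xbar + z_alpha * sigma / sqrt(n)."""
    return xbar + STD_NORMAL.inv_cdf(conf) * sigma / sqrt(n)


# Propellant example: xbar = 51.3, sigma = 2, n = 25, 95% confidence
lo, hi = confidence_interval(51.3, 2, 25, 0.95)
ub = upper_confidence_bound(51.3, 2, 25, 0.95)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
print(f"95% upper confidence bound: {ub:.2f}")
```

For these inputs the sketch gives a 95% CI of roughly (50.52, 52.08) and an upper bound of roughly 51.96. Note that the one-sided bound uses $z_{\alpha}$ rather than $z_{\alpha/2}$, which is why it sits below the upper end of the two-sided interval.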
Sample Size

The sample size required for the error $|\bar{x} - \mu| \leq E$ with $100(1 - \alpha)\%$ confidence is given by
   $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2.$
Example: Consider the propellant problem with $\sigma = 2$. Find the sample size required for the error to be less than 1.5 cm/s with 95% confidence.
Solution: Since $1 - \alpha = 0.95$, we have $\alpha = 0.05$ and $z_{0.025} = 1.96$. Since $E = 1.5$,
   $n = \left(\frac{1.96 \times 2}{1.5}\right)^2 = 6.83,$
so rounding up to the next integer gives $n = 7$.
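As a quick check of the sample-size formula, here is a minimal Python sketch. The function name `sample_size_for_error` is hypothetical, and the standard-library `statistics.NormalDist` is assumed for $z_{\alpha/2}$. The result is rounded up because $n$ must be an integer at least as large as the computed value.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_error(sigma, error, conf):
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= error."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return ceil((z * sigma / error) ** 2)


# Propellant example: sigma = 2, E = 1.5, 95% confidence
n_required = sample_size_for_error(sigma=2, error=1.5, conf=0.95)
print(f"required sample size: {n_required}")
```

Halving the allowed error $E$ roughly quadruples the required $n$, since $E$ enters the formula squared.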