Probability and Statistics
Grinshpan

Confidence intervals

Let $\theta$ be a population parameter of interest, and let $\hat\theta$ be an estimator for $\theta$ calculated from a random sample $X_1, \dots, X_n$. If $\hat\theta$ is known to be unbiased, $\operatorname{bias}(\hat\theta) = E[\hat\theta] - \theta = 0$, and consistent, in the sense that $\hat\theta$ approaches $\theta$ in probability as $n \to \infty$, then every set of observed data $x_1, \dots, x_n$ yields a point estimate $\hat\theta(x_1, \dots, x_n)$ of $\theta$. The mean squared error of $\hat\theta$, which in this case agrees with its variance,

$$\operatorname{MSE}(\hat\theta) = E\big[(\hat\theta - \theta)^2\big] = \operatorname{Var}(\hat\theta) + \operatorname{bias}^2(\hat\theta) = \operatorname{Var}(\hat\theta),$$

serves as a measure of accuracy. If little is known about the population, this is already informative. If more is known about the population distribution, then interval estimation may be possible.

Given $0 < \alpha < 1$, a confidence interval for $\theta$ with confidence level $100(1-\alpha)\%$ is a pair of statistics, $L(X_1, \dots, X_n)$ and $R(X_1, \dots, X_n)$, such that the probability that $L < \theta < R$ is at least $1 - \alpha$. For instance, if $\alpha = 0.1$ and a large number of $n$-samples is observed, then one expects about 90% of the intervals $\big(L(x_1, \dots, x_n), R(x_1, \dots, x_n)\big)$ to contain $\theta$.

We now examine an important special case where the sampling distribution is normal. Suppose first that $X_i \sim N(\mu, \sigma^2)$, where $\mu$ is to be estimated and $\sigma^2$ is known. Then the sample mean $\bar X = (X_1 + \dots + X_n)/n$ is also normally distributed, with mean $\mu$ and variance $\sigma^2/n$, so $(\bar X - \mu)/(\sigma/\sqrt{n})$ is standard normal. Fix a confidence level of $100(1-\alpha)\%$. Since, with $\Phi$ denoting the standard normal distribution function,

$$P\left( \frac{|\bar X - \mu|}{\sigma/\sqrt{n}} < z \right) = \Phi(z) - \Phi(-z) = 2\Phi(z) - 1,$$

to meet the prescribed confidence level we should require that $z$ satisfy

$$2\Phi(z) - 1 = 1 - \alpha, \quad\text{or}\quad \Phi(z) = 1 - \frac{\alpha}{2}.$$

If we define $z_\alpha$ to be the value such that $\Phi(z_\alpha) = 1 - \alpha$, then $z = z_{\alpha/2}$. So then

$$P\left( |\bar X - \mu| < z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha.$$

Consequently, the interval

$$\left( \bar X - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar X + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$

contains $\mu$ with probability $1 - \alpha$. The center of the interval, $\bar X$, is a random variable. The width of the interval, $2 z_{\alpha/2}\, \sigma/\sqrt{n}$, is not random and is within our control, but for a fixed $n$ it can be made smaller only at the expense of the coverage probability $1 - \alpha$.
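The known-$\sigma$ construction translates into a few lines of code. The following is a minimal Python sketch using only the standard library, where `statistics.NormalDist().inv_cdf` plays the role of $\Phi^{-1}$; the function names `z_critical`, `z_interval`, and `simulated_coverage` are hypothetical, introduced only for illustration. The simulation repeats the sampling many times and counts how often the random interval captures $\mu$, which is the frequency interpretation of the confidence level described above.

```python
import random
from statistics import NormalDist, fmean


def z_critical(alpha):
    """Return z_{alpha/2}, the point where the standard normal CDF equals 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar, sigma, n, alpha):
    """Known-sigma interval: xbar -/+ z_{alpha/2} * sigma / sqrt(n)."""
    half_width = z_critical(alpha) * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


def simulated_coverage(mu, sigma, n, alpha, trials=4000, seed=1):
    """Draw `trials` normal samples of size n and return the fraction of
    intervals that contain the true mean mu (should be close to 1 - alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(fmean(sample), sigma, n, alpha)
        if lo < mu < hi:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(round(z_critical(0.10), 3))  # 1.645
    print(simulated_coverage(mu=5.0, sigma=2.0, n=15, alpha=0.10))
```

With $\alpha = 0.10$ the simulated fraction should land near 0.90; it will not be exactly 0.90 because only finitely many samples are drawn.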
EXAMPLE Suppose that the population data is normally distributed, with $\sigma = 0.4$, and that, based on a sample of size 20, the sample mean is $\bar x = 3.84$. Then, for a confidence level of, say, 95% ($\alpha = 0.05$), we calculate or look up $z_{0.025} \approx 1.96$ and evaluate $z_{0.025} \cdot \sigma/\sqrt{20} \approx 0.175$. The resulting bounds are $3.84 - 0.175 = 3.665$ and $3.84 + 0.175 = 4.015$. Thus $(3.665, 4.015)$ is (a particular observation of) a confidence interval for the population mean $\mu$. It “contains” $\mu$ with a probability of 0.95, in the sense that intervals produced this way capture $\mu$ for about 95% of samples in the long run.

Now let $X_1, \dots, X_n$ be a simple random sample from a large population of size $N$, with both $\mu$ and $\sigma^2$ unknown. How can a confidence interval for $\mu$ be constructed in this case? When $n$ is sufficiently large, one course of action is normal approximation, which results in approximate $100(1-\alpha)\%$ confidence intervals. In fact, using that $\bar X \approx N(\mu, \sigma_{\bar X}^2)$, where

$$\sigma_{\bar X}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1},$$

and proceeding as above, one finds that (to some approximation) the random interval

$$\left( \bar X - z_{\alpha/2}\, \sigma_{\bar X},\ \bar X + z_{\alpha/2}\, \sigma_{\bar X} \right)$$

contains $\mu$ with probability $1 - \alpha$. In practice, $\sigma_{\bar X}$ is often replaced by $s_{\bar X}$, since

$$s_{\bar X}^2 = \frac{s^2}{n}\left(1 - \frac{n}{N}\right),$$

where $s^2$ is the sample variance, is an unbiased and consistent estimator for $\sigma_{\bar X}^2$. Both the center and the width of the approximate $100(1-\alpha)\%$ confidence interval

$$\left( \bar X - z_{\alpha/2}\, s_{\bar X},\ \bar X + z_{\alpha/2}\, s_{\bar X} \right)$$

are random variables. Other substitutes for $\sigma_{\bar X}$ are also possible.

EXAMPLE Suppose that we observe 19 successes in 100 independent trials, each with probability of success $p$, and would like to calculate a 95% confidence interval for $p$. Writing $\bar X$ for the proportion of successes and using that $\bar X \approx N\big(p, p(1-p)/n\big)$, we have (for sufficiently large $n$), with probability approximately 0.95,

$$\bar X - 1.96 \sqrt{\frac{p(1-p)}{n}} < p < \bar X + 1.96 \sqrt{\frac{p(1-p)}{n}}.$$

Note that $\bar x = 0.19$ is the maximum likelihood estimate for $p$. Replacing $p$ by $0.19$ to evaluate

$$1.96 \cdot \frac{\sqrt{0.19 \cdot 0.81}}{10} \approx 0.077,$$

we find the approximate 95% confidence interval for $p$ to be $(0.113, 0.267)$. Thus, to some degree of approximation, “$p$, with 95% confidence, is between 11% and 27%”.
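The two approximate intervals of this part can be sketched in Python as follows. This is a minimal illustration using only the standard library, not a reference implementation: the function names are hypothetical, `statistics.variance` is assumed to play the role of $s^2$ (it uses the divisor $n - 1$), and the proportion interval plugs the estimate $\hat p$ into $p(1-p)$ exactly as in the example, but with the unrounded $z_{0.025}$ instead of 1.96.

```python
import math
from statistics import NormalDist, fmean, variance


def z_critical(alpha):
    """Return z_{alpha/2}, the point where the standard normal CDF equals 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def finite_population_interval(sample, N, alpha):
    """Approximate interval xbar -/+ z_{alpha/2} * s_xbar for a simple random
    sample drawn without replacement from a population of size N, where
    s_xbar^2 = (s^2 / n) * (1 - n / N)."""
    n = len(sample)
    xbar = fmean(sample)
    s2 = variance(sample)  # sample variance, divisor n - 1
    s_xbar = math.sqrt(s2 / n * (1 - n / N))
    half_width = z_critical(alpha) * s_xbar
    return xbar - half_width, xbar + half_width


def proportion_interval(successes, n, alpha):
    """Approximate interval for p, with p replaced by the MLE p_hat = successes / n
    inside sqrt(p * (1 - p) / n)."""
    p_hat = successes / n
    half_width = z_critical(alpha) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    lo, hi = proportion_interval(19, 100, alpha=0.05)
    print(f"({lo:.3f}, {hi:.3f})")  # (0.113, 0.267)
```

The factor $1 - n/N$ behaves as one would hope: if the sample is the whole population ($n = N$), the sketch returns an interval of zero width centered at $\bar x$, which then equals $\mu$.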
