Lecture 7. Point estimation and confidence intervals
Mathematical Statistics and Discrete Mathematics
November 23rd, 2015
1 / 18
Motivating example
• A producer of t-shirts for Chalmers wants us to provide them with the average
height of a male student.
• We do not have the resources to measure the height of every student.
• We choose 10 students at random, and get the following heights in cm
182, 171, 177, 174, 186, 183, 193, 172, 180, 181
• We compute the mean of these values and obtain
x̄ = (182 + 171 + . . . + 181)/10 = 179.9
• 179.9 is our estimate of the average height.
Note that our estimate need not be the true average height of all students.
Disclaimer: I did not gather this data and I cannot guarantee its validity.
2 / 18
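As a quick sanity check, the sample mean from the example can be computed directly. A minimal Python sketch (the variable names are my own, not from the lecture):

```python
# Heights (cm) of the 10 randomly chosen students from the example
heights = [182, 171, 177, 174, 186, 183, 193, 172, 180, 181]

# Point estimate of the average height: the sample mean
x_bar = sum(heights) / len(heights)
print(x_bar)  # 179.9
```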
General framework of parameter estimation
• We want to study a numerical property possessed by members of a certain large
population, and it is impossible/impractical to gather data about the whole
population. (Note that the population may be hypothetical, e.g. the population of
all cellphone batteries, both batteries already produced and batteries produced in the
future.)
• The distribution of the property in the whole population is described by a
random variable X, whose characteristics/parameters, like the mean or variance,
we want to estimate/approximate.
• We choose a (relatively) small random sample of n members of the population.
We do it in such a way that the selection of one member does not influence the
selection of any other member.
• Before the actual choice, the property of the ith member is described by a
random variable Xi which has the same distribution as X, and the variables
X1 , X2 , . . . , Xn
are independent. We call these variables a random sample too.
3 / 18
General framework of parameter estimation
• By measuring the properties of the chosen members, we obtain n observed
values (numbers)
x1 , x2 , . . . , xn
of the random variables X1 , X2 , . . . , Xn . We call the observed values a random
sample too.
• We use the observed values to estimate the parameters of interest.
Note that we used the term random sample to denote three different notions: the
randomly selected members of the population, the collection of random variables
associated with these members, and the collection of the observed values of the
variables. The interpretation of the term is usually clear from the context.
4 / 18
Random sample and statistics
A random sample of size n from the distribution of X is a collection of n independent
random variables
X1 , X2 , . . . , Xn ,
each with the same distribution as X.
A statistic is a random variable whose value can be computed from the values of
the random sample X1 , X2 , . . . , Xn .
Typical statistics of interest are:
• ∑_{i=1}^n Xi,
• X̄ = ∑_{i=1}^n Xi / n (sample mean),
• ∑_{i=1}^n Xi²,
• max_i Xi,
• min_i Xi.
Note that if the mean µ of X is unknown, then, e.g., X̄ − µ is not a statistic, since its value cannot be computed from the sample alone.
5 / 18
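The statistics listed above are easy to evaluate on observed values; a sketch in Python using the example heights (names are mine):

```python
xs = [182, 171, 177, 174, 186, 183, 193, 172, 180, 181]
n = len(xs)

total       = sum(xs)                 # sum of the Xi
sample_mean = total / n               # X-bar
sum_squares = sum(x * x for x in xs)  # sum of Xi squared
largest     = max(xs)                 # max_i Xi
smallest    = min(xs)                 # min_i Xi
print(sample_mean, largest, smallest)  # 179.9 193 171
```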
Estimators
Let θ be a parameter of the distribution of X. A statistic used to estimate θ is called
an estimator, and is denoted by θ̂. An estimate is the value of an estimator for a
particular collection of observed values of a random sample.
It is important to remember that an estimator is a random variable, and an estimate is
a number.
Examples of parameters are the mean value µ, the variance σ², the standard deviation σ, and
parameters such as λ for the Poisson and exponential distributions and p for the binomial distribution.
An estimator θ̂ is called unbiased if
E[θ̂] = θ.
The fact that an estimator is unbiased tells us that it fluctuates around the right
value.
6 / 18
Sample mean
Let X1 , X2 , . . . , Xn be a random sample of size n from a distribution with mean µ.
Then, the statistic
X̄ = ∑_{i=1}^n Xi / n
is called the sample mean. X̄ is an unbiased estimator for µ, and hence we can write
µ̂ = X̄.
Proof. We have to prove that E[X̄] = µ. Using linearity of expectation, we have
E[X̄] = E[∑_{i=1}^n Xi / n] = (1/n) E[∑_{i=1}^n Xi] = (1/n) ∑_{i=1}^n E[Xi] = (1/n) ∑_{i=1}^n µ = µ.
• The sample mean estimate (or the observed value of X̄) for the height of a
student from our first example is x̄ = 179.9 cm.
7 / 18
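Unbiasedness of X̄ can also be illustrated by simulation: averaging many independent sample means from a distribution with known µ lands close to µ. A seeded sketch with Python's random module (µ, σ, the sample size, and the trial count are illustrative choices of mine, not from the lecture):

```python
import random

random.seed(0)
mu, sigma, n, trials = 180.0, 7.0, 10, 20000

# Average of many independent sample means; should be close to mu
means = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

print(sum(means) / trials)  # close to 180
```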
Sample mean
Even though we know that the sample mean is unbiased, this does not give us much
information about the accuracy of our estimates. It just tells us that we draw our
estimates from a distribution with the right mean value. A desirable property of an
estimator is that it has small variance for large sample sizes. Small variance implies
that our estimates will be precise with large probability.
Let X̄ be the sample mean based on a random sample of size n from a distribution
with mean µ and variance σ². Then,
Var[X̄] = σ²/n.
Proof. Since variance is additive for independent random variables, we can write
Var[X̄] = Var[∑_{i=1}^n Xi / n] = (1/n²) Var[∑_{i=1}^n Xi] = (1/n²) ∑_{i=1}^n Var[Xi] = (1/n²) ∑_{i=1}^n σ² = σ²/n.
The above implies that the larger the sample size, the larger the probability that our
estimates are close to the true mean µ.
8 / 18
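The σ²/n formula can be checked empirically: the spread of simulated sample means shrinks as the sample size grows. A sketch under illustrative parameters of my own choosing:

```python
import random

random.seed(1)
mu, sigma, trials = 0.0, 2.0, 20000

def empirical_var_of_mean(n):
    """Empirical variance of the sample mean over many simulated samples."""
    means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n
             for _ in range(trials)]
    m = sum(means) / trials
    return sum((x - m) ** 2 for x in means) / trials

for n in (4, 16, 64):
    print(n, empirical_var_of_mean(n))  # roughly sigma**2 / n
```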
Sample variance
Let X1 , X2 , . . . , Xn be a random sample of size n from a distribution of X. Then, the
statistic
S² = ∑_{i=1}^n (Xi − X̄)² / (n − 1)
is called the sample variance. S² is an unbiased estimator for σ². Furthermore, the
statistic S = √S² is called the sample standard deviation.
Even though S² is an unbiased estimator for σ², S is not an unbiased estimator for σ.
9 / 18
Sample variance
Proof. We have to prove that E[S²] = σ². Note that S² does not change if we add a
constant to the variable X. This implies that it is enough to consider the case
µ = E[X] = 0. Recall that E[XY] = E[X]E[Y] for independent random variables X
and Y, so E[Xj Xi] = E[Xj]E[Xi] = 0 for i ≠ j, and hence
E[Xj · X̄] = E[Xj · (1/n) ∑_{i=1}^n Xi] = (1/n) ∑_{i=1}^n E[Xj Xi] = (1/n) E[Xj²] = (1/n) E[X²] = σ²/n, and
E[X̄²] = E[(1/n) ∑_{i=1}^n Xi · X̄] = (1/n) ∑_{i=1}^n E[Xi X̄] = (1/n) · n · σ²/n = σ²/n.
For µ = 0, we have
E[S²] = E[(1/(n−1)) ∑_{i=1}^n (Xi − X̄)²] = (1/(n−1)) ∑_{i=1}^n E[Xi² − 2Xi X̄ + X̄²]
= (1/(n−1)) ∑_{i=1}^n (E[Xi²] − 2E[Xi X̄] + E[X̄²]) = (1/(n−1)) ∑_{i=1}^n (1 − 1/n) σ² = σ².
10 / 18
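The role of the n − 1 denominator shows up clearly in simulation: dividing the squared deviations by n systematically underestimates σ², while dividing by n − 1 does not. A seeded sketch (all parameters are illustrative choices of mine):

```python
import random

random.seed(2)
mu, sigma, n, trials = 0.0, 3.0, 5, 40000

s2_unbiased, s2_biased = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    s2_unbiased += ss / (n - 1)  # divide by n - 1 (sample variance)
    s2_biased   += ss / n        # divide by n (biased version)

print(s2_unbiased / trials)  # close to sigma**2 = 9
print(s2_biased / trials)    # close to (1 - 1/n) * sigma**2 = 7.2
```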
Sample variance
It is usually more convenient to use the computational formula for the sample
variance given by
S² = (∑_{i=1}^n Xi² − n X̄²) / (n − 1).
• The sample variance estimate (or the observed value of S²) for the height of a
student is
s² = (182² + 171² + . . . + 181² − 10 · (179.9)²) / 9 = 45.43.
• The sample standard deviation estimate (or the observed value of S) for the
height of a student is
s = √s² = 6.74 cm.
11 / 18
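Both forms of the sample variance give the same number on the height data; a quick Python check (variable names are mine):

```python
import math

heights = [182, 171, 177, 174, 186, 183, 193, 172, 180, 181]
n = len(heights)
x_bar = sum(heights) / n

# Definition: sum of squared deviations over n - 1
s2_def = sum((x - x_bar) ** 2 for x in heights) / (n - 1)

# Computational formula: (sum of squares - n * x_bar^2) / (n - 1)
s2_comp = (sum(x * x for x in heights) - n * x_bar ** 2) / (n - 1)

print(round(s2_def, 2), round(s2_comp, 2))  # 45.43 45.43
print(round(math.sqrt(s2_def), 2))          # 6.74
```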
Sample variance
Proof of the computational formula. We have
S² = (1/(n−1)) ∑_{i=1}^n (Xi − X̄)²
= (1/(n−1)) ∑_{i=1}^n (Xi² − 2X̄Xi + X̄²)
= (1/(n−1)) (∑_{i=1}^n Xi² − 2X̄ ∑_{i=1}^n Xi + nX̄²)
= (1/(n−1)) (∑_{i=1}^n Xi² − 2nX̄² + nX̄²)
= (1/(n−1)) (∑_{i=1}^n Xi² − nX̄²).
12 / 18
Interval estimation
• Point estimates provide us with a single-number estimate of the parameter that we want
to know, but they do not contain any information about its accuracy.
• One way of trying to introduce accuracy quantification into our considerations is
to try to construct intervals that should contain the parameter of interest.
13 / 18
Confidence intervals
Let X1 , X2 , . . . , Xn be a random sample of size n from the distribution of X, and let L
and R be sample statistics such that L ≤ R. Let θ be a parameter of the distribution of
X. The random interval [L, R] is called a confidence interval for the parameter θ with
confidence level 1 − α (or 100(1 − α)%) if
P(L ≤ θ ≤ R) = 1 − α.
It is important to remember that θ is a parameter, and both L and R are random
variables.
14 / 18
Confidence intervals for normal variables with known σ 2
Let X be a normal random variable with known variance σ 2 , and let X1 , X2 , . . . , Xn be
a random sample of size n from the distribution of X. Let Z ∼ N (0, 1) be a standard
normal variable, and let zα/2 > 0 be such that
FZ (−zα/2 ) = α/2.
Then, the random interval [L, R], where
L = X̄ − zα/2 · σ/√n and R = X̄ + zα/2 · σ/√n,
is a confidence interval for the true mean µ with confidence level 1 − α, that is
P(L ≤ µ ≤ R) = 1 − α.
15 / 18
Confidence intervals for normal variables with known σ 2
Proof. If X ∼ N (µ, σ 2 ), then by the property of the normal distribution
X̄ ∼ N(µ, σ²/n),
and hence
(X̄ − µ)/(σ/√n) ∼ N(0, 1)
is a standard normal variable. By the definition of zα/2, and since the distribution of Z is
symmetric, we have
1 − α = P(−zα/2 ≤ Z ≤ zα/2) = P(−zα/2 ≤ (X̄ − µ)/(σ/√n) ≤ zα/2)
= P(−zα/2 · σ/√n ≤ X̄ − µ ≤ zα/2 · σ/√n)
= P(X̄ − zα/2 · σ/√n ≤ µ ≤ X̄ + zα/2 · σ/√n).
Note that the assumption that we know the variance of the underlying distribution is
idealistic. However, the assumption that the distribution of the random sample is
(approximately) normal is very reasonable as we will see in the next lecture.
16 / 18
Confidence intervals for normal variables with known σ 2
Let us assume that the height of a student is distributed like a normal variable with
standard deviation 7 cm. Let us construct a 95% confidence interval for the mean
height using the data from the first example.
Using a table for the standard normal, we find that
z0.025 = 1.96
since
P(−1.96 ≤ Z ≤ 1.96) = 0.95.
Hence, using the formula from the slide before, the observed confidence interval with
confidence level 95% is [l, r], where
l = x̄ − z0.025 · σ/√10 = 179.9 − 1.96 · 7/3.162 = 175.56,
and
r = x̄ + z0.025 · σ/√10 = 179.9 + 1.96 · 7/3.162 = 184.24.
It is very important to understand that it is not correct to say that with 95% probability
the true average height of a student is in the interval [175.56, 184.24]. It does not
make sense to talk about probabilities here, since µ, 175.56, and 184.24 are fixed numbers and
not random variables.
17 / 18
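The interval above can be reproduced numerically; a short sketch (1.96 is the z-value read from the standard normal table on the slide):

```python
import math

x_bar, sigma, n = 179.9, 7.0, 10
z = 1.96  # z_{0.025} for a 95% confidence level

half_width = z * sigma / math.sqrt(n)
l, r = x_bar - half_width, x_bar + half_width
print(round(l, 2), round(r, 2))  # 175.56 184.24
```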
Interpretation of confidence intervals
• The confidence level tells us about the accuracy of the whole procedure of
computing confidence intervals and not a single observed interval.
• If we repeatedly construct 100(1 − α)% confidence intervals from
independently gathered data, the constructed intervals will contain the true
parameter in 100(1 − α)% of cases on average.
• This implies that in 100α% of cases on average, the constructed confidence interval
will not contain the true parameter.
18 / 18
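This frequency interpretation can be demonstrated directly: repeatedly drawing normal samples and building 95% intervals, roughly 95% of the intervals cover the true mean. A seeded sketch with illustrative parameters of my own:

```python
import math
import random

random.seed(3)
mu, sigma, n, trials = 180.0, 7.0, 10, 10000
z = 1.96  # z_{0.025} for a 95% confidence level

covered = 0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    half = z * sigma / math.sqrt(n)
    if x_bar - half <= mu <= x_bar + half:
        covered += 1  # this interval contains the true mean

print(covered / trials)  # close to 0.95
```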