Unit D: Properties of point estimators

Introduction

This Unit establishes some of the properties a good point estimator should have and evaluates which of these properties (if any) are guaranteed by the general methods introduced in Unit C, with particular attention to the ML estimators.

Equivariance Property

• In Example B.2, suppose that the quantity of interest we wish to infer is not $\lambda$, but the mean waiting time per call, denoted by $\mu = E(X_i) = 1/\lambda$.

• To obtain the ML estimate of the true value of $\mu$, we should write the likelihood function for $\mu$,
$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\mu}\, e^{-x_{i,obs}/\mu} = \mu^{-n}\, e^{-\frac{1}{\mu}\sum_{i=1}^{n} x_{i,obs}},$$
and maximize it through the standard procedure. It can easily be verified that the ML estimate is $\hat\mu_{obs} = \bar{x}_{obs}$.

• Recall from Example B.2 that the ML estimate for $\lambda$ is $\hat\lambda_{obs} = 1/\bar{x}_{obs}$. Thus $\hat\mu_{obs} = 1/\hat\lambda_{obs}$, just like $\mu = 1/\lambda$.

• Note that $\hat\mu_{obs}$ is obtained from $\hat\lambda_{obs}$ by the same transformation that links $\mu$ to $\lambda$.

• This is no coincidence, but results from a general property of ML estimation known as the equivariance property (many authors call it the invariance property). In general, the equivariance property ensures that if a parameter is transformed, the corresponding estimate and estimator are changed by the same transformation.

More formally, a method of estimation is said to be equivariant if it is such that, whenever $T$ is the estimator for $\theta$ producing the estimate $t_{obs}$, then, for any one-to-one transformation $\psi(\theta)$, the corresponding estimator and estimate for $\psi(\theta)$ are $\psi(T)$ and $\psi(t_{obs})$, respectively.

The MM and the ML satisfy the equivariance property:
$$\tilde\psi = \psi(\tilde\theta), \quad \tilde\psi_{obs} = \psi(\tilde\theta_{obs}), \qquad \hat\psi = \psi(\hat\theta), \quad \hat\psi_{obs} = \psi(\hat\theta_{obs}).$$
(Proof not given; refer to textbooks.)

• In Example B.2, by the equivariance property, the MM and the ML estimators of the true value of $\mu = E(X_i)$ are, respectively,
$$\tilde\mu = 1/\tilde\lambda = \bar{X}, \qquad \hat\mu = 1/\hat\lambda = \bar{X}.$$
(In both cases we estimate the mean waiting time with the corresponding sample mean.)

• In Example B.3, suppose we are interested in estimating, not the population variance $\sigma^2$ of the IQ scores as before, but the population standard deviation $\sigma = \sqrt{\mathrm{Var}(X_i)} = \sqrt{\sigma^2}$. As $\sigma > 0$, $\sigma = \sqrt{\sigma^2}$ is a one-to-one transformation of $\sigma^2$. We know that the ML estimate for $\sigma^2$ is $\hat\sigma^2_{obs} = S^2_{obs}$, and it can be shown that the MM estimate is also $\tilde\sigma^2_{obs} = S^2_{obs}$ (prove it as homework). By the equivariance property of the two methods, we then have
$$\tilde\sigma_{obs} = \sqrt{\tilde\sigma^2_{obs}} = S_{obs}, \qquad \hat\sigma_{obs} = \sqrt{\hat\sigma^2_{obs}} = S_{obs}.$$
(In both cases, the population standard deviation is estimated with the sample standard deviation.)

Principle of repeated sampling

The fundamental issue is that we are dealing with estimates, that is, with approximations of the true value, so what we would like to know is: how good is the estimate? How large is the error of the approximation? We answer these questions by recalling that an estimate is a realization of a random variable (or vector), the estimator, which has its own probability distribution, called the sampling distribution. The properties of the estimator can then be studied by analyzing the features of its sampling distribution, such as its expected value, its variance and so on. This is the principle of repeated sampling, by which we evaluate the properties of an estimator by thinking of a hypothetical replication of the experiment, as if many, many times we extracted a new sample of size $n$ from the population and every time computed a new estimate. We then evaluate the behaviour of all the estimates obtained, which are realizations of the same random variable (or vector), the estimator. In most situations, such replication of the experiment is not actually carried out, but we reason as if it took place.
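Before moving on, a minimal numerical sketch (an addition to these notes, not part of them) of the equivariance property in Example B.2: maximizing the likelihood written in terms of the mean waiting time $\mu$ returns the sample mean, i.e. the same answer obtained by transforming the ML estimate of $\lambda$. The simulated data and the true rate `lambda_0 = 1.5` are assumptions chosen only for illustration.

```python
# A minimal numerical sketch of the equivariance property (not from the notes):
# for exponential waiting times, maximizing the likelihood in the mean
# parameterization mu gives mu_hat = x_bar = 1 / lambda_hat.
# The sample is simulated; lambda_0 = 1.5 is an assumed true value.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
lambda_0 = 1.5                              # assumed true rate
x = rng.exponential(scale=1 / lambda_0, size=150)

def neg_log_lik_mu(mu):
    # minus log-likelihood of the exponential model parameterized by the mean mu
    return len(x) * np.log(mu) + x.sum() / mu

mu_hat = minimize_scalar(neg_log_lik_mu, bounds=(1e-6, 10), method="bounded").x
lambda_hat = 1 / x.mean()                   # ML estimate in the rate parameterization

print(mu_hat, x.mean(), 1 / lambda_hat)     # all three agree numerically
```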
Notation

• As in the previous Units, we denote the observed sample by $x_{obs} = (x_{1,obs}, \ldots, x_{n,obs})$ and the generating random vector by $X = (X_1, \ldots, X_n)$, whose j.d.f. (or j.p.f.) belongs to the parametric family $f(x; \theta)$, $\theta \in \Theta$.
• For clarity, we denote the true and unknown value of $\theta$ by $\theta_0$, so that $f(x; \theta_0)$ is the true j.d.f. (or j.p.f.) of $X$.
• We also denote by $T = T(X)$ a generic estimator for $\theta$, and the corresponding estimate by $t_{obs} = T(x_{obs})$.
• In the following we assume that $\theta$ is unidimensional, but all the results presented below can be extended to the multi-parameter case. When appropriate, this extension will be made explicit.

Mean Squared Error

• Though $\theta_0$ is unknown, ideally we would like the probability distribution of $T - \theta_0$ to be concentrated around 0. This would ensure that for any sample extracted, and hence for any sample data $x_{obs}$, the estimate $t_{obs}$ is close to the true value $\theta_0$.
• For example, in the very unrealistic case $T = \theta_0$ with probability 1, we would have $t_{obs} = \theta_0$ for any $x_{obs}$.
• A quantity used to measure the concentration of the distribution of $T - \theta_0$ around 0 is the Mean Squared Error (MSE), defined as
$$MSE(T) = E[(T - \theta_0)^2],$$
where the expected value is computed with respect to the true j.d.f. (or j.p.f.) $f(x; \theta_0)$.
• As $(T - \theta_0)^2$ can be interpreted as the (squared) distance of the estimator from the true value of $\theta$, the MSE is the average (squared) distance from $\theta_0$.
• A small value of the MSE implies that, for any sample, with high probability the estimate $t_{obs}$ is close to the true value $\theta_0$.
• The MSE of $T$ can be decomposed as follows:
$$MSE(T) = E[(T - \theta_0)^2] = \mathrm{Var}(T) + [E(T) - \theta_0]^2.$$

Proof:
$$E[(T - \theta_0)^2] = E(T^2 + \theta_0^2 - 2\theta_0 T) = E(T^2) + \theta_0^2 - 2\theta_0 E(T)$$
$$= E(T^2) - [E(T)]^2 + [E(T)]^2 + \theta_0^2 - 2\theta_0 E(T) = \mathrm{Var}(T) + [E(T) - \theta_0]^2.$$

• The term $E(T) - \theta_0$ is called the bias of the estimator $T$. It measures the distance of the location of the distribution of $T$ from $\theta_0$.
• Hence, the MSE of the estimator $T$ can be written as
$$MSE(T) = [\mathrm{Bias}(T)]^2 + \mathrm{Var}(T),$$
which shows that the MSE has two components:
1. the bias of the estimator, which is related to the location of the distribution of $T$;
2. the variability of $T$.

Bias

• An estimator whose bias is 0 is said to be unbiased. In other words, $T$ is an unbiased estimator of $\theta_0$ if $E(T) = \theta_0$, where, as before, the expectation is computed with respect to $f(x; \theta_0)$; i.e. the distribution of $T$ is centred at $\theta_0$.
• By the repeated sampling principle, unbiasedness means that on average we estimate $\theta_0$ correctly: if we repeat the sampling procedure many times, compute $t_{obs}$ for each sample and then average across all estimates, we obtain exactly $\theta_0$.
• If $\theta$ has more than one component, the property of unbiasedness requires that for each element of $\theta$ the corresponding estimator is unbiased.
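The following is a hedged Monte Carlo sketch (an addition to the notes) of the repeated-sampling view of bias and MSE: many samples are simulated, the estimator is computed on each, and the decomposition $MSE(T) = \mathrm{Var}(T) + [\mathrm{Bias}(T)]^2$ is checked numerically. The estimator used, $\hat\lambda = 1/\bar X$ for exponential data, anticipates the examples below; the values `lambda_0 = 2.0` and `n = 30` are assumptions for illustration only.

```python
# Monte Carlo check (an addition to the notes) of MSE(T) ~= Var(T) + Bias(T)^2
# for lambda_hat = 1 / X_bar with exponential data; lambda_0 and n are assumed.
import numpy as np

rng = np.random.default_rng(1)
lambda_0, n, n_rep = 2.0, 30, 100_000

samples = rng.exponential(scale=1 / lambda_0, size=(n_rep, n))
lam_hat = 1 / samples.mean(axis=1)          # one estimate per simulated sample

bias = lam_hat.mean() - lambda_0
var = lam_hat.var()
mse = np.mean((lam_hat - lambda_0) ** 2)

print(f"bias={bias:.4f}  var={var:.4f}  bias^2+var={bias**2 + var:.4f}  mse={mse:.4f}")
# bias^2 + var matches mse up to Monte Carlo error, illustrating the decomposition;
# the small nonzero bias of 1 / X_bar is discussed in the examples that follow.
```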
In general, neither the MM nor the ML method guarantees unbiasedness. This means that the MM and the ML estimators might or might not be unbiased, and the property, if required, has to be verified on a case-by-case basis.

1. In Example B.1, MM and ML lead to the same estimator of the true value of $p$, $p_0$: $\tilde p = \hat p = X/n$. This is an unbiased estimator of $p_0$, since $E(X/n) = E(X)/n = np_0/n = p_0$.

2. In Example B.2, MM and ML lead to the same estimator of the true value of $\lambda$, $\lambda_0$: $\tilde\lambda = \hat\lambda = 1/\bar X$. This is a biased estimator of $\lambda_0$, since
$$E(\bar X) = E\!\left(\frac{X_1 + \ldots + X_n}{n}\right) = \frac{E(X_1) + \ldots + E(X_n)}{n} = \frac{n/\lambda_0}{n} = 1/\lambda_0,$$
but $E(1/\bar X) \ne 1/E(\bar X) = \lambda_0$.

3. In Example B.3, $\theta$ has two components: $\mu$ and $\sigma^2$. For each component, MM and ML produce the same estimator of the true value, $\mu_0$ and $\sigma_0^2$, respectively: $\tilde\mu = \hat\mu = \bar X$, $\tilde\sigma^2 = \hat\sigma^2 = S^2$. To verify unbiasedness we have to assess whether each estimator is unbiased.
• $\bar X$ is an unbiased estimator of $\mu_0$. Proof:
$$E(\bar X) = E\!\left(\frac{X_1 + \ldots + X_n}{n}\right) = \frac{E(X_1) + \ldots + E(X_n)}{n} = \mu_0.$$
• $S^2$ is a biased estimator of $\sigma_0^2$. Proof:
$$E(S^2) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar X^2\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i^2) - E(\bar X^2) = (\sigma_0^2 + \mu_0^2) - \left\{\mathrm{Var}(\bar X) + [E(\bar X)]^2\right\} = \sigma_0^2 + \mu_0^2 - \frac{\sigma_0^2}{n} - \mu_0^2 = \frac{n-1}{n}\,\sigma_0^2.$$

It is easy to correct $S^2$ to derive an unbiased estimator of $\sigma_0^2$. It follows from the previous steps that
$$S'^2 = S^2\,\frac{n}{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$$
is such that $E(S'^2) = \sigma_0^2$. For any simple random sample, the sample mean $\bar X$ and the "corrected" sample variance $S'^2$ are unbiased estimators of the true population mean and variance, respectively.

Expected Fisher Information

For an unbiased estimator $T$ of $\theta_0$ the MSE simplifies to $MSE(T) = \mathrm{Var}(T)$. So the MSE will be smaller the less variable the estimator $T$ is. To study $\mathrm{Var}(T)$ we need to introduce a new quantity: the Fisher Information.

• Recall that the score function is
$$u(\theta) = \frac{d}{d\theta}\, l(\theta) = \frac{d}{d\theta} \log f(x_{obs}; \theta).$$
So $u(\theta)$ is also a function of $x_{obs}$, and as such it can be seen as a realization of a r.v. that we denote by $u(\theta; X)$ to underline the dependence on the random vector $X$.
• The Expected Fisher Information, denoted by $I(\theta)$, is (almost always) given by
$$I(\theta) = E\!\left(-\frac{d}{d\theta}\, u(\theta; X)\right),$$
where the expectation is computed with respect to the generic j.d.f. (or j.p.f.) of $X$, $f(x; \theta)$.
• To give an example of expected information, recall that in Example B.1,
$$u(p) = \frac{x_{obs}}{p} - \frac{n - x_{obs}}{1 - p},$$
where $x_{obs}$ is a realization of $X \sim \mathrm{Bin}(n = 20, p)$, so that the corresponding r.v. is
$$u(p; X) = \frac{X}{p} - \frac{n - X}{1 - p} = \frac{X - np}{p(1-p)}.$$
The expected information is given by
$$I(p) = E\!\left(-\frac{d}{dp}\, u(p; X)\right) = E\!\left(\frac{X}{p^2} + \frac{n - X}{(1-p)^2}\right) = \frac{np}{p^2} + \frac{n - np}{(1-p)^2} = \frac{n}{p} + \frac{n}{1-p} = \frac{n}{p(1-p)}.$$
• Why is $I(\theta)$ called the expected information? The score function is the first derivative of the log-likelihood, so $I(\theta)$ is the expected curvature of the log-likelihood (the change of sign does not alter this interpretation). Greater values of $I(\theta)$ imply a greater curvature of the log-likelihood, hence greater information about the true value of $\theta$.
• Note that well-behaved log-likelihoods are concave and the change of sign ensures a positive Fisher information.

The Cramer-Rao inequality and efficiency

The Cramer-Rao inequality states that if $T = T(X)$ is an unbiased estimator of the true value of $\theta$, $\theta_0$, then
$$\mathrm{Var}(T) \ge 1/I(\theta_0),$$
where $\mathrm{Var}(T)$ is evaluated with respect to the true j.d.f. (or j.p.f.) $f(x; \theta_0)$ of $X$.
Proof: We do not prove this result, but for anyone interested the proof can be found, for example, in Azzalini.
• The Cramer-Rao inequality gives a lower bound for the MSE of an unbiased estimator.
• An unbiased estimator whose variance is equal to $1/I(\theta_0)$ is called an efficient estimator.
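As a small numerical complement (an addition to the notes), the expected information just derived for Example B.1 can be checked by averaging the curvature of the log-likelihood over the binomial distribution of $X$; the values `n = 20` and `p = 0.2` are those used in the notes' example.

```python
# Numerical check (an addition to the notes) that E[-d^2 l / dp^2] = n / (p (1 - p))
# for X ~ Bin(n, p), using the exact binomial probabilities.
import numpy as np
from scipy.stats import binom

n, p = 20, 0.2
x = np.arange(n + 1)
pmf = binom.pmf(x, n, p)

# minus the second derivative of the log-likelihood: x / p^2 + (n - x) / (1 - p)^2
neg_curvature = x / p**2 + (n - x) / (1 - p) ** 2

info_numeric = np.sum(pmf * neg_curvature)   # expectation over X
info_formula = n / (p * (1 - p))

print(info_numeric, info_formula)            # both equal 125.0
```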
• In Example B.1, we had
$$\tilde p = \hat p = \frac{X}{n},$$
where $X \sim \mathrm{Bin}(n = 20, p)$; these are unbiased estimators of the true value of $p$, $p_0$. We want to verify whether they are efficient estimators of $p_0$, i.e. whether $\mathrm{Var}(\tilde p) = \mathrm{Var}(\hat p) = 1/I(p_0)$.
We know from previous computations that $I(p) = \frac{n}{p(1-p)}$, so that $I(p_0) = \frac{n}{p_0(1-p_0)}$. In addition,
$$\mathrm{Var}(\tilde p) = \mathrm{Var}(\hat p) = \frac{\mathrm{Var}(X)}{n^2} = \frac{np_0(1-p_0)}{n^2} = \frac{p_0(1-p_0)}{n}.$$
Hence, $\mathrm{Var}(\tilde p) = \mathrm{Var}(\hat p) = 1/I(p_0)$, which proves that the Cramer-Rao lower bound is attained and that $\tilde p$ and $\hat p$ are efficient estimators of $p_0$.

Asymptotic properties

• Except in very simple estimation problems, finding "optimal" estimators, i.e. unbiased and efficient ones, is an impossible task, sometimes because they are not easy to identify, but mostly because such estimators do not exist.
• Statistical theory overcomes this obstacle by looking at the asymptotic properties of the estimators, that is, at the behaviour of the estimators when the sample size becomes infinite.
• This is not a merely theoretical exercise, as the underlying idea is that these asymptotic properties will be at least approximately satisfied in large samples.
• In Example B.1, we have seen that
$$MSE(\tilde p) = MSE(\hat p) = \mathrm{Var}(\tilde p) = \mathrm{Var}(\hat p) = \frac{p_0(1-p_0)}{n}.$$
So, as $n \to \infty$, the MSE converges to 0, which implies that the distributions of $\hat p$ and $\tilde p$ become more and more concentrated around the true value $p_0$.
• This is what we expect, not just from efficient estimators, but from any "reasonable" estimator. That is, we expect that as the sample size $n$ increases (and the sample information becomes greater and greater), the estimator behaves better and better: its sampling distribution becomes more concentrated around the true parameter value and the MSE converges to 0.
• In the following pages we review the asymptotic properties most commonly used to judge estimator behaviour in large samples.

Asymptotic unbiasedness and asymptotic efficiency

Asymptotic unbiasedness. An estimator $T = T(X) = T(X_1, X_2, \ldots, X_n)$ is said to be asymptotically unbiased if
$$\lim_{n\to\infty} E(T) = \theta_0,$$
where the expectation is computed with respect to the true j.d.f. (or j.p.f.) $f(x; \theta_0)$.

Asymptotic efficiency. Extending the Cramer-Rao inequality to infinite samples, we say that an asymptotically unbiased estimator $T$ is asymptotically efficient if
$$\lim_{n\to\infty} \mathrm{Var}(T)\, I(\theta_0) = 1,$$
where the variance is computed with respect to the true j.d.f. (or j.p.f.) $f(x; \theta_0)$.

Consistency

• The consistency property requires that the estimator "converges" to the true parameter value $\theta_0$ as the sample size $n$ becomes infinite. It is such a fundamental property that inconsistent estimators are generally not even taken into account.
• As we are dealing with estimators, i.e. with r.v.'s, we need a specific definition of limit which differs slightly from the one used for deterministic sequences, though the basic idea remains the same.
• An estimator $T = T(X) = T(X_1, X_2, \ldots, X_n)$ is a consistent estimator of the true parameter value $\theta_0$ if, for every $\epsilon > 0$,
$$\lim_{n\to\infty} P\big(|T(X_1, \ldots, X_n) - \theta_0| < \epsilon\big) = 1,$$
where the probability is computed with respect to the true j.d.f. (or j.p.f.) $f(x; \theta_0)$.
• Roughly speaking, this means that as the sample size becomes larger and larger, the estimator will be arbitrarily close to the true parameter value with probability arbitrarily close to 1.
• A sufficient (but not necessary) condition for an estimator to be consistent is that
1. $E(T) \to \theta_0$ as $n \to \infty$ (i.e. $T$ is asymptotically unbiased), and
2. $\mathrm{Var}(T) \to 0$ as $n \to \infty$.
• In Example B.1, we have $E(\tilde p) = E(\hat p) = p_0$ and
$$\mathrm{Var}(\tilde p) = \mathrm{Var}(\hat p) = \frac{p_0(1-p_0)}{n} \to 0, \quad \text{as } n \to \infty.$$
So they are consistent estimators of $p_0$.
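A quick simulation sketch (an addition to the notes) of the consistency definition for $\hat p = X/n$ in Example B.1: for a fixed $\epsilon$, the probability that $\hat p$ falls within $\epsilon$ of $p_0$ is estimated by Monte Carlo for increasing $n$ and is seen to approach 1. The values `p_0 = 0.2` and `eps = 0.05` are illustrative choices, not values from the notes.

```python
# Simulation sketch (an addition to the notes) of consistency for p_hat = X / n:
# P(|p_hat - p_0| < eps) should tend to 1 as n grows.
import numpy as np

rng = np.random.default_rng(2)
p_0, eps, n_rep = 0.2, 0.05, 100_000

for n in (20, 100, 500, 2500):
    p_hat = rng.binomial(n, p_0, size=n_rep) / n   # one estimate per simulated sample
    prob_close = np.mean(np.abs(p_hat - p_0) < eps)
    print(f"n={n:5d}  P(|p_hat - p_0| < {eps}) ~ {prob_close:.3f}")
# The estimated probabilities increase towards 1, as the definition requires.
```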
Asymptotic behaviour of ML estimators

• We now concentrate on the asymptotic properties of ML estimators, which largely explain the dominant role that ML has in point estimation.
• We limit attention to simple random samples, but the results below can be extended to some non-i.i.d. cases.
• If $X = (X_1, \ldots, X_n)$ has i.i.d. components, under regularity conditions the ML estimator for $\theta$, $\hat\theta$, has the following asymptotic properties.

1. Property 1: $\hat\theta$ is a consistent estimator of the true parameter value $\theta_0$.
2. Property 2: $\hat\theta$ is such that
$$\lim_{n\to\infty} P\!\left(\frac{\hat\theta - \theta_0}{\sqrt{I^{-1}(\theta_0)}} \le z\right) = \Phi(z),$$
where $\Phi(z)$ denotes the distribution function of $N(0,1)$. This means that the distribution function of $(\hat\theta - \theta_0)/\sqrt{I^{-1}(\theta_0)}$ converges to that of a $N(0,1)$.

Proof: We are not proving Properties 1 and 2. Anyone interested can find the proofs, for example, in Azzalini (pages 80 and 82).

Asymptotic Normality of ML estimators and its implications

• The practical implication of Property 2 is that, for $n$ finite but large,
$$\frac{\hat\theta - \theta_0}{\sqrt{I^{-1}(\theta_0)}} \overset{a}{\sim} N(0,1), \quad \text{or, equivalently,} \quad \hat\theta \overset{a}{\sim} N\!\left(\theta_0,\, I^{-1}(\theta_0)\right),$$
where $\overset{a}{\sim}$ stands for "approximately distributed as".
• From Property 2 we also see that $E(\hat\theta) \to \theta_0$ as $n \to \infty$ (i.e. $\hat\theta$ is asymptotically unbiased) and $\mathrm{Var}(\hat\theta)\, I(\theta_0) \to 1$ (i.e. $\hat\theta$ is asymptotically efficient).
• For all of the previous properties, the ML estimators are said to be best asymptotically normal.
• For simple random samples, under regularity conditions, the MM estimators are
1. consistent,
2. asymptotically normal,
3. asymptotically unbiased,
4. but in general they are not asymptotically efficient.
• This explains the preference for ML over MM.

Examples of application of the asymptotic properties of ML estimators

1. Example B.2
• We know that, for $n$ large enough, $\hat\lambda \overset{a}{\sim} N\!\left(\lambda_0,\, I^{-1}(\lambda_0)\right)$.
• We need to derive $I(\lambda)$. We have
$$u(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_{i,obs},$$
which is generated from the r.v.
$$u(\lambda; X) = \frac{n}{\lambda} - \sum_{i=1}^{n} X_i.$$
Then,
$$I(\lambda) = E\!\left(-\frac{d}{d\lambda}\, u(\lambda; X)\right) = E\!\left(\frac{n}{\lambda^2}\right) = \frac{n}{\lambda^2}.$$
• So, $\hat\lambda \overset{a}{\sim} N\!\left(\lambda_0,\, \lambda_0^2/n\right)$.
• The figure below shows the density of $\hat\lambda$ for different values of $n$ ($n = 5, 15, 30, 100$) with $\lambda_0 = 1$. [Figure: densities $f(\hat\lambda)$ for the four sample sizes.] What we can see is that, as $n$ increases,
– the density of $\hat\lambda$ resembles more and more that of a Normal r.v.;
– the density of $\hat\lambda$ becomes more and more concentrated around $\lambda_0$, as a consequence of the fact that $\mathrm{Var}(\hat\lambda) \to 0$.
• Recall that in Example B.2 the number of observations is 150. From the figure above, we can deduce that the Normal distribution provides a good approximation of the true distribution of $\hat\lambda$ in this case study. In other words, $n = 150$ seems large enough to allow the use of asymptotic approximations.

2. Example B.1
• The ML estimator for the settings of Example B.1 is $\hat p = X/n$. We have already shown that this is an unbiased and efficient estimator of $p_0$. If it is unbiased and efficient, it must also be asymptotically unbiased and efficient, but the latter properties also follow from the general results on ML estimators, together with the consistency property.
• The general results for ML estimators also ensure that, in large samples, the distribution of $\hat p$ can be approximated by $\hat p \overset{a}{\sim} N\!\left(p_0,\, I^{-1}(p_0)\right)$, that is,
$$\hat p \overset{a}{\sim} N\!\left(p_0,\, \frac{p_0(1-p_0)}{n}\right).$$
• The figure below shows the p.f. of $\hat p$ for different values of $n$ ($n = 5, 20, 50, 100$) and $p_0 = 0.2$. [Figure: four panels of $f(\hat p)$ against $\hat p$, one per sample size.]
• Note that, as $X$ is discrete, $\hat p$ is also discrete, but as $n$ increases, $\hat p$ resembles more and more a continuous r.v. (the Normal r.v.).
• Recall that in Example B.1 we have $n = 20$. From the figure above, for $n = 20$ the distribution of $\hat p$ shows some departure from Gaussianity, so there is some doubt on whether the Normal distribution gives a good approximation of the distribution of $\hat p$ for this case study. In other words, $n = 20$ might not be large enough to allow the use of asymptotic approximations.
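To complement the two examples above, a hedged simulation sketch (an addition to the notes): it draws many samples, computes $\hat\lambda = 1/\bar X$ and $\hat p = X/n$, and compares the simulated means and standard deviations of the estimators with the asymptotic values $\lambda_0$, $\sqrt{\lambda_0^2/n}$ and $p_0$, $\sqrt{p_0(1-p_0)/n}$. The settings $\lambda_0 = 1$, $p_0 = 0.2$ and the sample sizes mirror those of the figures; the number of replications is an arbitrary choice.

```python
# Simulation sketch (an addition to the notes) of the asymptotic normality results
# for Example B.2 (lambda_hat = 1 / X_bar, lambda_0 = 1) and Example B.1
# (p_hat = X / n, p_0 = 0.2).  Sample sizes mirror the figures in the notes.
import numpy as np

rng = np.random.default_rng(3)
n_rep = 50_000

print("Example B.2: lambda_hat = 1 / X_bar, lambda_0 = 1")
for n in (5, 15, 30, 100):
    lam_hat = 1 / rng.exponential(scale=1.0, size=(n_rep, n)).mean(axis=1)
    print(f"  n={n:3d}  mean={lam_hat.mean():.3f}  sd={lam_hat.std():.3f}  "
          f"asymptotic sd={np.sqrt(1 / n):.3f}")

print("Example B.1: p_hat = X / n, p_0 = 0.2")
for n in (5, 20, 50, 100):
    p_hat = rng.binomial(n, 0.2, size=n_rep) / n
    print(f"  n={n:3d}  mean={p_hat.mean():.3f}  sd={p_hat.std():.3f}  "
          f"asymptotic sd={np.sqrt(0.2 * 0.8 / n):.3f}")
# As n grows, the simulated means approach the true values and the simulated
# standard deviations approach the asymptotic ones, in line with the figures.
```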
General remarks

• The issue that could be raised is how large $n$ has to be for the asymptotic Normal distribution to be a good approximation of the true distribution of $\hat\theta$. There is, unfortunately, no unique answer, as it depends on the shape of the distribution of each $X_i$ and on the true parameter value $\theta_0$, which is, of course, unknown.
• In both the examples above (and others) MM and ML lead to the same estimator. This is generally not the case, especially in more complicated estimation problems.
• In Example C.1, where $f_1(x; \theta) > 0$ for $x > \theta$, i.e. the $X_i$'s take values on a region that depends on $\theta$, the regularity conditions required by the asymptotic results on ML estimators do not hold. In this context we cannot rely on any of the asymptotic properties seen above, and the behaviour of the estimator $\hat\theta = \min(X_1, \ldots, X_n)$ has to be studied directly.
• Examples B.3 and B.4 fall outside the scope of this Unit, being multi-parameter problems. However, asymptotic properties of ML estimators similar to those of the single-parameter case hold for higher-dimensional problems, with some adjustments.

Observed Information and Standard Errors

• From $\hat\theta \overset{a}{\sim} N\!\left(\theta_0,\, I^{-1}(\theta_0)\right)$ we derive that, for large samples,
$$\mathrm{Var}(\hat\theta) \overset{a}{=} I^{-1}(\theta_0),$$
where $\overset{a}{=}$ stands for "approximately equal to". This implies that if $n$ is large we do not need to compute the variance of $\hat\theta$ directly, but we can more simply use $I^{-1}(\theta_0)$.
• The problem with the use of $I^{-1}(\theta_0)$ as a measure of variability of the ML estimator is that $I^{-1}(\theta_0)$ is unknown, since $\theta_0$ is unknown.
• One solution is to replace the expected information evaluated at $\theta_0$ with the so-called observed information, defined as
$$I(\hat\theta_{obs}) = -\left.\frac{d}{d\theta}\, u(\theta)\right|_{\theta = \hat\theta_{obs}}.$$
• This solution "works" because it can be shown that $I(\hat\theta_{obs})$ is a consistent estimate of $I(\theta_0)$ (not proven).
• We then estimate the variance of $\hat\theta$ in large samples by $I^{-1}(\hat\theta_{obs})$.
• The corresponding estimate of the standard deviation of $\hat\theta$ is $\sqrt{I^{-1}(\hat\theta_{obs})}$. This is called the standard error of $\hat\theta$ and denoted by $s.e.(\hat\theta)$.

Examples of standard errors

1. In Example B.2, a measure of the variability of $\hat\lambda$ can be computed as follows.
• We know that
$$u(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_{i,obs},$$
so that
$$\frac{d}{d\lambda}\, u(\lambda) = -\frac{n}{\lambda^2},$$
from which we can compute the observed information
$$I(\hat\lambda_{obs}) = \frac{n}{\hat\lambda^2_{obs}}.$$
• Recalling that $\hat\lambda_{obs} = 1/0.68$ and $n = 150$,
$$s.e.(\hat\lambda) = \sqrt{\frac{1}{0.68^2 \times 150}} \approx 0.12.$$

2. In Example B.1, a measure of the variability of $\hat p$ can be computed as follows.
• We know that
$$u(p) = \frac{x_{obs}}{p} - \frac{n - x_{obs}}{1 - p},$$
so that
$$\frac{d}{dp}\, u(p) = -\frac{x_{obs}}{p^2} - \frac{n - x_{obs}}{(1-p)^2},$$
from which we can compute the observed information
$$I(\hat p_{obs}) = \frac{x_{obs}}{\hat p^2_{obs}} + \frac{n - x_{obs}}{(1 - \hat p_{obs})^2}.$$
• Recalling that $x_{obs} = 4$, $n = 20$ and $\hat p_{obs} = 4/20 = 0.2$, we have
$$s.e.(\hat p) = \left[\frac{4}{0.2^2} + \frac{20 - 4}{(1 - 0.2)^2}\right]^{-1/2} = 0.089.$$
• There might be some doubt about the quality of 0.089 as a measure of variability of $\hat p$, since, as noticed before, $n = 20$ might not be large enough to ensure good approximations.
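As a numeric check (an addition to the notes), the two standard errors above can be reproduced as $1/\sqrt{\text{observed information}}$; the inputs $\bar x_{obs} = 0.68$, $n = 150$ (Example B.2) and $x_{obs} = 4$, $n = 20$ (Example B.1) are the values quoted in the notes.

```python
# Small numeric check (an addition to the notes) of the two standard errors,
# computed as 1 / sqrt(observed information), with the data quoted in the notes.
import numpy as np

# Example B.2: exponential waiting times, x_bar_obs = 0.68, n = 150
n_b2, xbar_obs = 150, 0.68
lambda_hat = 1 / xbar_obs
info_lambda = n_b2 / lambda_hat**2           # I(lambda_hat_obs) = n / lambda_hat^2
se_lambda = 1 / np.sqrt(info_lambda)
print(f"s.e.(lambda_hat) = {se_lambda:.3f}")    # ~0.12

# Example B.1: binomial, x_obs = 4, n = 20
n_b1, x_obs = 20, 4
p_hat = x_obs / n_b1
info_p = x_obs / p_hat**2 + (n_b1 - x_obs) / (1 - p_hat) ** 2
se_p = 1 / np.sqrt(info_p)
print(f"s.e.(p_hat) = {se_p:.3f}")              # ~0.089
```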
Approximate confidence intervals based on the asymptotic distribution of ML estimators

• Though interval estimation is outside the scope of this course, the asymptotic properties of ML estimators have an important implication for building confidence intervals that is worth mentioning.
• Recall that a confidence interval of level $1 - \alpha$ (for $\alpha$ fixed) for an unknown parameter $\theta$ is given by the interval $(T_1, T_2)$, where $T_1 = T_1(X_1, \ldots, X_n)$ and $T_2 = T_2(X_1, \ldots, X_n)$ are two statistics such that
$$P(T_1 \le \theta_0 \le T_2) = 1 - \alpha,$$
with $\theta_0$ being, as before, the true and unknown value of $\theta$.
• If we let
$$T_1 = \hat\theta - z_{1-\alpha/2}\,\sqrt{I^{-1}(\theta_0)}, \qquad T_2 = \hat\theta + z_{1-\alpha/2}\,\sqrt{I^{-1}(\theta_0)},$$
where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of $N(0,1)$, the interval $(T_1, T_2)$ is a confidence interval of approximate level $1 - \alpha$ for large samples.
• This follows from $\hat\theta \overset{a}{\sim} N\!\left(\theta_0,\, I^{-1}(\theta_0)\right)$, which is equivalent to
$$\frac{\hat\theta - \theta_0}{\sqrt{I^{-1}(\theta_0)}} \overset{a}{\sim} N(0,1),$$
so that, for large samples,
$$P\!\left(-z_{1-\alpha/2} \le \frac{\hat\theta - \theta_0}{\sqrt{I^{-1}(\theta_0)}} \le z_{1-\alpha/2}\right) \overset{a}{=} 1 - \alpha.$$
Thus, $P(T_1 \le \theta_0 \le T_2) \overset{a}{=} 1 - \alpha$.
• $(T_1, T_2)$ as defined above is a random interval, meaning that $T_1$ and $T_2$ are r.v.'s. The corresponding observed interval, built on a specific sample $x_{obs}$, is $(t_{1,obs}, t_{2,obs})$, where
$$t_{1,obs} = \hat\theta_{obs} - z_{1-\alpha/2}\,\sqrt{I^{-1}(\theta_0)} \qquad \text{and} \qquad t_{2,obs} = \hat\theta_{obs} + z_{1-\alpha/2}\,\sqrt{I^{-1}(\theta_0)}.$$
• The problem with the confidence interval specified above is that $t_{1,obs}$ and $t_{2,obs}$ depend on $I(\theta_0)$, which is unknown.
• Just as when we derived a measure of variability for $\hat\theta$, the solution is to replace $I(\theta_0)$ with the observed information $I(\hat\theta_{obs})$. Under this choice, the numeric confidence interval of approximate level $1 - \alpha$ becomes
$$\left(\hat\theta_{obs} - z_{1-\alpha/2} \times s.e.(\hat\theta); \;\; \hat\theta_{obs} + z_{1-\alpha/2} \times s.e.(\hat\theta)\right).$$

Examples of construction of approximate confidence intervals

1. Example B.2
• Suppose we are interested in building a 95% confidence interval for $\lambda$.
• We have seen that $s.e.(\hat\lambda) \approx 0.12$.
• The confidence interval is then
$$(1/0.68 - 1.96 \times 0.12;\; 1/0.68 + 1.96 \times 0.12) = (1.24;\; 1.71),$$
where 1.96 corresponds to $z_{0.975}$.
• As $n = 150$ seemed to ensure a good Normal approximation of the true distribution of $\hat\lambda$, the actual confidence level of this interval will be very close to the nominal 0.95 level.

2. Example B.1
• Suppose we are interested in building a 90% confidence interval for $p$.
• We have seen that $s.e.(\hat p) = 0.089$.
• The confidence interval is then
$$(0.2 - 1.645 \times 0.089;\; 0.2 + 1.645 \times 0.089) = (0.054;\; 0.346),$$
where 1.645 corresponds to $z_{0.95}$.
• As $n = 20$ might not be large enough for a good Normal approximation of the true distribution of $\hat p$, the actual confidence level of this interval might not be very close to the nominal 0.9 level.
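The two intervals above can be reproduced with a few lines (an addition to the notes, not part of them). The helper name `wald_ci` is purely illustrative; `scipy.stats.norm.ppf` supplies the Normal quantiles $z_{0.975} \approx 1.96$ and $z_{0.95} \approx 1.645$, and the estimates and standard errors are those computed in the examples.

```python
# Sketch (an addition to the notes) reproducing the approximate confidence intervals
# theta_hat_obs +/- z_{1 - alpha/2} * s.e.(theta_hat).
import numpy as np
from scipy.stats import norm

def wald_ci(estimate, se, level):
    """Approximate CI based on the asymptotic normality of the ML estimator."""
    z = norm.ppf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se

# Example B.2: lambda_hat_obs = 1/0.68, s.e. = 1/(0.68 * sqrt(150)), 95% level
print(wald_ci(1 / 0.68, 1 / (0.68 * np.sqrt(150)), 0.95))   # ~(1.24, 1.71)

# Example B.1: p_hat_obs = 0.2, s.e. = 0.089, 90% level
print(wald_ci(0.2, 0.089, 0.90))                             # ~(0.054, 0.346)
```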
Exercises

Exercise D.1 Use the equivariance property of ML to answer the following questions.
1. From the answer to question 3 of Exercise C.3, construct the ML estimator of the probability that no one uses the cash point in a single day.
2. From the answer to question 4 of Exercise C.3, compute the ML estimate of the probability that no one uses the cash point in a single day.

Exercise D.2 Use the answer to part 3 of Exercise C.3 to solve the following problems.
1. Show that $\hat\lambda$ is an unbiased estimator of the true value of $\lambda$.
2. Compute the MSE of $\hat\lambda$.
3. Show that $\hat\lambda$ is an efficient estimator of the true value of $\lambda$.
4. Write the approximate distribution of $\hat\lambda$ in large samples.

Exercise D.3 Use the answer to part 4 of Exercise C.3 and the data specified in part 3 of Exercise B.2 to solve the following problems.
1. Compute the standard error of $\hat\lambda$.
2. Build an approximate 95% confidence interval for the true value of $\lambda$.

Exercise D.4 Use the answer to part 3 of Exercise C.2 to solve the following problems. Let $\theta$ denote the mean number of trials till failure, i.e. $\theta = E(X_i) = 1/p$.
1. Using the equivariance property, derive the ML estimator for $\theta$ from the ML estimator of $p$.
2. Show that the ML estimator for $\theta$ is unbiased.
3. Compute the MSE of the ML estimator for $\theta$.
4. Show that the ML estimator for $\theta$ is efficient.

Exercise D.5 Use the answer to part 4 of Exercise C.2 and the data specified in part 3 of Exercise B.1 to solve the following problems.
1. Compute the standard error of the ML estimator for $p$.
2. Build an approximate 90% confidence interval for the true value of $p$.

Exercise D.6 Using the solution of part 3 of Exercise C.4 and fixing $\sigma^2 = 60$,
1. show that the ML estimator of the true value of $\mu$ is unbiased;
2. compute the MSE of the ML estimator of the true value of $\mu$;
3. show that the ML estimator of the true value of $\mu$ is efficient.
Using the solution of part 4 of Exercise C.4,
4. compute the standard error of the ML estimator for $\mu$;
5. construct an approximate 95% confidence interval for the true value of $\mu$.