SOLUTION FOR HOMEWORK 3, STAT 4352

Welcome to your third homework. We finish point estimation here; your Exam 1 is next week and it will be close to HW1-HW3. Recall that $X^n := (X_1, \ldots, X_n)$ denotes the vector of n observations. Try to find mistakes (and get extra points) in my solutions. Typically they are silly arithmetic mistakes (not methodological ones). They allow me to check that you did your HW on your own. Please do not e-mail me about your findings; just mention them on the first page of your solution and count the extra points.

Now let us look at your problems.

1. Problem 10.51. Let $X_1, \ldots, X_n$ be iid Exponential($\theta$), so
$$f_\theta^X(x) = (1/\theta)\, e^{-x/\theta}\, I(x > 0), \qquad \theta \in \Omega := (0, \infty).$$
Please note that it is important to write this density with the indicator function showing its support. In some cases the support may depend on the parameter of interest, and then this fact is always very important. We shall see such an example later in this homework.

For the exponential distribution we know that $E_\theta(X) = \theta$ (you may check this by a direct calculation), so we get a simple method of moments estimator
$$\hat\Theta_{MME} = \bar X.$$
This is the answer. But I would like to continue a bit. The method of moments (or a generalized one) allows you to work with any moment (or any function). Let us consider the second moment and equate the sample second moment to the theoretical one. Recall that $Var_\theta(X) = \theta^2$, and thus $E_\theta(X^2) = Var_\theta(X) + (E_\theta(X))^2 = 2\theta^2$. The sample second moment is $n^{-1}\sum_{i=1}^n X_i^2$, and we get another method of moments estimator
$$\tilde\Theta_{MME} = \Big[ n^{-1} \sum_{i=1}^n X_i^2 \,/\, 2 \Big]^{1/2}.$$
Note that these MM estimators are different, and this is OK. Then a statistician should choose the better one. Which one do you think is better? You may use the notion of efficiency to resolve the issue: compare their MSEs (mean squared errors) $E(\hat\theta - \theta)^2$ and choose the estimator with the smaller MSE. By the way, which estimator is based on the sufficient statistic?

2. Problem 10.53. Here $X_1, \ldots, X_n$ are iid Poisson($\lambda$). Recall that $E_\lambda(X) = \lambda$ and $Var_\lambda(X) = \lambda$. The MME is easy to get via the first moment, and we have
$$\hat\lambda_{MME} = \bar X.$$
This is the answer. But again, as an extra example, I can suggest an MME based on the second moment. Indeed, $E_\lambda(X^2) = Var_\lambda(X) + (E_\lambda X)^2 = \lambda + \lambda^2$, and this yields
$$\tilde\lambda_{MME} + \tilde\lambda_{MME}^2 = n^{-1}\sum_{i=1}^n X_i^2.$$
Then you need to solve this equation to get the MME. Obviously it is a more complicated estimator, but it is yet another MME.
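The second-moment MME above requires solving a quadratic equation, and Problem 10.51 invites an MSE comparison between the two MMEs. Below is a minimal Monte Carlo sketch, not part of the original solution; the true intensity, sample size, and number of replications are arbitrary illustrative choices. The positive root of $\tilde\lambda^2 + \tilde\lambda = m_2$ is $\tilde\lambda = (-1 + \sqrt{1 + 4 m_2})/2$, which is what the code uses.

import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 3.0, 50, 10_000   # arbitrary true intensity, sample size, replications

mse_first = mse_second = 0.0
for _ in range(reps):
    x = rng.poisson(lam, size=n)
    m2 = (x ** 2).mean()                           # sample second moment
    est1 = x.mean()                                # first-moment MME: X-bar
    est2 = (-1.0 + np.sqrt(1.0 + 4.0 * m2)) / 2.0  # positive root of lam + lam^2 = m2
    mse_first += (est1 - lam) ** 2 / reps
    mse_second += (est2 - lam) ** 2 / reps

print("estimated MSE, first-moment MME: ", mse_first)
print("estimated MSE, second-moment MME:", mse_second)

Since $\bar X$ is a function of the sufficient statistic, it should typically show the smaller estimated MSE.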
3. Problem 10.56. Let $X_1, \ldots, X_n$ be iid according to the pdf
$$g_\theta(x) = \theta^{-1} e^{-(x-\delta)/\theta} I(x > \delta).$$
Please note that this is a location-exponential family because $X = \delta + Z$, where Z is a classical exponential RV with $f_\theta^Z(z) = \theta^{-1} e^{-z/\theta} I(z > 0)$. I can go even further by saying that we are dealing with a location-scale family because $X = \delta + \theta Z_0$, where $f^{Z_0}(z) = e^{-z} I(z > 0)$. So now we know the meaning of the parameters δ and θ: the former is the location (shift) and the latter is the scale (multiplier). Note that this understanding simplifies all calculations because you can easily figure out (otherwise do calculations) that
$$E_{\delta,\theta}(X) = \delta + \theta, \qquad Var_{\delta,\theta}(X) = \theta^2.$$
These two familiar results yield $E_{\delta,\theta}(X^2) = \theta^2 + (\delta + \theta)^2$, and we get the following system of two equations for the pair of MMEs:
$$\hat\delta + \hat\theta = \bar X, \qquad 2\hat\theta^2 + 2\hat\delta\hat\theta + \hat\delta^2 = n^{-1}\sum_{i=1}^n X_i^2.$$
To solve this system, we square both sides of the first equality and then subtract the obtained equality from the second one. We get a new system
$$\hat\delta + \hat\theta = \bar X, \qquad \hat\theta^2 = n^{-1}\sum_{i=1}^n X_i^2 - \bar X^2.$$
This, together with simple algebra, yields the answer
$$\hat\delta_{MME} = \bar X - \Big[n^{-1}\sum_{i=1}^n X_i^2 - \bar X^2\Big]^{1/2}, \qquad \hat\theta_{MME} = \Big[n^{-1}\sum_{i=1}^n X_i^2 - \bar X^2\Big]^{1/2}.$$
Remark: We need $n^{-1}\sum_{i=1}^n X_i^2 - \bar X^2 \ge 0$ for the estimator to be well defined. This may be checked via the famous Hölder (Cauchy–Schwarz) inequality
$$\Big(\sum_{j=1}^m a_j\Big)^2 \le m \sum_{j=1}^m a_j^2.$$

4. Problem 10.59. Here $X_1, \ldots, X_n$ are iid Poisson($\lambda$), $\lambda \in \Omega = (0, \infty)$. Recall that $E_\lambda(X) = \lambda$ and $Var_\lambda(X) = \lambda$. Then, by the definition of the MLE,
$$\hat\lambda_{MLE} := \arg\max_{\lambda\in\Omega} \prod_{l=1}^n f_\lambda(X_l) =: \arg\max_{\lambda\in\Omega} L_{X^n}(\lambda) = \arg\max_{\lambda\in\Omega} \sum_{l=1}^n \ln f_\lambda(X_l) =: \arg\max_{\lambda\in\Omega} \ln L_{X^n}(\lambda).$$
For the Poisson pmf $f_\lambda(x) = e^{-\lambda}\lambda^x/x!$ we get
$$\ln L_{X^n}(\lambda) = -n\lambda + \sum_{l=1}^n X_l \ln(\lambda) - \sum_{l=1}^n \ln(X_l!).$$
Now we need to find the $\hat\lambda_{MLE}$ at which the above loglikelihood attains its maximum over all $\lambda\in\Omega$. You can do this in the usual way: take the derivative with respect to λ (that is, calculate $\partial \ln L_{X^n}(\lambda)/\partial\lambda$), equate it to zero, solve with respect to λ, and then check that the solution indeed maximizes the loglikelihood. Here equating the derivative to zero yields $-n + \sum_{l=1}^n X_l/\lambda = 0$, and we get
$$\hat\lambda_{MLE} = \bar X.$$
Note that for the Poisson setting the MME and the MLE coincide; in general they may be different.

5. Problem 10.62. Here $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ with the mean μ known and the parameter of interest being the variance σ². Note that $\sigma^2 \in \Omega = (0, \infty)$. Then we are interested in the MLE. Write
$$\hat\sigma^2_{MLE} = \arg\max_{\sigma^2\in\Omega} \ln L_{X^n}(\sigma^2).$$
Here
$$\ln L_{X^n}(\sigma^2) = \sum_{l=1}^n \ln\big([2\pi\sigma^2]^{-1/2} e^{-(X_l-\mu)^2/(2\sigma^2)}\big) = -(n/2)\ln(2\pi\sigma^2) - (1/2\sigma^2)\sum_{l=1}^n (X_l-\mu)^2.$$
This expression takes on its maximum at
$$\hat\sigma^2_{MLE} = n^{-1}\sum_{l=1}^n (X_l - \mu)^2.$$
Note that this is also the MME.

6. Problem 10.66. Let $X_1, \ldots, X_n$ be iid according to the pdf $g_\theta(x) = \theta^{-1} e^{-(x-\delta)/\theta} I(x > \delta)$. Then
$$L_{X^n}(\delta, \theta) = \theta^{-n} e^{-\sum_{l=1}^n (X_l-\delta)/\theta}\, I(X_{(1)} > \delta).$$
Recall that $X_{(1)} = \min(X_1, \ldots, X_n)$ is the minimal observation [the first ordered observation]. This is the case that I wrote you about earlier: it is absolutely crucial to take into account the indicator function (the support) because here the parameter δ defines the support. By definition,
$$(\hat\delta_{MLE}, \hat\theta_{MLE}) := \arg\max_{\delta\in(-\infty,\infty),\, \theta\in(0,\infty)} \ln L_{X^n}(\delta, \theta).$$
Note that
$$L(\delta,\theta) := \ln L_{X^n}(\delta, \theta) = -n\ln(\theta) - \theta^{-1}\sum_{l=1}^n (X_l - \delta) + \ln I(X_{(1)} \ge \delta).$$
Now the crucial step: you should graph the loglikelihood L as a function of δ and visualize that it takes on its maximum when $\delta = X_{(1)}$. So we get $\hat\delta_{MLE} = X_{(1)}$. Then by taking the derivative with respect to θ we get $\hat\theta_{MLE} = n^{-1}\sum_{l=1}^n (X_l - X_{(1)})$.

Answer: $(\hat\delta_{MLE}, \hat\theta_{MLE}) = \big(X_{(1)},\; n^{-1}\sum_{l=1}^n (X_l - X_{(1)})\big)$. Please note that $\hat\delta_{MLE}$ is a biased estimator; this is a rather typical outcome.
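As a quick illustration of Problem 10.66 (not part of the original text), the sketch below simulates shifted-exponential samples and computes the MLE pair $(X_{(1)},\, n^{-1}\sum_l (X_l - X_{(1)}))$; the particular δ, θ, n, and number of replications are arbitrary. Averaging the location estimate over many samples makes the bias mentioned above visible.

import numpy as np

rng = np.random.default_rng(1)
delta, theta, n, reps = 3.0, 2.0, 25, 5_000   # arbitrary true values

delta_hats = np.empty(reps)
theta_hats = np.empty(reps)
for r in range(reps):
    x = delta + rng.exponential(scale=theta, size=n)   # shifted-exponential sample
    delta_hats[r] = x.min()                            # MLE of the location: X_(1)
    theta_hats[r] = (x - x.min()).mean()               # MLE of the scale: mean of X_l - X_(1)

print("true delta:", delta, " average delta-hat:", delta_hats.mean())  # exceeds delta: upward bias
print("true theta:", theta, " average theta-hat:", theta_hats.mean())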
7. Problem 10.73. Consider iid uniform observations $X_1, \ldots, X_n$ with the parametric pdf
$$f_\theta(x) = I(\theta - 1/2 < x < \theta + 1/2).$$
As soon as the parameter is inside the indicator function you should be very cautious: typically a graph, and not differentiation, will help you find an MLE. Also, it is very helpful to figure out the nature of the parameter. Here it is obviously a location parameter, and you can write $X = \theta + Z$, $Z \sim Uniform(-1/2, 1/2)$. The latter helps you to guess a correct estimator, to check a suggested one and, if necessary, to simplify calculations of descriptive characteristics (mean, variance, etc.).

Well, now we need to write down the likelihood function (recall that this is just the joint density, only considered as a function of the parameter given a vector of observations):
$$L_{X^n}(\theta) = \prod_{l=1}^n I(\theta - 1/2 < X_l < \theta + 1/2) = I(\theta - 1/2 < X_{(1)} \le X_{(n)} < \theta + 1/2).$$
Note that the latter expression implies that $(X_{(1)}, X_{(n)})$ is a sufficient statistic (due to the Factorization Theorem). As a result, any good estimator, and the MLE in particular, must be a function of only these two statistics. Another remark: it is possible to show (there exists a technique for doing this which is beyond the objectives of this class) that this pair of extreme observations is also the minimal sufficient statistic. Please look at the situation: we have 1 parameter and need 2 univariate statistics $(X_{(1)}, X_{(n)})$ to have a sufficient statistic; this is the limit of data reduction here. Nonetheless, it is a huge data reduction whenever n is large. Just think about this: to estimate θ you do not need any observation which is between the two extreme ones! This is not a trivial assertion.

Well, now let us return to the problem at hand. If you look at the graph of the likelihood function as a function of θ, then you may conclude that it attains its maximum at every θ such that
$$X_{(n)} - 1/2 < \theta < X_{(1)} + 1/2. \tag{1}$$
As a result, we get a very curious MLE: any point within this interval can be declared the MLE (the MLE is not unique!). Now we can consider the particular questions at hand.

(a) Let $\hat\Theta_1 = (1/2)(X_{(1)} + X_{(n)})$. We need to check that this estimator satisfies (1). We just plug this estimator into (1) and get
$$X_{(n)} - 1/2 < (1/2)(X_{(1)} + X_{(n)}) < X_{(1)} + 1/2.$$
The latter relation is true because it is equivalent to the valid inequality $X_{(n)} - X_{(1)} < 1$.

(b) Let $\hat\Theta_2 = (1/3)(X_{(1)} + 2X_{(n)})$ be another candidate for the MLE. Then it should satisfy (1). In particular, if this is the MLE then $(1/3)(X_{(1)} + 2X_{(n)}) < X_{(1)} + 1/2$ should hold. The latter inequality is equivalent to $X_{(n)} - X_{(1)} < 3/4$, which may not hold. This contradiction shows that the estimator, despite being a function of the sufficient statistic, is not necessarily the MLE.
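A small simulation (again, not part of the original solution) can make parts (a) and (b) concrete: it checks how often each candidate falls inside the MLE interval (1). The true θ, sample size, and number of replications are arbitrary; a small n is used so that violations in part (b) are easy to observe.

import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.0, 5, 10_000   # arbitrary true value; small n for illustration

t1_ok = t2_ok = 0
for _ in range(reps):
    x = rng.uniform(theta - 0.5, theta + 0.5, size=n)
    lo, hi = x.max() - 0.5, x.min() + 0.5          # the MLE interval (1): (X_(n)-1/2, X_(1)+1/2)
    t1 = 0.5 * (x.min() + x.max())                 # candidate from part (a)
    t2 = (x.min() + 2.0 * x.max()) / 3.0           # candidate from part (b)
    t1_ok += lo < t1 < hi
    t2_ok += lo < t2 < hi

print("fraction of samples where (a) maximizes the likelihood:", t1_ok / reps)
print("fraction of samples where (b) maximizes the likelihood:", t2_ok / reps)

The estimator from (a) should land inside (1) in every sample, while the one from (b) fails whenever $X_{(n)} - X_{(1)} \ge 3/4$.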
8. Problem 10.74. Here we are exploring the Bayesian approach, where the parameter of interest is considered as a realization of a random variable. For the problem at hand $X \sim Binom(n, \theta)$ and θ is a realization (which we do not directly observe) of a beta RV $\Theta \sim Beta(\alpha, \beta)$. [Please note that here your knowledge of basic/classical distributions becomes absolutely crucial: you cannot solve any problem without knowing the formulae for pmfs/pdfs; so it is time to refresh them.] In other words, here we are observing a binomial random variable whose parameter (the probability of success) has a beta prior.

To find a Bayesian estimator, we need to find the posterior distribution of the parameter of interest and then calculate its mean. [Please note that your knowledge of the means of classical distributions becomes very handy here: as soon as you realize what the underlying posterior distribution is, you can use a formula for its mean.] Given this information, the posterior distribution of Θ given the observation X is
$$f^{\Theta|X}(\theta|x) = \frac{f^\Theta(\theta)\, f^{X|\Theta}(x|\theta)}{f^X(x)} = \frac{\Gamma(n+\alpha+\beta)}{\Gamma(x+\alpha)\,\Gamma(n-x+\beta)}\, \theta^{x+\alpha-1}(1-\theta)^{(n-x+\beta)-1}.$$
The algebra leading to the last equality is explained on page 345. Now you can realize that the posterior distribution is again beta, namely $Beta(x+\alpha,\; n-x+\beta)$. There are two consequences of this fact.

First, by definition, if the prior density and the posterior density are from the same family of distributions, then the prior is called conjugate. This is a case that Bayesian statisticians like a lot, because it methodologically supports the Bayesian approach and also simplifies formulae. Second, we know a formula for the mean of a beta RV, and using it we get the Bayesian estimator
$$\hat\Theta_B = E(\Theta|X) = \frac{X+\alpha}{(\alpha+X) + (n - X + \beta)} = \frac{X+\alpha}{\alpha+n+\beta}.$$
Now we actually can consider the exercise at hand. A general remark: a Bayesian estimator is typically a linear combination of the prior mean and the MLE, with weights depending on the variances of these two estimates. In general, as $n \to \infty$, the Bayesian estimator approaches the MLE. Let us check that this is the case for the problem at hand. Write
$$\hat\Theta_B = \frac{n}{\alpha+\beta+n}\cdot\frac{X}{n} + \frac{\alpha+\beta}{\alpha+\beta+n}\cdot\frac{\alpha}{\alpha+\beta}.$$
Now, if we denote
$$w := \frac{n}{\alpha+\beta+n},$$
we get the wished-for representation
$$\hat\Theta_B = w\bar X + (1-w)\theta_0,$$
where $\bar X = X/n$ and $\theta_0 = E(\Theta) = \alpha/(\alpha+\beta)$ is the prior mean of Θ. Now, the problem at hand asks us to work a bit further on the weight. The variance of the beta RV Θ is
$$Var(\Theta) := \sigma_0^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
Well, it is plain to see that
$$\theta_0(1-\theta_0) = \frac{\alpha\beta}{(\alpha+\beta)^2}.$$
Then simple algebra yields
$$\sigma_0^2 = \frac{\theta_0(1-\theta_0)}{\alpha+\beta+1},$$
which in its turn yields
$$\alpha+\beta = \frac{\theta_0(1-\theta_0)}{\sigma_0^2} - 1.$$
Using this we get the wished-for weight
$$w = \frac{n}{n + \theta_0(1-\theta_0)\sigma_0^{-2} - 1}.$$
The problem is solved.

9. Problem 10.76. Here $X \sim N(\mu, \sigma^2)$ with σ² known. A sample of size n is given. The parameter of interest is the population mean μ, and a Bayesian approach is considered with the normal prior $M \sim N(\mu_0, \sigma_0^2)$. In other words, the Bayesian approach suggests thinking about the estimated μ as a realization of a random variable M which has a normal distribution with the given mean and variance. As a result, we know that the Bayesian estimator is the mean of the posterior distribution. The posterior distribution is calculated in Theorem 10.6, and it is again normal, $N(\mu_1, \sigma_1^2)$, where
$$\mu_1 = \bar X\,\frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2} + \mu_0\,\frac{\sigma^2}{n\sigma_0^2 + \sigma^2}, \qquad \frac{1}{\sigma_1^2} = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}.$$
Note that this theorem implies that the normal prior is conjugate: the prior is normal and the posterior is normal as well. We can conclude that the Bayesian estimator is
$$\hat M_B = E(M|\bar X) = w\bar X + (1-w)\mu_0,$$
that is, the Bayesian estimator is a linear combination of the MLE (here $\bar X$) and the prior mean (the pure Bayesian estimator when no observations are available). Recall that this is a rather typical outcome, and the Bayesian estimator approaches the MLE as $n \to \infty$. A direct (simple) calculation shows that
$$w = \frac{n}{n + \sigma^2/\sigma_0^2}.$$
The problem is solved.
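For Problem 10.76, here is a minimal numerical sketch of the posterior update; all numbers (n, the observed mean, the known variance, and the prior parameters) are made-up illustrative values, and the code only evaluates the formulas of Theorem 10.6 quoted above, confirming that the posterior mean equals $w\bar X + (1-w)\mu_0$.

n, xbar = 25, 10.3           # assumed sample size and observed sample mean
sigma2 = 4.0                 # known data variance sigma^2 (assumed)
mu0, sigma0_2 = 9.0, 1.0     # assumed prior mean and prior variance

# posterior parameters from Theorem 10.6
mu1 = xbar * n * sigma0_2 / (n * sigma0_2 + sigma2) + mu0 * sigma2 / (n * sigma0_2 + sigma2)
sigma1_2 = 1.0 / (n / sigma2 + 1.0 / sigma0_2)

# weighted-average form of the Bayesian estimator
w = n / (n + sigma2 / sigma0_2)
print("posterior mean mu_1:             ", mu1)
print("weighted form w*xbar + (1-w)*mu0:", w * xbar + (1 - w) * mu0)  # matches mu_1
print("posterior variance sigma_1^2:    ", sigma1_2)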
10. Problem 10.77. Here a Poisson RV X with an unknown intensity λ is observed. The problem is to estimate λ. A Bayesian approach is suggested with the prior distribution for the intensity Λ being Gamma(α, β). In other words, $X \sim Poisson(\Lambda)$ and $\Lambda \sim Gamma(\alpha, \beta)$. To find a Bayesian estimator, we need to evaluate the posterior distribution of Λ given X and then calculate its mean; that mean will be the Bayesian estimator. We do this in two steps.

(a) To find the posterior distribution we begin with the joint pdf
$$f^{\Lambda,X}(\lambda, x) = f^\Lambda(\lambda)\, f^{X|\Lambda}(x|\lambda) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\,\lambda^{\alpha-1} e^{-\lambda/\beta}\, e^{-\lambda}\lambda^x [x!]^{-1}\, I(\lambda > 0)\, I(x \in \{0, 1, \ldots\}).$$
Then the posterior pdf is
$$f^{\Lambda|X}(\lambda|x) = \frac{f^{\Lambda,X}(\lambda, x)}{f^X(x)} = \frac{\lambda^{(\alpha+x)-1} e^{-\lambda(1+1/\beta)}}{\Gamma(\alpha)\beta^\alpha f^X(x)\, x!}\, I(\lambda > 0). \tag{2}$$
Now I will explain to you what smart Bayesian statisticians do. They do not calculate $f^X(x)$ or try to simplify (2); instead they look at (2) as a density in λ and try to guess what family it is from. Here it is plain to realize that the posterior pdf is again Gamma; more exactly, it is $Gamma(\alpha + x,\; \beta/(1+\beta))$. Note that the Gamma prior for the Poisson intensity parameter is the conjugate prior because the posterior is from the same family.

(b) As soon as you have realized what the posterior distribution is, you know what the Bayesian estimator is: it is the expected value of this Gamma RV, namely
$$\hat\Lambda_B = E(\Lambda|X) = (\alpha + X)\,\frac{\beta}{1+\beta} = \frac{\beta(\alpha+X)}{1+\beta}.$$
The problem is solved.

11. Problem 10.94. This is a curious problem on the application and analysis of the Bayesian approach. It is given that the observation X is a binomial RV $Binom(n = 30, \theta)$ and someone believes that the probability of success θ is a realization of a beta random variable $\Theta \sim Beta(\alpha, \beta)$. The parameters α and β are not given; instead it is given that $E\Theta = \theta_0 = .74$ and $Var(\Theta) = \sigma_0^2 = 3^2 = 9$. [Do you think that this information is enough to find the parameters of the underlying beta distribution? If "yes", then what are they?] Now we are in a position to answer the questions.

(a) Using only the prior information (that is, no observation is available), the best MSE estimate is the prior mean $\hat\Theta_{prior} = E\Theta = .74$.

(b) Based on the direct information alone, the MLE and the MME are the same, namely $\hat\Theta_{MLE} = \hat\Theta_{MME} = \bar X = X/n = 18/30$. [Please compare the answers in parts (a) and (b). Are they far apart?]

(c) The Bayesian estimator with $\Theta \sim Beta(\alpha, \beta)$ is (see p. 345)
$$\hat\Theta_B = \frac{X+\alpha}{\alpha+\beta+n}.$$
Now, we can either find α and β from the mean and variance information, or use the result of our homework problem 10.74 and get
$$\hat\Theta_B = w\bar X + (1-w)E(\Theta), \qquad \text{where } w = \frac{n}{n + \frac{\theta_0(1-\theta_0)}{\sigma_0^2} - 1} = \frac{30}{30 + \frac{(.74)(.26)}{9} - 1}.$$

12. Problem 10.96. Let X be a grade, and assume that $X \sim N(\mu, \sigma^2)$ with $\sigma^2 = (7.4)^2$. There is the professor's belief, based on prior knowledge, that the mean $M \sim N(\mu_0 = 65.2,\ \sigma_0^2 = (1.5)^2)$. After the exam, $\bar X = 72.9$ is the observation.

(a) Denote by Z the standard normal random variable. Then z-scoring yields
$$P(63.0 < M < 68.0) = P\Big(\frac{63.0-\mu_0}{\sigma_0} < \frac{M-\mu_0}{\sigma_0} < \frac{68.0-\mu_0}{\sigma_0}\Big) = P\Big(\frac{63-65.2}{1.5} < Z < \frac{68-65.2}{1.5}\Big) = P\Big(-\frac{2.2}{1.5} < Z < \frac{2.8}{1.5}\Big).$$
Then you use the Table; I skip this step here.

(b) As we know from Theorem 10.6, $M|\bar X$ is normally distributed with
$$\mu_1 = \frac{n\bar X\sigma_0^2 + \mu_0\sigma^2}{n\sigma_0^2 + \sigma^2}, \qquad \sigma_1^2 = \frac{\sigma^2\sigma_0^2}{\sigma^2 + n\sigma_0^2}.$$
Here n = 40, $\bar X = 72.9$, $\sigma_0^2 = (1.5)^2$, $\sigma^2 = (7.4)^2$, $\mu_0 = 65.2$. Plug in these numbers and then
$$P(63 < M < 68 \mid \bar X = 72.9) = P\Big(\frac{63-\mu_1}{\sigma_1} < Z < \frac{68-\mu_1}{\sigma_1}\Big).$$
Find the numbers and use the Table.
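If you prefer to check the Table lookups in Problem 10.96 numerically, the following sketch (assuming scipy is available) simply plugs the given numbers into the formulas above and evaluates both probabilities with the normal cdf; it does not add anything beyond the plug-in step the text leaves to the reader.

from scipy.stats import norm

# quantities given in Problem 10.96
n, xbar = 40, 72.9
sigma2 = 7.4 ** 2               # known variance of a grade
mu0, sigma0_2 = 65.2, 1.5 ** 2  # prior mean and variance of M

# part (a): prior probability that 63 < M < 68
p_prior = norm.cdf(68.0, loc=mu0, scale=sigma0_2 ** 0.5) - norm.cdf(63.0, loc=mu0, scale=sigma0_2 ** 0.5)

# part (b): posterior parameters from Theorem 10.6, then the posterior probability
mu1 = (n * xbar * sigma0_2 + mu0 * sigma2) / (n * sigma0_2 + sigma2)
sigma1 = (sigma2 * sigma0_2 / (sigma2 + n * sigma0_2)) ** 0.5
p_post = norm.cdf(68.0, loc=mu1, scale=sigma1) - norm.cdf(63.0, loc=mu1, scale=sigma1)

print("prior P(63 < M < 68):                   ", p_prior)
print("posterior P(63 < M < 68 | X-bar = 72.9):", p_post)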