SOLUTION FOR HOMEWORK 8, STAT 4372

Welcome to your 8th homework. Here you have an opportunity to solve classical estimation problems, which are a must to solve on the exam because of their simplicity.

1. Problem 15.4

Given: $\bar X = 35{,}000$, $S_n = 75{,}000$, $\hat\pi_{.5} = 10{,}000$, $\hat\pi_{.9} = 100{,}000$ (the 50th and 90th percentiles). Using percentile matching for the Weibull distribution, find its two parameters.

Solution: I do not know why the moments are also given; you can use them for the method of moments in any case. Now an important remark. For the Weibull, based on the current Table, you can match either via percentiles, using $\mathrm{VaR}_g(X|\theta,\tau)$, or via the cdf $F(x|\theta,\tau)$. Below I present both approaches; on your exam choose the one which is faster.

(a) Matching percentiles: $\hat\pi_g = \pi_g(\theta,\tau) = \mathrm{VaR}_g(X|\theta,\tau)$, where here $g = .5$ and $g = .9$. From the Table, $\mathrm{VaR}_g(X|\theta,\tau) = \theta[-\ln(1-g)]^{1/\tau}$. Now I solve the system of two equations, always going from system to system (to avoid a stupid mistake). Write

$$\begin{cases} \theta[-\ln(.5)]^{1/\tau} = 10{,}000 \\ \theta[-\ln(.1)]^{1/\tau} = 100{,}000 \end{cases}$$

Now I begin to solve it by dividing the second equation by the first:

$$\begin{cases} [\ln(.1)/\ln(.5)]^{1/\tau} = 10 \\ \theta[-\ln(.5)]^{1/\tau} = 10{,}000 \end{cases}$$

$$\begin{cases} 1/\tau = \ln(10)/\ln[\ln(.1)/\ln(.5)] \\ \theta = 10{,}000[-\ln(.5)]^{-1/\tau} \end{cases}$$

$$\begin{cases} 1/\tau = 1.918 \\ \theta = 20{,}197.5 \end{cases}$$

Answer: $\hat\tau = .52$ and $\hat\theta = 20{,}197.5$.

(b) Matching via the known cdf $F(x|\theta,\tau) = 1 - e^{-(x/\theta)^\tau}$. Here I need two equations $F(\hat\pi_g|\theta,\tau) = g$ with $g = .5, .9$. Let us solve the system of equations

$$\begin{cases} 1 - \exp(-(10{,}000/\theta)^\tau) = .5 \\ 1 - \exp(-(100{,}000/\theta)^\tau) = .9 \end{cases}$$

Put the exponents on the right and the numbers on the left, take logarithms, and get a system similar to the one considered above:

$$\begin{cases} (10{,}000/\theta)^\tau = -\ln(.5) \\ (100{,}000/\theta)^\tau = -\ln(.1) \end{cases}$$

Solve it and get the same answer.

2. Problem 15.6

Solution: First of all, you need to recalculate the claims to the level of year 3 by taking inflation into account. The 100 claims from the first year will give you $(100)(10{,}000)(1.1)^2 = 1{,}210{,}000$, and the 200 claims from the second year will give you $(200)(12{,}500)(1.1) = 2{,}750{,}000$.
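The trending step just described, together with the method-of-moments step it feeds, can be sketched in a few lines of Python. This is only an arithmetic check, not part of the original solution: the 10% inflation rate is read off the factor 1.1 above, and the names `claims_by_year`, `trended_total` and `theta_hat` are illustrative. The Pareto mean $\theta/(\alpha-1)$ with $\alpha = 3$ is the relation used in this problem.

```python
# Arithmetic check for Problem 15.6 (names are illustrative, not from the textbook).
INFLATION = 0.10  # assumption: read off the factor 1.1 used in the solution
ALPHA = 3.0       # Pareto shape parameter given in the problem

# (number of claims, average claim size, years of trend needed to reach year 3)
claims_by_year = [
    (100, 10_000, 2),
    (200, 12_500, 1),
]

def trended_total(count, average, years):
    """Total claim amount restated at year-3 cost level."""
    return count * average * (1 + INFLATION) ** years

totals = [trended_total(*row) for row in claims_by_year]
n = sum(count for count, _, _ in claims_by_year)
mean_severity = sum(totals) / n

# Method of moments: Pareto mean theta / (alpha - 1) is set equal to the sample mean.
theta_hat = mean_severity * (ALPHA - 1)

print([round(t) for t in totals], round(mean_severity), round(theta_hat))
```
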
To use the method of moments I calculate the average of the $n = 300$ claims as $\bar X = [1{,}210{,}000 + 2{,}750{,}000]/300 = 13{,}200$. For the Pareto distribution $\mu = \theta/(\alpha - 1)$, and because $\alpha = 3$ is given we get $\mu = \theta/2$. From $\mu = \bar X$ we get the answer $\hat\theta = 26{,}400$.

3. Problem 15.8

This is a nice problem. Given: $X := ZA + (1 - Z)B$, where $Z$ is Bernoulli($p$) and independent of two exponentially distributed random variables $A$ and $B$ with means 1 and 10, respectively. [Note that there is no information that $A$ and $B$ are independent.] Given $\mathrm{Var}_p(X) = 2^2 = 4$, find $p$ using the method of moments.

Solution: Here the empirical variance is given, so we need to calculate the theoretical variance as a function of $p$ and then solve the equation. To find the variance we calculate the first and second moments. Write

$$E_p(X) = E_p(Z)E(A) + E_p(1 - Z)E(B) = p(1) + (1 - p)(10) = 10 - 9p.$$

For the second moment, the Table gives $E(A^2) = 2[E(A)]^2$ for an exponential random variable, so $E(A^2) = 2$ and $E(B^2) = 200$. Using this and $E_p(Z^2) = p$ we can write

$$E_p(X^2) = E_p(Z^2A^2) + 2E_p\{Z(1 - Z)AB\} + E_p((1 - Z)^2B^2) = pE(A^2) + 0 + (1 - p)E(B^2) = 2p + (1 - p)200 = 200 - 198p.$$

In the above I used $Z(1 - Z) = 0$, which holds because $Z$ is either 0 or 1 (this is also why independence of $A$ and $B$ is not needed). Now we can calculate

$$\mathrm{Var}_p(X) = E_p(X^2) - [E_p(X)]^2 = 200 - 198p - (10 - 9p)^2 = 100 - 18p - 81p^2.$$

Solve the equation $\mathrm{Var}_p(X) = 4$, that is $81p^2 + 18p - 96 = 0$, take the root in $[0, 1]$, and get $\hat p = .98$.

4. Problem 15.21

Given: Losses are Weibull($\theta, \tau$). A sample of 16 losses is given (I skip it). Use the smoothed empirical estimates of the 20th and 70th percentiles to find the underlying parameters $\theta, \tau$.

Solution: First, $n + 1 = 17$. Then for the 20th percentile we have $(17)(.2) = 3.4$, which yields $j_{.2} = 3$ and $h_{.2} = .4$, and $X_{(3)} = 75$. Then using Definition 15.3 on page 377 yields

$$\hat\pi_{.2} = (.6)X_{(3)} + (.4)X_{(4)} = .6(75) + .4(81) = 77.4.$$

Similarly, for the 70th percentile we get $.7(17) = 11.9$, which yields $j_{.7} = 11$, $h_{.7} = .9$ and $X_{(11)} = 122$. Then

$$\hat\pi_{.7} = (.1)X_{(11)} + (.9)X_{(12)} = .1(122) + .9(125) = 124.7.$$

Here we can use the cdf and equate $F_X(\hat\pi_g) = g$.
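The interpolation rule and the two-point cdf match can be sketched as follows. This is a hedged illustration, not the textbook's code: the function names `smoothed_percentile` and `weibull_match` are made up, and because the 16 losses are not reproduced here, the interpolation function is demonstrated on a clearly hypothetical toy sample, while the Weibull match uses the two percentile estimates computed above. The closed form in `weibull_match` comes from taking logarithms of $(x/\theta)^\tau = -\ln(1-g)$ at the two points and dividing.

```python
import math

def smoothed_percentile(sample, g):
    """Smoothed empirical percentile: interpolate at position (n + 1) * g."""
    xs = sorted(sample)
    n = len(xs)
    pos = (n + 1) * g
    j = math.floor(pos)
    h = pos - j
    if j < 1 or j > n or (j == n and h > 0):
        raise ValueError("percentile is not defined for this g and sample size")
    if h == 0:
        return float(xs[j - 1])
    return (1 - h) * xs[j - 1] + h * xs[j]

def weibull_match(x1, g1, x2, g2):
    """Solve 1 - exp(-(x / theta) ** tau) = g at two points for (theta, tau)."""
    a1 = -math.log(1 - g1)
    a2 = -math.log(1 - g2)
    tau = math.log(a2 / a1) / math.log(x2 / x1)
    theta = x1 / a1 ** (1 / tau)
    return theta, tau

# Hypothetical 16-point sample, used only to exercise the interpolation rule.
toy_sample = list(range(10, 170, 10))
toy_p20 = smoothed_percentile(toy_sample, 0.2)  # 0.6 * 30 + 0.4 * 40

# Percentile estimates from the order statistics quoted in the solution.
p20 = 0.6 * 75 + 0.4 * 81
p70 = 0.1 * 122 + 0.9 * 125
theta4, tau4 = weibull_match(p20, 0.2, p70, 0.7)

# The same closed form applied to the percentiles of Problem 15.4.
theta1, tau1 = weibull_match(10_000, 0.5, 100_000, 0.9)

print(round(theta4, 2), round(tau4, 2), round(tau1, 2))
```

The closed form avoids a numerical root finder, which is possible here only because the Weibull cdf can be inverted explicitly.
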
We get the system

$$\begin{cases} 1 - \exp(-(77.4/\theta)^\tau) = .2 \\ 1 - \exp(-(124.7/\theta)^\tau) = .7 \end{cases}$$

Solve the system and get $\hat\tau = 3.53$ and $\hat\theta = 118.32$.

5. Problem 15.22

Given: $X$ has pdf $f_X(x) = \theta^{-1}\exp(-(x - \delta)/\theta)I(x > \delta)$. Given $\bar X = 300$ and $\hat\pi_{.5}(X) = 240$. Find $\delta$ and $\theta$.

Solution: It is important to note (with the purpose of a faster solution) that $\delta$ is a shift parameter, that is, $X = Z + \delta$ where $Z$ is an exponential RV with mean $\theta$. As a result of this remark, $\bar Z = 300 - \delta$, $\hat\pi_{.5}(Z) = 240 - \delta$, and we can use the Table for the theoretical characteristics

$$E_\theta(Z) = \theta, \qquad \pi_{.5}(Z) = \mathrm{VaR}_{.5}(Z) = -\theta\ln(.5).$$

Remark: If the RV is not from the Table then you need to calculate the characteristics from the given pdf, that is, calculate the mean and median directly using their definitions. For instance, here the solution of $\int_0^m f_Z(z)\,dz = \int_m^\infty f_Z(z)\,dz$ will give you the median $m$.

Then we solve the system

$$\begin{cases} \theta = 300 - \delta \\ -\theta\ln(.5) = 240 - \delta \end{cases}$$

Subtracting the second equation from the first gives

$$\begin{cases} 60 = \theta[1 + \ln(.5)] \\ \delta = 300 - \theta \end{cases}$$

Answer: $\hat\theta = 195.5$ and $\hat\delta = 104.5$.

6. Problem 15.29

Solution: First of all, I need to explain what $q_{35}$ is. This is a discrete analog of the hazard rate function,

$$q_{35} = \frac{S(35) - S(36)}{S(35)}.$$

In other words, it is the probability of dying during the 36th year of life given that the life was observed alive at age 35 (see the discussion and definition on page 368; this parameter is used for life tables and in life insurance, Exam 3). Because here we are dealing with a conditional probability, we can restrict attention to lives that were observed at age 35, and then consider a new distribution such that $F(35) = 0$ and $S(35) = 1$. Then $q_{35} = F(36) = 1 - S(36)$. This is the idea of shifting the data discussed on page 385. Below I write $w = q_{35}$ and measure time from age 35; the calculations use $F(t) = wt$ for $0 \le t \le 1$ on this shifted scale.

Now let us look at the data and the factors in the likelihood. We have 4 groups of data and corresponding likelihoods.

(1) 6 lives observed at age 35.4 that died before 36. The conditional probability of this event for each life is (I use the technique discussed on p. 385)

$$\frac{F(1) - F(.4)}{1 - F(.4)} = \frac{w - .4w}{1 - .4w} = \frac{.6w}{1 - .4w}.$$

The corresponding factor in the likelihood is

$$L_1 = \left[\frac{.6w}{1 - .4w}\right]^6.$$

(2) 4 lives observed at age 35.4 that survived to age 36. The conditional probability of this event for each life is

$$\frac{1 - F(1)}{1 - F(.4)} = \frac{1 - w}{1 - .4w},$$

with the corresponding factor in the likelihood

$$L_2 = \left[\frac{1 - w}{1 - .4w}\right]^4.$$

(3) 8 lives observed at age 35 (the initial point) that died before 36. The conditional probability of the event for a life is $w$, and the likelihood factor for the group is $L_3 = w^8$.

(4) 12 lives first observed at 35 that survived to age 36. The conditional probability of the event is $1 - w$ and the factor is $L_4 = (1 - w)^{12}$.

Now note that, up to a constant factor (which is irrelevant), the total likelihood is proportional to

$$L = \frac{w^{14}(1 - w)^{16}}{(1 - .4w)^{10}}.$$

Take its logarithm; the derivative is

$$l' = \frac{14}{w} - \frac{16}{1 - w} + \frac{4}{1 - .4w}.$$

Set the derivative to zero, look at the numerator and get $14 - 31.6w + 8w^2 = 0$. The solution which is less than 1 is $\hat w = \hat q_{35} = .51$.

7. Problem 15.30

Solution: Remember that for censored data the likelihood is the product of the density functions at the moments of observed deaths times the product of the survival functions at the moments of censoring. As a result, our first step is to calculate the underlying survival and density functions. According to page 17, if the hazard rate function $h$ of a nonnegative RV is given then its survival function can be calculated as

$$S(t) = \exp\left(-\int_0^t h(u)\,du\right).$$

Then the pdf is $f(t) = -S'(t) := -dS(t)/dt$. Using these formulae we get

$$S(t) = \exp(-\lambda_1 t)I(0 \le t < 2) + \exp(-2\lambda_1 - \lambda_2(t - 2))I(t \ge 2)$$

and

$$f(t) = \lambda_1\exp(-\lambda_1 t)I(0 \le t < 2) + \lambda_2\exp(-2\lambda_1 - \lambda_2(t - 2))I(t \ge 2).$$

Then we calculate the likelihood function,

$$L(\lambda_1,\lambda_2) = \lambda_1\exp(-\lambda_1(1.7))\,\lambda_2\exp(-2\lambda_1 - \lambda_2(3.3 - 2)) \times \exp(-\lambda_1(1.5))\exp(-2\lambda_1 - \lambda_2(2.6 - 2))\exp(-2\lambda_1 - \lambda_2(3.5 - 2))$$

$$= \lambda_1\lambda_2\exp(-\lambda_1[1.7 + 1.5 + 6] - \lambda_2[1.3 + .6 + 1.5]) = \lambda_1\lambda_2\exp(-9.2\lambda_1 - 3.4\lambda_2).$$

Now we need to find the values of the parameters $\lambda_1$ and $\lambda_2$ that maximize the likelihood.
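The coefficients 9.2 and 3.4 are simply the total times the five lives spend under each hazard level, which gives a quick way to check the algebra. The sketch below is an illustration under stated assumptions: the death times (1.7, 3.3) and censoring times (1.5, 2.6, 3.5) are read off the likelihood factors above, and the names `exposure` and `log_likelihood` are made up for this check.

```python
import math

CUT = 2.0                   # hazard is lambda1 on [0, 2) and lambda2 on [2, infinity)
deaths = [1.7, 3.3]         # times entering the likelihood through the density
censored = [1.5, 2.6, 3.5]  # times entering through the survival function

def exposure(times):
    """Total time spent in each hazard segment by the given lives."""
    e1 = sum(min(t, CUT) for t in times)
    e2 = sum(max(t - CUT, 0.0) for t in times)
    return e1, e2

e1, e2 = exposure(deaths + censored)

def log_likelihood(lam1, lam2):
    """Log-likelihood: ln(hazard) at each death minus hazard times exposure."""
    d1 = sum(1 for t in deaths if t < CUT)  # deaths under hazard lambda1
    d2 = len(deaths) - d1                   # deaths under hazard lambda2
    return d1 * math.log(lam1) + d2 * math.log(lam2) - lam1 * e1 - lam2 * e2

print(round(e1, 1), round(e2, 1))
```
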
To do this it is convenient to take the logarithm, set the first-order partial derivatives with respect to $\lambda_1$ and $\lambda_2$ equal to zero, and then solve the system of equations. Formally you then need to check that the solution is a point of maximum, but if the solution is unique then typically it is the point of maximum, so you can save time.

Remark: Just in case, this is the rule for checking that a function $h(x, y)$ attains a (local) maximum at the point $(a, b)$.

(a) The first-order partial derivatives are zero, that is, $\partial h(x,y)/\partial x|_{x=a,y=b} = 0$ and $\partial h(x,y)/\partial y|_{x=a,y=b} = 0$.

(b) At least one second-order partial derivative is negative, that is, $\partial^2 h(x,y)/\partial x^2|_{x=a,y=b} < 0$ or $\partial^2 h(x,y)/\partial y^2|_{x=a,y=b} < 0$.

(c) The determinant of the matrix of second-order partial derivatives (the Hessian) is positive,

$$\{[\partial^2 h(x,y)/\partial x^2][\partial^2 h(x,y)/\partial y^2] - [\partial^2 h(x,y)/\partial x\partial y]^2\}|_{x=a,y=b} > 0.$$

For the data at hand the log-likelihood function is

$$l(\lambda_1,\lambda_2) = \ln(L(\lambda_1,\lambda_2)) = \ln(\lambda_1) + \ln(\lambda_2) - 9.2\lambda_1 - 3.4\lambda_2.$$

Take the first-order partial derivatives and get the system

$$\begin{cases} \lambda_1^{-1} - 9.2 = 0 \\ \lambda_2^{-1} - 3.4 = 0. \end{cases}$$

Its solution gives us the Answer: $\hat\lambda_1 = 1/9.2 = .11$ and $\hat\lambda_2 = 1/3.4 = .294$.

8. Problem 15.33

Given: Loss $X$ is Exponential($\theta$). Estimate $E(X) = \theta$ based on the given data.

Solution: We have 5 observations of the loss and 495 censorings at the value 4,000. The 5 observed losses should be plugged into the density and all 495 censoring values into the survival function. Here $f(t) = \theta^{-1}e^{-t/\theta}$ and $S(t) = e^{-t/\theta}$. As a result, the likelihood function is

$$L(\theta) = \theta^{-5}\exp(-[1100 + 3200 + 3300 + 3500 + 3900]/\theta)\exp(-(495)(4000)/\theta).$$

This yields the log-likelihood

$$l(\theta) = -5\ln(\theta) - 1{,}995{,}000/\theta.$$

Take the derivative, set it to zero, and find the solution:

$$dl(\theta)/d\theta = -5/\theta + 1{,}995{,}000/\theta^2 = 0,$$

and the solution is $\hat\theta = 1{,}995{,}000/5 = 399{,}000$.

9. Problem 15.35

Solution: Actuary X is dealing with 4 observed times and 1 time censored at 5 years; actuary Y is dealing with 5 observed times. For the problem at hand the pdf is $f(t) = -dS(t)/dt = w^{-1}I(0 \le t \le w)$.
Note that this is the uniform distribution, where the parameter of interest determines the support of the density. Remember that in this case finding the MLE can be tricky and may rely on visualizing the likelihood rather than on taking a derivative!

For actuary X the likelihood is

$$L_X(w) = f(1)f(3)f(4)f(4)S(5) = w^{-4}(1 - 5/w)I(w \ge 5). \qquad (1)$$

Please pay attention that the indicator function must be taken care of; in (1) I used

$$I(0 \le 1 \le w)I(0 \le 3 \le w)I(0 \le 4 \le w)I(0 \le 4 \le w)I(0 \le 5 \le w) = I(5 \le w).$$

The next step is to understand that $L_X(w)$ is positive only for $w \in [5, \infty)$, and we need to understand what this function looks like for these values of $w$. Set $g(w) = w^{-4}(1 - 5/w)$ and note that its derivative is $dg(w)/dw = -4w^{-5} + 25w^{-6}$. Set the derivative to zero and find the extreme point $\hat w = 6.25$. Note that it is greater than 5, so it is admissible for the support. This is also plainly the maximum point because $g(5) = 0$, $g(w) > 0$ for $w > 5$, and $g(w) \to 0$ as $w \to \infty$. As a result, the MLE for actuary X is $\hat w_X = 6.25$.

Actuary Y observes all times, so for him the likelihood is

$$L_Y(w) = f(1)f(3)f(4)f(4)f(6) = w^{-5}I(w \ge 6). \qquad (2)$$

This is an interesting case because $w^{-5}$ is decreasing in $w$, and thus the maximum of the likelihood is attained at the point $\hat w_Y = 6$. To see this more clearly, graph the likelihood (2) as a function of $w$ and see where the maximum is. The function is zero for $w < 6$ and then equals $w^{-5}$ for $w \ge 6$.

10. Problem 15.46

Solution: Here the cdf is $F(x) = x^p$ for $0 < x < 1$, so the density is $f(x) = px^{p-1}I(0 < x < 1)$. Then the log-likelihood function is

$$l(p) = \sum_{l=1}^n [\ln(p) + (p - 1)\ln(x_l)].$$

Take the derivative, equate it to zero and get

$$np^{-1} + \sum_{l=1}^n \ln(x_l) = 0.$$

The solution gives you the MLE

$$\hat p = -n\Big/\sum_{l=1}^n \ln(x_l).$$
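Looking back at Problems 15.35 and 15.46, the advice to look at the likelihood rather than differentiate blindly can be mimicked numerically. The sketch below is only an illustration: it evaluates the two likelihoods (1) and (2) on a grid (the grid range and step are arbitrary choices) and codes the closed-form MLE from Problem 15.46. The function names are made up, and the two-point sample passed to `power_mle` is hypothetical because the problem's data are not reproduced here.

```python
import math

def lik_x(w):
    """Likelihood (1) for actuary X: deaths at 1, 3, 4, 4 and one life censored at 5."""
    return w ** -4 * (1 - 5 / w) if w >= 5 else 0.0

def lik_y(w):
    """Likelihood (2) for actuary Y: five observed deaths, the largest at 6."""
    return w ** -5 if w >= 6 else 0.0

# Arbitrary illustrative grid: w from 1.00 to 20.00 in steps of 0.01.
grid = [k / 100 for k in range(100, 2001)]
w_x = max(grid, key=lik_x)  # interior maximum, where the derivative vanishes
w_y = max(grid, key=lik_y)  # boundary maximum, at the largest observation

def power_mle(xs):
    """MLE of p for the cdf F(x) = x ** p on (0, 1): -n / sum of ln(x)."""
    return -len(xs) / sum(math.log(x) for x in xs)

p_hat = power_mle([0.5, 0.5])  # hypothetical sample, for illustration only

print(w_x, w_y, round(p_hat, 4))
```

The grid search finds the boundary maximum for actuary Y that setting a derivative to zero would miss, which is the point of the remark about visualizing the likelihood.
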