On Predictive Modeling for Claim Severity
Paper in Spring 2005 CAS Forum
Glenn Meyers, ISO Innovative Analytics
Predictive Modeling Seminar, September 19, 2005

Problems with Experience Rating for Excess of Loss Reinsurance
• Use submission claim severity data
– Relevant, but
– Not credible
– Not developed
• Use industry distributions
– Credible, but
– Not relevant (???)

General Problems with Fitting Claim Severity Distributions
• Parameter uncertainty
– Fitted parameters of the chosen model are estimates subject to sampling error.
• Model uncertainty
– We might choose the wrong model. There is no particular reason that the models we choose are appropriate.
• Loss development
– Complete claim settlement data is not always available.

Outline of Talk
• Quantifying parameter uncertainty
– Likelihood ratio test
• Incorporating model uncertainty
– Use Bayesian estimation with likelihood functions
– Uncertainty in excess layer loss estimates
• Bayesian estimation with prior models based on data reported to a statistical agent
– Reflects insurer heterogeneity
– Develops losses

The Likelihood Ratio Test
Let $\mathbf{p} = (p_1, \ldots, p_k)$ be a parameter vector for your chosen loss model, let $\mathbf{x} = (x_1, \ldots, x_n)$ be a set of observed losses, and let $\hat{\mathbf{p}}$ be the maximum likelihood estimate of $\mathbf{p}$ given $\mathbf{x}$.

The Likelihood Ratio Test
Test $H_0: \mathbf{p} = \mathbf{p}^*$ against $H_1: \mathbf{p} \ne \mathbf{p}^*$.
Theorem 2.10 in Klugman, Panjer & Willmot: if $H_0$ is true, then
$$\ln LR \equiv 2\left[\ln L(\hat{\mathbf{p}};\,\mathbf{x}) - \ln L(\mathbf{p}^*;\,\mathbf{x})\right]$$
has a $\chi^2$ distribution with $k$ degrees of freedom. Use the $\chi^2$ distribution to find critical values.

An Example – The Pareto Distribution
$$F(x) = 1 - \left(\frac{\theta}{x+\theta}\right)^{\alpha}$$
• Simulate a random sample of size 1000 with $\alpha = 2.000$ and $\theta = 10{,}000$.
• Maximum likelihood: $\ln L = -10034.660$ with $\hat{\theta} = 8723.04$ and $\hat{\alpha} = 1.80792$.

Hypothesis Testing Example
• Significance level = 5%; $\chi^2$ critical value = 5.991
• $H_0$: $(\theta, \alpha) = (10{,}000,\ 2)$
• $H_1$: $(\theta, \alpha) \ne (10{,}000,\ 2)$
• $\ln LR = 2(-10034.660 + 10035.623) = 1.926$
• Accept $H_0$

Hypothesis Testing Example
• Significance level = 5%; $\chi^2$ critical value = 5.991
• $H_0$: $(\theta, \alpha) = (10{,}000,\ 1.7)$
• $H_1$: $(\theta, \alpha) \ne (10{,}000,\ 1.7)$
• $\ln LR = 2(-10034.660 + 10045.975) = 22.631$
• Reject $H_0$

Confidence Region
• An X% confidence region corresponds to the hypothesis test at the (100 − X)% significance level.
• It is the set of all parameters $(\theta, \alpha)$ that fail to reject the corresponding $H_0$.
• For the 95% confidence region:
– (10,000, 2.0) is in.
– (10,000, 1.7) is out.

[Figure: Confidence region in the (Theta, Alpha) plane; outer ring 95%, inner ring 50%.]

Grouped Data
• Data grouped into four intervals:
– 562 under 5,000
– 181 between 5,000 and 10,000
– 134 between 10,000 and 20,000
– 123 over 20,000
• Same data as before, only less information is given.

[Figure: Confidence region for grouped data; outer ring 95%, inner ring 50%; Theta vs. Alpha.]

[Figure: Confidence region for ungrouped data; outer ring 95%, inner ring 50%; Theta vs. Alpha.]
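The test is straightforward to reproduce. Below is a minimal sketch in Python, assuming the two-parameter Pareto form above; the simulated sample is fresh, so the fitted values and test statistics will differ from the slide's numbers.

```python
# Minimal sketch of the likelihood ratio test on simulated Pareto losses,
# using F(x) = 1 - (theta/(x+theta))^alpha with the slide's sample size,
# true parameters, and null hypotheses.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, alpha_true, theta_true = 1000, 2.0, 10_000.0

# Inverse-CDF simulation: x = theta * (u^(-1/alpha) - 1)
x = theta_true * (rng.uniform(size=n) ** (-1.0 / alpha_true) - 1.0)

def negloglik(params):
    alpha, theta = params
    if alpha <= 0 or theta <= 0:
        return np.inf
    # ln f(x) = ln(alpha) + alpha*ln(theta) - (alpha+1)*ln(x+theta)
    return -(n * np.log(alpha) + n * alpha * np.log(theta)
             - (alpha + 1.0) * np.log(x + theta).sum())

mle = minimize(negloglik, x0=[1.0, 5_000.0], method="Nelder-Mead")

def ln_lr(h0_params):
    # ln LR = 2 [ ln L(p_hat; x) - ln L(p*; x) ], chi-square with k = 2 df
    return 2.0 * (negloglik(h0_params) - mle.fun)

crit = chi2.ppf(0.95, df=2)  # 5.991
for h0 in ([2.0, 10_000.0], [1.7, 10_000.0]):
    stat = ln_lr(h0)
    print(f"H0 alpha={h0[0]}, theta={h0[1]}: lnLR={stat:.3f}, "
          f"{'reject' if stat > crit else 'accept'} at 5%")
```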
Estimation with Model Uncertainty
COTOR Challenge – November 2004
• COTOR published 250 claims
– Distributional form not revealed to participants
• Participants were challenged to estimate the cost of a $5M x $5M layer.
• Estimate a confidence interval for the pure premium.

You want to fit a distribution to 250 claims
• Knee-jerk first reaction: plot a histogram.

[Figure: Histogram of COTOR data; claim amount (×10^6) vs. count.]

This will not do!

Take logs
• And fit some standard distributions.

[Figure: Density of log claim amounts with lognormal, gamma, and Weibull fits.]

Still looks skewed. Take double logs.
• And fit some standard distributions.

[Figure: Density of double-log claim amounts with lognormal, gamma, and Weibull fits.]

Still looks skewed. Take triple logs.
• Still some skewness.
• Lognormal and gamma fits look somewhat better.

[Figure: Density of triple-log claim amounts with lognormal, gamma, and normal fits.]

Candidate #1 – Quadruple lognormal

Distribution:    Lognormal
Log likelihood:  283.496
Domain:          0 < y < Inf
Mean:            0.738351
Variance:        0.006189

Parameter   Estimate    Std. Err.
mu          -0.30898    0.00672
sigma        0.106252   0.004766

Estimated covariance of parameter estimates:
           mu          sigma
mu         4.52E-05    1.31E-19
sigma      1.31E-19    2.27E-05

Candidate #2 – Triple loggamma

Distribution:    Gamma
Log likelihood:  282.621
Domain:          0 < y < Inf
Mean:            0.738355
Variance:        0.00615

Parameter   Estimate    Std. Err.
a           88.6454     7.91382
b           0.008329    0.000746

Estimated covariance of parameter estimates:
           a           b
a          62.6286     -0.00588
b          -0.00588    5.56E-07

Candidate #3 – Triple lognormal

Distribution:    Normal
Log likelihood:  279.461
Domain:          -Inf < y < Inf
Mean:            0.738355
Variance:        0.006285

Parameter   Estimate    Std. Err.
mu          0.738355    0.005014
sigma       0.079279    0.003556

Estimated covariance of parameter estimates:
           mu          sigma
mu         2.51E-05    -1.14E-19
sigma      -1.14E-19   1.26E-05

All three cdf's are within the confidence interval for the quadruple lognormal.

[Figure: Empirical cdf of triple-log claim amounts with lognormal, gamma, and normal fits and lognormal confidence bounds.]

Elements of Solution
• Three candidate models
– Quadruple lognormal
– Triple loggamma
– Triple lognormal
• Parameter uncertainty within each model
• Construct a series of models consisting of
– One of the three models, and
– Parameters within a broad confidence interval for each model
– 7803 possible models

Steps in Solution
• Calculate the likelihood (given the data) for each model.
• Use Bayes' Theorem to calculate the posterior probability of each model.
– Each model has equal prior probability.
$$\Pr\{\text{model} \mid \text{data}\} \propto \Pr\{\text{data} \mid \text{model}\} \times \Pr\{\text{model}\}$$

Steps in Solution
• Calculate the layer pure premium for the $5M x $5M layer for each model.
• The expected pure premium is the posterior-probability-weighted average of the model layer pure premiums.
• The second moment of the pure premium is the posterior-probability-weighted average of the squared model layer pure premiums.

CDF of Layer Pure Premium
The probability that the layer pure premium is ≤ x equals the sum of the posterior probabilities of the models whose layer pure premium is ≤ x.

Numerical Results

Mean                  6,430
Standard Deviation    3,370
Median                5,780
Range: Low at 2.5%    1,760
       High at 97.5%  14,710

[Figure: Histogram of the predictive distribution of the layer pure premium; density vs. low end of amount ($000).]
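A minimal sketch of the posterior-weighting step, in Python. The arrays `loglik` (log-likelihood of the 250 claims under each candidate model) and `layer_pp` (each model's $5M x $5M layer pure premium) are hypothetical placeholders for the 7803-model grid; equal prior probabilities are assumed, as above.

```python
# Minimal sketch: posterior model weights and the predictive distribution
# of the layer pure premium. `loglik` and `layer_pp` are hypothetical
# arrays with one entry per candidate model.
import numpy as np

def posterior_weights(loglik):
    # Equal priors cancel out of Bayes' theorem; subtract the max
    # log-likelihood before exponentiating for numerical stability.
    w = np.exp(loglik - np.max(loglik))
    return w / w.sum()

def predictive_summary(loglik, layer_pp):
    post = posterior_weights(loglik)
    mean = np.sum(post * layer_pp)          # posterior-weighted mean
    second = np.sum(post * layer_pp ** 2)   # posterior-weighted 2nd moment
    sd = np.sqrt(second - mean ** 2)
    # CDF: Pr{pure premium <= x} = sum of posteriors of models with PP <= x
    order = np.argsort(layer_pp)
    return mean, sd, layer_pp[order], np.cumsum(post[order])
```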
Example with Insurance Data
• Continue with Bayesian estimation
• Liability insurance claim severity data
• Prior distributions derived from models based on individual insurer data
• Prior models reflect the maturity of the claim data used in the estimation

Initial Insurer Models
• Selected 20 insurers
– Claim counts in the thousands
• Fit a mixed exponential distribution to the data of each insurer
• Initial fits had volatile tails
• Truncation issues
– Do small claims predict the likelihood of large claims?

[Figure: Initial insurer models; limited average severity vs. loss amount x (1,000 to 10,000,000).]

[Figure: Low truncation point; 500 x 500 layer average severity vs. probability that a loss is over 5,000.]

[Figure: High truncation point; 500 x 500 layer average severity vs. probability that a loss is over 100,000.]

Selections Made
• Truncation point = $100,000
• A family of cdf's that has "correct" behavior
– Admittedly, the definition of "correct" is debatable, but
– The choices are transparent!

[Figure: Selected insurer models; limited average severity vs. loss amount x (100,000 to 10,000,000).]

[Figure: Selected insurer models; 500 x 500 layer average severity vs. probability that a loss is over 100,000.]

Each model consists of:
1. The claim severity distribution for all claims settled within 1 year
2. The claim severity distribution for all claims settled within 2 years
3. The claim severity distribution for all claims settled within 3 years
4. The ultimate claim severity distribution for all claims
5. The ultimate limited average severity curve

Three Sample Insurers – Small, Medium and Large
• Each has three years of data
• Calculate likelihood functions
– Most recent year with #1 on the prior slide
– 2nd most recent year with #2 on the prior slide
– 3rd most recent year with #3 on the prior slide
• Use Bayes' theorem to calculate the posterior probability of each model

Formulas for Posterior Probabilities
Cell probabilities for model m:
$$P_{i,AY,m} = \frac{F_{AY,m}(x_{i+1}) - F_{AY,m}(x_i)}{1 - F_{AY,m}(x_1)}$$
With $n_{i,AY}$ the number of claims in interval i for year group AY, the likelihood of model m is
$$l(m) = \prod_{AY=1}^{3} \prod_{i=1}^{9} P_{i,AY,m}^{\,n_{i,AY}}$$

Using Bayes' Theorem
$$\text{Posterior}(m) \propto l(m) \times \text{Prior}(m)$$

Exhibit 1 – Small Insurer Results
Taken from paper.

Claim counts by settlement lag and interval lower bound:

Interval       Claim Count
Lower Bound    Lag 1   Lags 1-2   Lags 1-3
100,000        15      40         76
200,000        2       10         26
300,000        1       1          11
400,000        2       0          3
500,000        0       2          8
750,000        0       0          0
1,000,000      0       2          0
1,500,000      0       0          0
2,000,000      0       0          0

Posterior probabilities and layer pure premiums by model:

                         Layer Pure Premium
Model #   Posterior      $500K x    $1M x
          Probability    $500K      $1M
1         0.016406       763        541
2         0.041658       911        645
3         0.089063       1,153      682
4         0.130281       1,224      796
5         0.157593       1,281      912
6         0.110614       1,390      978
7         0.075702       1,494      1,040
8         0.053226       1,587      1,095
9         0.080525       1,849      1,328
10        0.104056       2,069      1,523
11        0.129925       2,417      1,828
12        0.010896       2,598      1,916
13        0.000007       2,788      1,922
14        0.000009       3,004      2,124
15        0.000011       3,202      2,309
16        0.000013       3,382      2,477
17        0.000014       3,543      2,628
18        0              4,058      3,211
19        0              4,663      3,784
20        0              5,354      4,440

Posterior Mean           1,572      1,113
Posterior Std. Dev.      463        385

Formulas for Ultimate Layer Pure Premium
• Use #5 on the model slide above (the ultimate limited average severity curve) to calculate the ultimate layer pure premium.
$$\text{Posterior Mean} = \sum_{m=1}^{20} \text{Layer Pure Premium}(m) \times \text{Posterior}(m)$$
$$\text{Posterior Std. Dev.} = \sqrt{\sum_{m=1}^{20} \text{Layer Pure Premium}(m)^2 \times \text{Posterior}(m) - \text{Posterior Mean}^2}$$
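A minimal sketch of the likelihood calculation for one model, in Python. The list `cdfs` (the model's lag-dependent distributions $F_{AY,m}$), the nine interval lower bounds, and the count matrix are hypothetical stand-ins for the quantities in Exhibit 1.

```python
# Minimal sketch of the likelihood for one model m on grouped data by
# settlement lag. `cdfs` is a hypothetical list of three callables (the
# model's F_{AY,m} for lag groups 1, 1-2, and 1-3); `bounds` holds the
# nine interval lower bounds from Exhibit 1; `counts[i, ay]` is n_{i,AY}.
import numpy as np

def log_likelihood(cdfs, bounds, counts):
    ll = 0.0
    for ay, cdf in enumerate(cdfs):
        denom = 1.0 - cdf(bounds[0])  # data truncated below bounds[0]
        for i in range(len(bounds)):
            # Top cell is open-ended, so its upper cdf value is 1.
            upper = cdf(bounds[i + 1]) if i + 1 < len(bounds) else 1.0
            p = (upper - cdf(bounds[i])) / denom  # cell probability P_{i,AY,m}
            if counts[i, ay] > 0:
                ll += counts[i, ay] * np.log(p)
    return ll

# Posterior(m) then follows from Bayes' theorem exactly as in the
# COTOR sketch above: normalize prior(m) * exp(log_likelihood(m)).
```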
Results

                      Small Insurer        Medium Insurer       Large Insurer
Layer Pure Premium    $500K x    $1M x    $500K x    $1M x    $500K x    $1M x
                      $500K      $1M      $500K      $1M      $500K      $1M
Posterior Mean        1,572      1,113    1,344      909      1,360      966
Posterior Std. Dev.   463        385      278        245      234        188

• All insurers were simulated from the same population.
• The posterior standard deviation decreases with insurer size.

Possible Extensions
• Obtain a model for individual insurers
• Obtain data for the insurer of interest
• Calculate the likelihood, Pr{data|model}, for each insurer's model
• Use Bayes' Theorem to calculate the posterior probability of each model
• Calculate the statistic of choice using the models and posterior probabilities
– e.g. loss reserves
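For the last bullet, a minimal sketch of one such statistic: the per-claim layer pure premium computed from a model's ultimate limited average severity curve (#5 on the model slide). The function `las`, standing in for E[min(X, limit)], is a hypothetical placeholder.

```python
# Minimal sketch of turning an ultimate limited average severity curve
# into a per-claim layer pure premium, as in the $500K x $500K and
# $1M x $1M columns above. `las(limit)` is a hypothetical callable
# returning E[min(X, limit)] for the model in question.
def layer_severity(las, attachment, width):
    # Expected loss in the layer = E[min(X, att + width)] - E[min(X, att)]
    return las(attachment + width) - las(attachment)

# e.g. the $500K x $500K layer: layer_severity(las, 500_000, 500_000)
```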