On Predictive Modeling for Claim Severity
Paper in Spring 2005 CAS Forum
Glenn Meyers
ISO Innovative Analytics
Predictive Modeling Seminar
September 19, 2005
Problems with
Experience Rating for
Excess of Loss Reinsurance
• Use submission claim severity data
– Relevant, but
– Not credible
– Not developed
• Use industry distributions
– Credible, but
– Not relevant (???)
General Problems with
Fitting Claim Severity Distributions
• Parameter uncertainty
– Fitted parameters of chosen model are
estimates subject to sampling error.
• Model uncertainty
– We might choose the wrong model. There is
no particular reason that the models we
choose are appropriate.
• Loss development
– Complete claim settlement data is not always
available.
Outline of Talk
• Quantifying Parameter Uncertainty
– Likelihood ratio test
• Incorporating Model Uncertainty
– Use Bayesian estimation with likelihood
functions
– Uncertainty in excess layer loss estimates
• Bayesian estimation with prior models
based on data reported to a statistical
agent
– Reflects insurer heterogeneity
– Develops losses to ultimate
The Likelihood Ratio Test
Let $p = (p_1, \ldots, p_k)$ be a parameter vector for your chosen loss model.
Let $x = (x_1, \ldots, x_n)$ be a set of observed losses.
Let $\hat{p}$ be the maximum likelihood estimate of $p$ given $x$.
The Likelihood Ratio Test
Test $H_0: p = p^*$ against $H_1: p \ne p^*$.
Theorem 2.10 in Klugman, Panjer & Willmot: if $H_0$ is true, then
$$\ln LR = 2\left[\ln L(\hat{p}; x) - \ln L(p^*; x)\right]$$
has a $\chi^2$ distribution with $k$ degrees of freedom.
Use the $\chi^2$ distribution to find critical values.
An Example – The Pareto Distribution
$$F(x) = 1 - \left(\frac{\theta}{x + \theta}\right)^{\alpha}$$
• Simulate a random sample of size 1000 with $\alpha = 2.000$, $\theta = 10{,}000$.
• Maximum log-likelihood = −10034.660, with $\hat{\theta} = 8723.04$ and $\hat{\alpha} = 1.80792$.
Hypothesis Testing Example
• Significance level = 5%; $\chi^2$ critical value = 5.991
• $H_0$: $(\theta, \alpha) = (10000, 2)$
• $H_1$: $(\theta, \alpha) \ne (10000, 2)$
• $\ln LR = 2(-10034.660 + 10035.623) = 1.926$
• 1.926 < 5.991, so accept $H_0$
Hypothesis Testing Example
• Significance level = 5%; $\chi^2$ critical value = 5.991
• $H_0$: $(\theta, \alpha) = (10000, 1.7)$
• $H_1$: $(\theta, \alpha) \ne (10000, 1.7)$
• $\ln LR = 2(-10034.660 + 10045.975) = 22.631$
• 22.631 > 5.991, so reject $H_0$
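A minimal sketch of this simulation and test in Python, assuming NumPy/SciPy (scipy.stats.lomax is exactly this Pareto parameterization); the seed is arbitrary, so exact fitted values will differ from the slide's:

```python
# Simulate a Pareto sample, fit by maximum likelihood, run the LR test.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(seed=1)          # arbitrary seed
alpha_true, theta_true = 2.0, 10_000.0
x = stats.lomax.rvs(c=alpha_true, scale=theta_true, size=1000, random_state=rng)

def neg_loglik(params, data):
    """Negative Pareto log-likelihood; params = (alpha, theta)."""
    alpha, theta = params
    if alpha <= 0 or theta <= 0:
        return np.inf
    return -np.sum(stats.lomax.logpdf(data, c=alpha, scale=theta))

# MLE (the slide reports alpha-hat = 1.80792, theta-hat = 8723.04 for its sample)
mle = optimize.minimize(neg_loglik, x0=[1.0, 5_000.0], args=(x,), method="Nelder-Mead")

# Likelihood ratio test of H0: (theta, alpha) = (10000, 2)
ln_lr = 2 * (-neg_loglik(mle.x, x) + neg_loglik([2.0, 10_000.0], x))
critical = stats.chi2.ppf(0.95, df=2)        # 5.991 for 2 free parameters
print(f"lnLR = {ln_lr:.3f}, critical value = {critical:.3f}")
print("reject H0" if ln_lr > critical else "accept H0")
```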
Confidence Region
• An X% confidence region corresponds to the hypothesis test at significance level 1 − X.
• It is the set of all parameters $(\theta, \alpha)$ for which the corresponding $H_0$ is not rejected.
• For the 95% confidence region:
– (10000, 2.0) is in.
– (10000, 1.7) is out.
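A minimal sketch of tracing that region numerically: evaluate lnLR on a $(\theta, \alpha)$ grid and keep the points below the $\chi^2$ cutoff. The grid bounds are illustrative, roughly matching the plotted axes:

```python
# Map the joint confidence region for (theta, alpha) on a grid.
import numpy as np
from scipy import stats

def pareto_loglik(alpha, theta, data):
    return np.sum(stats.lomax.logpdf(data, c=alpha, scale=theta))

rng = np.random.default_rng(seed=1)
x = stats.lomax.rvs(c=2.0, scale=10_000.0, size=1000, random_state=rng)

# grid bounds chosen to mimic the plotted axes (theta must stay positive)
thetas = np.linspace(1_000, 15_000, 60)
alphas = np.linspace(0.5, 2.5, 60)
ll = np.array([[pareto_loglik(a, t, x) for t in thetas] for a in alphas])
ln_lr = 2 * (ll.max() - ll)                  # grid maximum stands in for the MLE
inside95 = ln_lr <= stats.chi2.ppf(0.95, df=2)   # outer ring
inside50 = ln_lr <= stats.chi2.ppf(0.50, df=2)   # inner ring
print(f"{inside95.sum()} of {inside95.size} grid points in the 95% region")
```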
Confidence Region
[Figure: joint confidence region for $(\theta, \alpha)$. Outer ring 95%, inner ring 50%. Axes: Theta (0 to 15,000) vs. Alpha (0.0 to 2.5).]
Grouped Data
• Data grouped into four intervals
– 562 under 5000
– 181 between 5000 and 10000
– 134 between 10000 and 20000
– 123 over 20000
• Same data as before, only less information
is given.
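A minimal sketch of the corresponding grouped-data likelihood: each interval contributes its claim count times the log of the Pareto probability of that interval (boundaries and counts as above):

```python
# Grouped-data MLE for the Pareto using interval counts only.
import numpy as np
from scipy import optimize, stats

bounds = np.array([0.0, 5_000.0, 10_000.0, 20_000.0, np.inf])
counts = np.array([562, 181, 134, 123])      # the grouped sample above

def neg_loglik_grouped(params):
    alpha, theta = params
    if alpha <= 0 or theta <= 0:
        return np.inf
    cdf = stats.lomax.cdf(bounds, c=alpha, scale=theta)
    cell_probs = np.diff(cdf)                # P(lower < X <= upper) per interval
    return -np.sum(counts * np.log(cell_probs))

mle = optimize.minimize(neg_loglik_grouped, x0=[1.0, 5_000.0], method="Nelder-Mead")
print(mle.x)                                 # grouped-data MLE of (alpha, theta)
```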
Confidence Region for Grouped Data
[Figure: joint confidence region for $(\theta, \alpha)$ from the grouped data. Outer ring 95%, inner ring 50%. Axes: Theta (0 to 15,000) vs. Alpha (0.0 to 2.5).]
Confidence Region for Ungrouped Data
[Figure: joint confidence region for $(\theta, \alpha)$ from the ungrouped data. Outer ring 95%, inner ring 50%. Axes: Theta (0 to 15,000) vs. Alpha (0.0 to 2.5).]
Estimation with Model Uncertainty
COTOR Challenge – November 2004
• COTOR published 250 claims
– Distributional form not revealed to participants
• Participants were challenged to estimate
the cost of a $5M x $5M layer.
• Estimate a confidence interval for the pure premium
You want to fit a
distribution to 250 Claims
• Knee-jerk first reaction: plot a histogram.
[Figure: Histogram of COTOR data. Axes: Claim Amount (0 to 7 × 10^6) vs. Count (0 to 250).]
This will not do! Take logs
• And fit some standard distributions.
[Figure: density of the log COTOR data ("lcotor") with fitted lognormal, gamma, and Weibull curves. Axes: Log of Claim Amounts (6 to 16) vs. Density (0 to 0.35).]
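A minimal sketch of that fitting step. The file name cotor_claims.txt is hypothetical; it stands for the 250 published claims:

```python
# Fit standard distributions to the log claim amounts by maximum likelihood.
import numpy as np
from scipy import stats

claims = np.loadtxt("cotor_claims.txt")      # hypothetical file of the 250 claims
log_claims = np.log(claims)

# floc=0 pins the location parameter at zero
lognorm_fit = stats.lognorm.fit(log_claims, floc=0)
gamma_fit = stats.gamma.fit(log_claims, floc=0)
weibull_fit = stats.weibull_min.fit(log_claims, floc=0)

for name, dist, params in [("lognormal", stats.lognorm, lognorm_fit),
                           ("gamma", stats.gamma, gamma_fit),
                           ("Weibull", stats.weibull_min, weibull_fit)]:
    loglik = np.sum(dist.logpdf(log_claims, *params))
    print(f"{name}: params = {params}, log-likelihood = {loglik:.3f}")
```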
Still looks skewed. Take double logs.
• And fit some standard distributions.
[Figure: density of the double-log COTOR data ("llcotor") with fitted lognormal, gamma, and Weibull curves. Axes: log log of Claim Amounts (1.8 to 2.8) vs. Density (0 to 2.5).]
Still looks skewed. Take triple logs.
• Still some skewness.
• Lognormal and gamma fits look somewhat better.
[Figure: density of the triple-log COTOR data ("lllcotor") with fitted lognormal, gamma, and normal curves. Axes: Triple log of Claim Amounts (0.55 to 1.0) vs. Density (0 to 5).]
Candidate #1
Quadruple lognormal

Distribution:    Lognormal
Log likelihood:  283.496
Domain:          0 < y < Inf
Mean:            0.738351
Variance:        0.006189

Parameter   Estimate    Std. Err.
mu          -0.30898    0.00672
sigma        0.106252   0.004766

Estimated covariance of parameter estimates:
           mu         sigma
mu         4.52E-05   1.31E-19
sigma      1.31E-19   2.27E-05
Candidate #2
Triple loggamma

Distribution:    Gamma
Log likelihood:  282.621
Domain:          0 < y < Inf
Mean:            0.738355
Variance:        0.00615

Parameter   Estimate    Std. Err.
a           88.6454     7.91382
b           0.008329    0.000746

Estimated covariance of parameter estimates:
           a          b
a          62.6286    -0.00588
b          -0.00588   5.56E-07
Candidate #3
Triple lognormal

Distribution:    Normal
Log likelihood:  279.461
Domain:          -Inf < y < Inf
Mean:            0.738355
Variance:        0.006285

Parameter   Estimate    Std. Err.
mu          0.738355    0.005014
sigma       0.079279    0.003556

Estimated covariance of parameter estimates:
           mu          sigma
mu         2.51E-05    -1.14E-19
sigma      -1.14E-19   1.26E-05
All three CDFs are within the confidence interval for the quadruple lognormal.
[Figure: empirical CDF of the triple-log data ("lllcotor") with fitted lognormal (plus confidence bounds), gamma, and normal CDFs. Axes: Triple log of Claim Amounts (0.55 to 1.0) vs. Cumulative probability (0 to 1).]
Elements of Solution
• Three candidate models
– Quadruple lognormal
– Triple loggamma
– Triple lognormal
• Parameter uncertainty within each model
• Construct a series of models, each consisting of
– One of the three candidate forms
– A parameter set from a broad confidence interval for that form
– 7803 possible models in all
Steps in Solution
• Calculate likelihood (given the data) for each
model.
• Use Bayes’ Theorem to calculate posterior
probability for each model
– Each model has equal prior probability.
Posterior model|data   Likelihood  data|model   Prior model
Steps in Solution
• Calculate the layer pure premium for the $5M x $5M layer under each model.
• The expected pure premium is the posterior-probability weighted average of the model layer pure premiums.
• The second moment of the pure premium is the posterior-probability weighted average of the squared model layer pure premiums.
(Both steps are sketched after the next slide.)
CDF of Layer Pure Premium
The probability that the layer pure premium is ≤ x equals the sum of the posterior probabilities of the models whose layer pure premium is ≤ x.
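A minimal sketch of these computations, continuing the grid from the previous sketch; treating the layer pure premium as a per-claim expectation is a simplifying assumption here, not the paper's exact definition:

```python
# Layer pure premium per model, posterior-weighted moments, predictive CDF.
import numpy as np
from scipy import integrate, stats

def layer_pure_premium(mu, sigma, attach=5e6, width=5e6):
    """Expected loss in the (attach, attach + width) layer for one claim X
    with log(log(log(X))) ~ Normal(mu, sigma) (the triple lognormal form)."""
    def integrand(y):
        x = np.exp(np.exp(np.exp(y)))                  # back-transform to claim size
        layer_loss = min(max(x - attach, 0.0), width)  # loss capped to the layer
        return layer_loss * stats.norm.pdf(y, mu, sigma)
    value, _ = integrate.quad(integrand, mu - 8 * sigma, mu + 8 * sigma)
    return value

# With `models` and `posterior` from the previous sketch:
#   lpp = np.array([layer_pure_premium(mu, s) for mu, s in models])
#   mean = np.sum(lpp * posterior)                 # expected pure premium
#   second = np.sum(lpp**2 * posterior)            # second moment
#   sd = np.sqrt(second - mean**2)
#   cdf = lambda x: posterior[lpp <= x].sum()      # Pr{layer pure premium <= x}
```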
Numerical Results

Mean                 6,430
Standard Deviation   3,370
Median               5,780
Range:
  Low at 2.5%        1,760
  High at 97.5%     14,710
Histogram of Predictive Pure Premium
[Figure: Predictive Distribution of the Layer Pure Premium. Axes: Low End of Amount (000) (0 to 25) vs. Density (0.00 to 0.16).]
Example with Insurance Data
• Continue with Bayesian Estimation
• Liability insurance claim severity data
• Prior distributions derived from models
based on individual insurer data
• Prior models reflect the maturity of claim
data used in the estimation
Initial Insurer Models
• Selected 20 insurers
– Claim count in the thousands
• Fit a mixed exponential distribution to each insurer's data (sketched after this list)
• Initial fits had volatile tails
• Truncation issues
– Do small claims predict likelihood of large
claims?
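A minimal sketch of a mixed exponential severity model and its limited average severity, the quantity plotted on the next slide. The weights and means below are made up for illustration, not fitted values from the paper:

```python
# Mixed exponential severity: survival function and limited average severity.
import numpy as np

weights = np.array([0.50, 0.30, 0.15, 0.05])                  # hypothetical w_i
means = np.array([1_000.0, 10_000.0, 50_000.0, 250_000.0])    # hypothetical theta_i

def survival(x):
    """S(x) = sum_i w_i * exp(-x / theta_i)."""
    return np.sum(weights * np.exp(-x / means))

def limited_average_severity(x):
    """E[min(X, x)] = sum_i w_i * theta_i * (1 - exp(-x / theta_i))."""
    return np.sum(weights * means * (1.0 - np.exp(-x / means)))

for x in (1_000, 10_000, 100_000, 1_000_000):
    print(f"LAS({x:>9,}) = {limited_average_severity(x):>9,.0f}, "
          f"S({x:,}) = {survival(x):.5f}")
```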
Initial Insurer Models
[Figure: Limited Average Severity (0 to 45,000) vs. Loss Amount x (1,000 to 10,000,000, log scale), one curve per insurer.]
Low Truncation Point
[Figure: 500 x 500 Layer Average Severity (0 to 5,000) vs. Probability That Loss is Over 5,000 (0.00 to 0.40).]
High Truncation Point
[Figure: 500 x 500 Layer Average Severity (0 to 5,000) vs. Probability That Loss is Over 100,000 (0.00 to 0.07).]
Selections Made
• Truncation point = $100,000
• Family of CDFs that has "correct" behavior
– Admittedly the definition of “correct” is
debatable, but
– The choices are transparent!
Selected Insurer Models
[Figure: Limited Average Severity (0 to 45,000) vs. Loss Amount x (100,000 to 10,000,000, log scale).]
Selected Insurer Models
[Figure: 500 x 500 Layer Average Severity (0 to 6,000) vs. Probability That Loss is Over 100,000 (0.00 to 0.05).]
Each model consists of
1. The claim severity distribution for all claims
settled within 1 year
2. The claim severity distribution for all claims
settled within 2 years
3. The claim severity distribution for all claims
settled within 3 years
4. The ultimate claim severity distribution for all
claims
5. The ultimate limited average severity curve
Three Sample Insurers
Small, Medium and Large
• Each has three years of data
• Calculate likelihood functions
– Most recent year with #1 on prior slide
– 2nd most recent year with #2 on prior slide
– 3rd most recent year with #3 on prior slide
• Use Bayes' Theorem to calculate the posterior probability of each model
Formulas for Posterior Probabilities

Model (m) cell probabilities:
$$P_{i,AY,m} = \frac{F_{AY,m}(x_{i+1}) - F_{AY,m}(x_i)}{1 - F_{AY,m}(x_1)}$$

Likelihood, with claim counts $n_{i,AY}$ over 9 loss intervals and 3 accident years:
$$l(m) = \prod_{i=1}^{9} \prod_{AY=1}^{3} \left(P_{i,AY,m}\right)^{n_{i,AY}}$$

Using Bayes' Theorem:
$$\text{Posterior}(m) \propto l(m) \times \text{Prior}(m)$$
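A minimal sketch of these formulas; `severity_cdfs` and `claim_counts` are hypothetical stand-ins (one severity CDF per settlement lag for model m, and the observed interval counts for each lag):

```python
# Cell probabilities and log-likelihood for one prior model m.
import numpy as np

# interval lower bounds from Exhibit 1; losses are truncated below 100,000
bounds = np.array([100_000, 200_000, 300_000, 400_000, 500_000,
                   750_000, 1_000_000, 1_500_000, 2_000_000, np.inf])

def cell_probs(F):
    """P_i = [F(x_{i+1}) - F(x_i)] / [1 - F(x_1)] for each of the 9 intervals."""
    cdf = np.array([F(x) if np.isfinite(x) else 1.0 for x in bounds])
    return np.diff(cdf) / (1.0 - cdf[0])

def log_likelihood(severity_cdfs, claim_counts):
    """log l(m): sum over lags AY and intervals i of n_{i,AY} * log P_{i,AY,m}."""
    total = 0.0
    for lag, F in severity_cdfs.items():
        P = cell_probs(F)
        total += np.sum(np.asarray(claim_counts[lag]) * np.log(P))
    return total

# Posterior(m) ∝ exp(log l(m)) × Prior(m), normalized across the 20 models.
```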
Exhibit 1 – Small Insurer
Results taken from the paper.

Claim counts by settlement lag and loss interval:

Interval
Lower Bound   Lag 1   Lag 1-2   Lag 1-3
100,000          15        40        76
200,000           2        10        26
300,000           1         1        11
400,000           2         0         3
500,000           0         2         8
750,000           0         0         0
1,000,000         0         2         0
1,500,000         0         0         0
2,000,000         0         0         0
Layer pure premium by prior model:

Prior     Posterior     $500K x   $1M x
Model #   Probability   $500K     $1M
1         0.016406        763       541
2         0.041658        911       645
3         0.089063      1,153       682
4         0.130281      1,224       796
5         0.157593      1,281       912
6         0.110614      1,390       978
7         0.075702      1,494     1,040
8         0.053226      1,587     1,095
9         0.080525      1,849     1,328
10        0.104056      2,069     1,523
11        0.129925      2,417     1,828
12        0.010896      2,598     1,916
13        0.000007      2,788     1,922
14        0.000009      3,004     2,124
15        0.000011      3,202     2,309
16        0.000013      3,382     2,477
17        0.000014      3,543     2,628
18        0             4,058     3,211
19        0             4,663     3,784
20        0             5,354     4,440

Posterior Mean          1,572     1,113
Posterior Std. Dev.       463       385
Formulas for Ultimate Layer Pure Premium
• Use #5 on the "Each model consists of" slide (the ultimate limited average severity curve) to calculate the ultimate layer pure premium.
$$\text{Posterior Mean} = \sum_{m=1}^{20} \text{Layer Pure Premium}(m) \times \text{Posterior}(m)$$
$$\text{Posterior Std. Dev.} = \sqrt{\sum_{m=1}^{20} \text{Layer Pure Premium}(m)^2 \times \text{Posterior}(m) - \left(\text{Posterior Mean}\right)^2}$$
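As a check, a minimal sketch that reproduces the Exhibit 1 posterior mean and standard deviation for the $500K x $500K layer from the tabulated values:

```python
# Posterior mean and std. dev. of the layer pure premium (Exhibit 1 values).
import numpy as np

posterior = np.array([0.016406, 0.041658, 0.089063, 0.130281, 0.157593,
                      0.110614, 0.075702, 0.053226, 0.080525, 0.104056,
                      0.129925, 0.010896, 0.000007, 0.000009, 0.000011,
                      0.000013, 0.000014, 0.0, 0.0, 0.0])
layer_pp = np.array([763, 911, 1_153, 1_224, 1_281, 1_390, 1_494, 1_587,
                     1_849, 2_069, 2_417, 2_598, 2_788, 3_004, 3_202,
                     3_382, 3_543, 4_058, 4_663, 5_354], dtype=float)

mean = np.sum(layer_pp * posterior)
std = np.sqrt(np.sum(layer_pp**2 * posterior) - mean**2)
print(f"posterior mean = {mean:,.0f}")   # ≈ 1,572, matching the exhibit
print(f"posterior std  = {std:,.0f}")    # ≈   463, matching the exhibit
```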
Results

                      Small Insurer        Medium Insurer       Large Insurer
                      Layer Pure Premium   Layer Pure Premium   Layer Pure Premium
                      $500K x   $1M x      $500K x   $1M x      $500K x   $1M x
                      $500K     $1M        $500K     $1M        $500K     $1M
Posterior Mean        1,572     1,113      1,344       909      1,360       966
Posterior Std. Dev.     463       385        278       245        234       188

• All insurers were simulated from the same population.
• The posterior standard deviation decreases with insurer size.
Possible Extensions
• Obtain models for individual insurers
• Obtain data for the insurer of interest
• Calculate the likelihood, Pr{data|model}, for each insurer's model
• Use Bayes' Theorem to calculate the posterior probability of each model
• Calculate the statistic of choice using the models and posterior probabilities
– e.g., loss reserves