36th ANNUAL GIRO CONVENTION
Theoretical and Practical Aspects of
Parameter and Model Uncertainty
Edinburgh October 2009
Dimitris Papachristou
AON BENFIELD
Topics
- Sources of parameter uncertainty
- Diversifiable and non-diversifiable parameter risk
- Effect of model/parameter uncertainty on a single risk and on a portfolio of risks
- Methods of estimating the parameter uncertainty: analytical, Monte Carlo, asymptotic approximations
- Pitfalls and commonly used methods
- Model uncertainty and common actuarial distributions
- Applications: frequency of aviation losses, risk transfer assessment
Sources of Uncertainty of Some Risk Quantity
- Process Risk: due to the stochastic nature of the risk, given a statistical model and its parameters
- Parameter Uncertainty: uncertainty in the values of the parameters of the statistical model
- Model Uncertainty: uncertainty in the choice of the statistical model
Types of Parameter and Model Uncertainty
The nature of parameter and model risk is similar:
- statistical uncertainty due to the limited amount of data
- uncertainty because of heterogeneity in the portfolio
  - for example, a portfolio of motor policyholders where there is a mix of drivers of different skills
- uncertainty because of change in the parameters over time
- parameter uncertainty because of errors in the data or uncertainties in our estimates of the losses
  - although it is an important source of uncertainty, it is not examined in this paper
Diversifiable and Non-Diversifiable Parameter Risk

Three Experiments - Nature of Parameter Uncertainty
- 1st Experiment: Fair coin with known p = 0.5
- 2nd Experiment: Several coins, each has a different p, with p ~ U(0,1)
- 3rd Experiment: One coin, p ~ U(0,1); the same coin is used for several throws
Three Experiments - Nature of Parameter Uncertainty
- 1st Experiment: Fair coin with p = 0.5. Insurance analogy:
  - we have sufficient information to assess the parameters of the risk with reasonable accuracy,
  - or simply a decision has been made to ignore parameter risk
- 2nd Experiment: Several coins, each set of trials has a different p, with p ~ U(0,1). Insurance analogy:
  - a heterogeneous portfolio of policies,
  - e.g. a portfolio of motor policies where the drivers have varying driving skills and for each of the drivers the risk parameter is different
- 3rd Experiment: One coin, p ~ U(0,1). Insurance analogy:
  - limited information to assess the parameters of a single type of risk
Three Experiments - Nature of Parameter Uncertainty
- 1st Experiment: Fair coin with p = 0.5

  $$P[N = k] = \binom{n}{k} p^k (1-p)^{n-k}$$

- 2nd Experiment: Several coins, each set of trials has a different p, with p ~ U(0,1)

  $$P[N = k] = \int_0^1 \binom{n}{k} p^k (1-p)^{n-k}\, dp$$

- 3rd Experiment: One coin, p ~ U(0,1), several throws
  - In the third experiment I would prefer to look at the probability of heads as a random variable,

  $$P[N = k] = \binom{n}{k} p^k (1-p)^{n-k},$$

  which is a function of the random variable p, with p ~ U(0,1). (A small simulation sketch follows below.)
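A minimal simulation sketch of the three experiments (my own illustration, not from the slides), assuming n = 10 throws. For the second experiment the integral gives P[N = k] = 1/(n+1) for every k, which the code checks numerically; in the third experiment a quantity such as P[N = 5] is itself a random variable.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
n, n_sims = 10, 100_000

# 1st experiment: fixed, known p = 0.5
k1 = rng.binomial(n, 0.5, n_sims)

# 2nd experiment: a different p ~ U(0,1) for every set of n throws
p2 = rng.uniform(0, 1, n_sims)
k2 = rng.binomial(n, p2)                       # beta-binomial mixture: P[N=k] = 1/(n+1)

# 3rd experiment: one coin, p ~ U(0,1); the quantity of interest,
# e.g. P[N = 5], is a random variable (a function of p)
p3 = rng.uniform(0, 1, n_sims)
prob_k5 = binom.pmf(5, n, p3)

print(np.bincount(k2, minlength=n + 1) / n_sims)     # each entry ~ 1/11 = 0.0909
print(prob_k5.mean(), np.percentile(prob_k5, [5, 95]))
```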
Parameter uncertainty due to limited amount of data
- The process risk can be diversified over time or over a portfolio of similar and independent policies,
  - although the degree of diversification may be limited by the time horizon of the risk taker or the number of policies in the portfolio
- The parameter and model risk for a single type of policy cannot always be diversified over time. The model and its parameters are fixed but unknown.
  - The belief about the value of the parameters could take the form of a prior distribution or, combined with some available data, the form of a posterior distribution.
  - In practice we do not have a purely homogeneous portfolio (single type) of policies.
  - As time passes, more data are collected and the parameter estimates and parameter distributions are updated.
Parameter uncertainty due to limited amount of data
- Summary descriptions of the risk, such as the mean, make more sense when the risk can be diversified.
- The mean of the process risk distribution shows the expected loss over time or over a very large portfolio of identical and independent policies.
- The mean of the prior or the posterior distribution for the risk parameters can always be calculated, but it may not necessarily have an intuitive practical interpretation.
- When insurance practitioners, especially practitioners not mathematically trained, refer to the "1 in 10 years event", it is doubtful whether they refer to both the space of the possible loss outcomes and the space of the possible parameter and model values.
Parameter uncertainty due to limited amount of data
- The discussion touches on the old debate between "frequentists" and "Bayesians"
- I accept both philosophical interpretations of probability and their applications, but in risk applications I distinguish between probability as an opinion and probability as the frequency of an experiment.
Comparison: Mixture Distribution - Fixed Known Parameter - Fixed but Unknown Parameter
- First Case: Fixed and Known Parameter
- Let us assume that X is a random variable following an Exponential($\lambda$) distribution, where the parameter $\lambda$ is fixed and known:

  $$F(x; \lambda) = 1 - e^{-\lambda x}, \qquad \lambda, x > 0$$

- A p-percentile is given by

  $$h(p; \lambda) = \frac{-\ln(1-p)}{\lambda}$$
Comparison: Mixture Distribution - Fixed Known Parameter - Fixed but Unknown Parameter
- Second Case: Heterogeneous Portfolio - Mixture Distribution
- Now let us assume that $X \mid \lambda \sim$ Exponential($\lambda$) and $\lambda$ is a random variable which follows a Gamma($a, b$) distribution. The unconditional distribution function of X is

  $$F(x; a, b) = \int_0^x \int_0^\infty \lambda e^{-\lambda y}\, \frac{b^a \lambda^{a-1} e^{-b\lambda}}{\Gamma(a)}\, d\lambda\, dy = 1 - \left(\frac{b}{b+x}\right)^{a}, \qquad x, a, b > 0$$

- A p-percentile is given by

  $$h(p; a, b) = \frac{b}{(1-p)^{1/a}} - b$$
Comparison: Mixture Distribution - Fixed Known Parameter - Fixed but Unknown Parameter
- Third Case: Single Policy with Unknown Parameter
- Here the percentile

  $$h(p; \lambda) = \frac{-\ln(1-p)}{\lambda}$$

  is a random variable, because $\lambda$ is a random variable which follows a Gamma($a, b$) distribution.
- Although the mean of this random variable does not have a clear intuitive interpretation, someone could estimate the mean of $h(p; \lambda) = -\ln(1-p)/\lambda$:

  $$E[h(p; \lambda)] = E\!\left[\frac{-\ln(1-p)}{\lambda}\right] = \int_0^\infty \frac{-\ln(1-p)}{\lambda}\, \frac{b^a \lambda^{a-1} e^{-b\lambda}}{\Gamma(a)}\, d\lambda = \frac{-\ln(1-p)}{b(a-1)}, \qquad a > 1$$
Comparison: Mixture Distribution - Fixed Known Parameter - Fixed but Unknown Parameter
- Numerical Example, p = 0.95
  - $\lambda = 1$ when it is fixed and known
  - $\lambda$ follows a Gamma($a = 1.1, b = 1.1$) distribution when assumed to be a r.v., with $E[\lambda] = 1$
- First case: $h(p; \lambda) = \dfrac{-\ln(1-p)}{\lambda} = 3.00$
- Second case: $h(p; a, b) = \dfrac{b}{(1-p)^{1/a}} - b = 15.66$
- Third case: $E[h(p; \lambda)] = \dfrac{-\ln(1-p)}{b(a-1)} = 27.23$
- The results differ significantly in these three cases
- The ordering is not always the same (a quick numerical check follows below)
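A quick numerical check of the first two cases (a sketch of mine, using the slide's values p = 0.95, a = b = 1.1); the mixture percentile can also be verified by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
p, a, b = 0.95, 1.1, 1.1

case1 = -np.log(1 - p) / 1.0                      # fixed, known lambda = 1  -> 3.00
case2 = b / (1 - p) ** (1 / a) - b                # mixture (Pareto) percentile -> 15.66

# simulation check of the mixture percentile: draw lambda, then a loss
lam = rng.gamma(shape=a, scale=1 / b, size=1_000_000)   # rate b  <=>  scale 1/b
x = rng.exponential(1 / lam)
print(case1, case2, np.quantile(x, p))            # the last two should agree
```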
Simulation: Mixture Distribution - Fixed but Unknown Parameter
- Simulation Method 1:
  - A parameter is simulated and then a loss is simulated, given the simulated parameter. This is repeated many times. The 95th percentile of the empirical simulated distribution is the estimate for the 95th percentile.
- Simulation Method 2:
  - A parameter is simulated and then the 95th percentile of the distribution given the simulated parameter is calculated. By repeating this process a number of times, an estimate of the distribution of the 95th percentile can be obtained.
- The first method is more appropriate for the case where we have a heterogeneous portfolio of policies.
- The second method is more appropriate when we have a single type of risk and the parameter is considered to be a random variable.
- The first method tends to be more common, irrespective of whether one policy or a portfolio of policies with different parameters is examined. (A sketch of both methods follows below.)
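A minimal sketch (my own, not from the slides) of the two simulation methods, using the exponential-gamma example from the previous slides with a = b = 1.1 and p = 0.95.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, p, n_sims = 1.1, 1.1, 0.95, 200_000

lam = rng.gamma(shape=a, scale=1 / b, size=n_sims)      # parameter draws (rate b)

# Method 1: simulate a loss for each simulated parameter, then take one percentile
# of the pooled losses -- appropriate for a heterogeneous portfolio (mixture).
losses = rng.exponential(1 / lam)
method1 = np.quantile(losses, p)

# Method 2: compute the percentile conditional on each simulated parameter,
# giving a whole distribution for the percentile -- appropriate for a single
# risk whose parameter is treated as a random variable.
cond_percentiles = -np.log(1 - p) / lam
method2 = np.percentile(cond_percentiles, [5, 50, 95])

print(f"Method 1 point estimate: {method1:.2f}")
print(f"Method 2 distribution (5th/50th/95th pct): {method2.round(2)}")
```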
Portfolio of Policies
- Heterogeneous portfolio
  - e.g. the frequency of motor accidents follows a Poisson($\lambda$) distribution, and $\lambda$ is different for each driver and follows a Gamma($a, b$) distribution
- Generally it holds that

  $$E[X] = E[E[X \mid \theta]] \qquad\qquad V[X] = V[E[X \mid \theta]] + E[V[X \mid \theta]]$$

- For a heterogeneous portfolio the process and parameter risk can be diversified by increasing the number of independent policies
  - In practice there may be some dependency between the policies, but this will not be due to the different risk parameters
Portfolio of Policies
- Homogeneous portfolio
  - e.g. lives of identical risk whose probability of death depends on a life table. The parameters of the table are subject to estimation error.
  - The process risk can be diversified by increasing the number of policies,
  - but the parameter risk cannot be diversified by increasing the number of identical policies
- My preferred treatment of quantities such as percentiles of the loss to the portfolio is to consider these quantities as random variables and present their distribution
  - However, some may want to treat the space of the values of the parameters in the same way they would treat the space of the loss outcomes
  - Alternatively we could consider groups of large portfolios, each with its own parameter
Portfolio of Policies
- Result 1: If $\{X_j \mid \Theta\}_{j=1}^{\infty}$ are independent and identically distributed random variables, then it can be shown that

  $$\mathrm{Cov}(X_i, X_j) = V[E[X_i \mid \Theta]]$$

  - This is an interesting result on its own when correlation is modelled through a common parameter
- Result 2: If $\{X_j \mid \Theta\}_{j=1}^{\infty}$ are independent and identically distributed random variables, then

  $$V\!\left[\sum_{i=1}^{n} X_i\right] = n \cdot V[X_i] + n(n-1)\,\mathrm{Cov}(X_i, X_j)$$

  $$\frac{V\!\left[\sum_{i=1}^{n} X_i\right]}{n^2} \;\xrightarrow{\ n \to \infty\ }\; \mathrm{Cov}(X_i, X_j) = V[E[X_i \mid \Theta]]$$

- The standard deviation per policy does not go to 0 as the number of policies goes to infinity. The parameter risk cannot be diversified.
Portfolio of Policies
- Numerical example
  - e.g. the frequency of motor accidents follows Poisson($\lambda$) and $\lambda \sim$ Gamma($a = 4, b = 12$)
  (a simulation sketch follows below)

  number of policies        standard deviation per policy
  in the portfolio          same lambda for        different lambda for
                            each policy            different policies
  1                         0.6009                 0.6009
  10                        0.2472                 0.1900
  100                       0.1764                 0.0601
  1,000                     0.1677                 0.0190
  10,000                    0.1668                 0.0060
  infinite number           0.1667                 0.0000
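A sketch (mine) that reproduces the table by simulation: with a common lambda across the portfolio the standard deviation per policy levels off at sqrt(V[lambda]) = sqrt(4/144) = 0.1667, whereas with an independent lambda per policy it keeps shrinking. The portfolio total with independent lambdas is drawn directly as a negative binomial (the sum of Poisson-Gamma mixtures) to keep memory use small.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n_portfolios = 4, 12, 50_000

print("policies   same-lambda sd/policy   different-lambda sd/policy")
for n in [1, 10, 100, 1_000, 10_000]:
    # same lambda for every policy in the portfolio (parameter risk is not diversified)
    lam = rng.gamma(a, 1 / b, size=n_portfolios)
    total_common = rng.poisson(lam * n)

    # an independent lambda per policy: the portfolio total is the sum of n
    # Poisson-Gamma mixtures, i.e. NegativeBinomial(n * a, b / (b + 1))
    total_indep = rng.negative_binomial(n * a, b / (b + 1), size=n_portfolios)

    print(f"{n:>8}   {total_common.std() / n:8.4f}   {total_indep.std() / n:8.4f}")
```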
Application 1: Risk Transfer
- Probability of reinsurer's negative result
  - No specified threshold
  - sometimes 10% probability of a 10% loss
  - (PV of Losses - PV of Premiums) / PV of Premiums
  - Includes Profit Commissions and Additional Premiums
  - Only cash-flows between reinsurer and reinsured are considered
- Expected Reinsurer's Deficit (ERD)
  - the conditional expectation of the Reinsurer's Loss is considered
  - no standard threshold - usually around 3%
- Wang Transformation
- Van Slyke - Kreps Approach
- Other
Application 1: Risk Transfer
- Report to the Casualty Actuarial Task Force of the NAIC
  - about 37% of the respondents in their survey take into account parameter uncertainty when calculating risk transfer
  - emphasises the importance of parameter uncertainty in the risk transfer calculations
- Common method:
  - simulate the parameters and then simulate losses using the simulated parameters
  - for each simulation the risk transfer measurement is calculated and some statistics are calculated
  - the Casualty Actuarial Task Force uses this implicitly in one of the examples of their report
- The assessment of risk transfer is made on an individual contract basis
  - This means that the parameter risk cannot be diversified
  - whatever risk transfer measure is used, it is a random variable whose value depends on the unknown parameters
  - e.g. the ERD is not a single number, as it is commonly perceived to be, but a random variable with a distribution which depends on the distribution of the parameters
Application 1: Risk Transfer
- Example Contract

  Term                   3 years
  Annual Limit           15
  Term Limit             30
  Annual Deductible      30
  Premium                5
  Reinsurer's margin     30% of premium
  Additional Premium     25% of annual losses
  Profit Commission      Maximum of (Premium - Margin - Losses, 0)
Application 1: Risk Transfer
- Losses to the contract are assumed to follow a Pareto(c = 10, a = 1.2)
- The parameter is assumed to have been estimated based on 10 losses
- For simplicity the discount rate is assumed to be 0%
- If the simulations are done in the common way, then the ERD is approximately 3.5%
Application 1: Risk Transfer
- The process risk can be diversified over time.
- The parameter risk in this case cannot be diversified.
- The ERD is not a single number but a random variable.

[Chart: cumulative distribution of the ERD, ranging from 0% to about 16%.]

- There is a non-negligible probability that the ERD takes values significantly higher than the 3.5% value which is estimated with the commonly used method of calculation.
Application 1: Risk Transfer
- Someone could use the average of the ERD distribution as a measure of risk transfer, which in this case is 3.7%
- However, a summary number does not necessarily adequately present the amount of risk.
  - It would be similar to trying to base decisions about the value of an excess of loss reinsurance programme on the expected loss rather than on the whole distribution of the loss.
- I have seen reinsurance transactions with
  - a genuine risk transfer motivation and a significant loss uncertainty, such that several reinsurers did not accept the risk,
  - parameter uncertainty being one of the main sources of uncertainty.
- However, the usual actuarial simulation methods of averaging over the space of parameters estimated risk transfer measures just above the usual benchmarks.
- The averaging over the space of parameter values failed to show one of the main drivers of risk and one of the primary motivations for the transaction.
Likelihood Function and Estimation

Bias in MLE Estimates - Example: Pareto
- MLE is asymptotically unbiased, but unfortunately our world is not asymptotic
- Consider

  $$F(x) = 1 - \left(\frac{c}{x}\right)^{a}, \qquad 0 < c < x,\ a > 0$$

- In practice it is usually capped
- The lower the value of the parameter a, the "fatter" the tail
- Data $x_1, \ldots, x_n$
- MLE estimate

  $$\hat{a} = \frac{n}{\sum_{i=1}^{n} \ln\!\left(\dfrac{x_i}{c}\right)}$$
Bias in MLE Estimates - Example: Pareto
- If the true parameter is $a_0$, it can be shown that the distribution of the MLE estimator has p.d.f.

  $$f(\hat{a}) = \frac{(n a_0)^{n}\, e^{-n a_0 / \hat{a}}}{\hat{a}^{\,n+1}\, \Gamma(n)}$$

- It can be shown that

  $$E[\hat{a}] = \frac{n \cdot a_0}{n-1}$$

- ...which on average is higher than $a_0$ (lighter tail)
- Also, the estimate is more likely to be higher than $a_0$ (a simulation check follows below)
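A simulation sketch (my own) checking the bias result: for samples of size n = 10 from a Pareto(c = 10, a0 = 1.2), the average MLE should be close to n*a0/(n - 1) = 1.333, and the estimate exceeds a0 more often than not.

```python
import numpy as np

rng = np.random.default_rng(0)
c, a0, n, n_sims = 10.0, 1.2, 10, 50_000

# Pareto(c, a0) samples via inverse transform: X = c * U^(-1/a0)
u = rng.uniform(size=(n_sims, n))
x = c * u ** (-1 / a0)

a_hat = n / np.log(x / c).sum(axis=1)            # MLE for each sample

print(a_hat.mean(), n * a0 / (n - 1))            # ~1.333 in both cases
print((a_hat > a0).mean())                       # probability of overestimating a0 (> 50%)
```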
Bias in MLE Estimates - Example: Pareto
- Assume the true parameter is 1.2
- Both the median and the mean of the estimator are higher than 1.2

[Chart: probability density function of the MLE estimator of a, for samples of 10, 20 and 100 points.]
A "Paradox"
- Consider the p x 100-th percentile
- If $\hat{a}$ is our estimate for the parameter, then the p x 100-th percentile is

  $$Y = c \cdot (1-p)^{-1/\hat{a}}$$

- It can be shown that under certain conditions

  $$E[Y] = c \left(\frac{n \cdot a_0}{n \cdot a_0 + \ln(1-p)}\right)^{n}$$

- "Paradox": Although on average our estimate of the parameter is higher (lighter tail) than its true value, on average the percentiles are also higher than their true value
- However, it is more likely to underestimate the percentile (as it is more likely to overestimate the value of the parameter); see the sketch below
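A sketch (mine) illustrating the "paradox" for c = 10, a0 = 1.2, n = 10, p = 0.95: the mean of the estimated percentile sits above the true percentile even though the percentile is underestimated more often than not.

```python
import numpy as np

rng = np.random.default_rng(1)
c, a0, n, p, n_sims = 10.0, 1.2, 10, 0.95, 100_000

x = c * rng.uniform(size=(n_sims, n)) ** (-1 / a0)    # Pareto(c, a0) samples
a_hat = n / np.log(x / c).sum(axis=1)                 # MLE of a for each sample

y_hat = c * (1 - p) ** (-1 / a_hat)                   # estimated 95th percentile
y_true = c * (1 - p) ** (-1 / a0)                     # true 95th percentile
y_mean_theory = c * (n * a0 / (n * a0 + np.log(1 - p))) ** n

print(y_true, y_hat.mean(), y_mean_theory)            # mean estimate > true value
print((y_hat < y_true).mean())                        # ...yet underestimation is more likely
```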
Distribution of the 95th Percentile
- Assuming c = 10, a = 1.2 and n = 10

[Chart: cumulative distribution of the estimated 95th percentile, with the true 95th percentile and the average 95th percentile marked.]

- An unbiased parameter estimator does not necessarily imply an unbiased percentile estimator
Likelihood Function
- Same example as before, with a Pareto sample
- The likelihood function is

  $$L(a) = a^{n} c^{na} \prod_{i=1}^{n} x_i^{-(a+1)} \;\propto\; a^{n}\, e^{-a \sum_{i=1}^{n} \ln(x_i/c)}$$

- which has the form of a $\text{Gamma}\!\left(n+1,\ \sum_{i=1}^{n} \ln(x_i/c)\right)$ distribution:

  $$f(a) = \frac{\left(\sum_{i=1}^{n} \ln(x_i/c)\right)^{n+1} a^{n}\, e^{-a \sum_{i=1}^{n} \ln(x_i/c)}}{\Gamma(n+1)}$$
Likelihood Function - Bootstrap
- Compare:
- If the true parameter is $a_0$, it was shown that the distribution of the MLE estimator has p.d.f.

  $$f(\hat{a}) = \frac{(n a_0)^{n}\, e^{-n a_0 / \hat{a}}}{\hat{a}^{\,n+1}\, \Gamma(n)}$$

- The likelihood function is

  $$f(a) = \frac{\left(\sum_{i=1}^{n} \ln(x_i/c)\right)^{n+1} a^{n}\, e^{-a \sum_{i=1}^{n} \ln(x_i/c)}}{\Gamma(n+1)}$$

- The two are different. The first is an inverse gamma distribution, the second a gamma distribution. In the first case the parameter is given, in the second the sample is given.
Likelihood Function - Bootstrap
- Common practice
  - We estimate the parameter
  - We assume it is the correct one and we simulate data samples based on the estimated parameter
  - For each sample we estimate the parameter
  - We construct the distribution of the parameter
- This is not strictly correct, because we assume that the parameter is given, while what is given is the sample (a sketch of the comparison follows below)

[Chart: probability density functions of the estimator distribution (1st case) and the likelihood function (2nd case) for the Pareto parameter a.]
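A sketch (my own) contrasting the two distributions for a Pareto sample of size n = 10 with c = 1: the parametric-bootstrap distribution of the estimator (parameter assumed given) against the gamma-shaped likelihood (sample given).

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(2)
c, a0, n, n_boot = 1.0, 1.2, 10, 20_000

x = c * rng.uniform(size=n) ** (-1 / a0)          # one observed sample
a_hat = n / np.log(x / c).sum()                   # its MLE

# Common practice (parametric bootstrap): treat a_hat as the true parameter,
# resample, and re-estimate -- gives the estimator distribution (1st case).
xb = c * rng.uniform(size=(n_boot, n)) ** (-1 / a_hat)
a_boot = n / np.log(xb / c).sum(axis=1)

# Likelihood-based distribution (2nd case): Gamma(n + 1, rate = sum(ln(x_i / c)))
lik = gamma(a=n + 1, scale=1 / np.log(x / c).sum())

print("bootstrap mean/sd: ", a_boot.mean().round(3), a_boot.std().round(3))
print("likelihood mean/sd:", round(lik.mean(), 3), round(lik.std(), 3))
```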
Numerical Example
- Assume a Pareto(c = 1, a = 1.2) and the simulated data

  1.215  1.383  2.203  1.360  1.171  1.304  5.511  1.604  4.409  1.725

- The likelihood estimate is a = 1.6 (if the true a = 1.2, there is a probability of around 20% that the likelihood estimate will be greater than a = 1.6)
- The true 95th percentile is $(1-0.95)^{-1/1.2} = 12.139$
- The 95th percentile based on the maximum likelihood estimate is $(1-0.95)^{-1/1.6} = 6.503$
- The expected value of the 95th percentile when allowing for parameter uncertainty is

  $$E_{0.95}[Y] = E\!\left[(1-0.95)^{-1/a}\right] \cong 8.851$$

- Allowing for parameter uncertainty does not necessarily result in percentile estimates which are close to the true ones (a reproduction sketch follows below)
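A sketch (mine) reproducing these numbers: the MLE from the ten data points, the two percentile estimates, and the expected 95th percentile obtained by averaging over the Gamma(n + 1, sum ln(x_i)) likelihood for a. The last average is sensitive to the lower tail of a, so the simulated value only roughly matches the slide's 8.851.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.215, 1.383, 2.203, 1.360, 1.171, 1.304, 5.511, 1.604, 4.409, 1.725])
n, p = len(x), 0.95

s = np.log(x).sum()                       # c = 1, so ln(x_i / c) = ln(x_i)
a_mle = n / s                             # ~1.6

pct_true = (1 - p) ** (-1 / 1.2)          # ~12.14
pct_mle = (1 - p) ** (-1 / a_mle)         # ~6.50

# average the percentile over the likelihood for a: a ~ Gamma(n + 1, rate s)
a_draws = rng.gamma(shape=n + 1, scale=1 / s, size=500_000)
pct_mixed = ((1 - p) ** (-1 / a_draws)).mean()   # roughly the slide's 8.851 (simulation noise applies)

print(round(a_mle, 3), round(pct_true, 3), round(pct_mle, 3), round(pct_mixed, 3))
```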
Calculation and Estimation Methods of the Parameter Distribution

Calculation of Parameter Distribution
- Exact distribution has a "nice" known analytical form
- Normal approximation
- Monte Carlo methods
Asymptotic v. Actual Distribution of Weibull Parameter Estimators

[Charts: asymptotic (normal) v. actual distribution of the Weibull parameter estimators, based on 11, 20 and 50 data points.]

- The actual distribution does not look like its normal approximation when data are sparse
- The Normal distribution can produce negative values for parameters which are positive by definition
  - Ignore negative values?
- Convenient, but not always a good approximation!
Calculation of Parameter Distribution
- The Pareto examples discussed earlier lead to a nice analytical formula for the likelihood, which had a gamma form:

  $$f(a) = \frac{\left(\sum_{i=1}^{n} \ln(x_i/c)\right)^{n+1} a^{n}\, e^{-a \sum_{i=1}^{n} \ln(x_i/c)}}{\Gamma(n+1)}$$

- However, this is not generally the case.
- For example, the Weibull distribution

  $$f(x) = c \cdot \gamma \cdot x^{\gamma-1} e^{-c x^{\gamma}}, \qquad x, c, \gamma > 0$$

  has likelihood function

  $$L(c, \gamma) = c^{n} \gamma^{n}\, e^{-c \sum_{i=1}^{n} x_i^{\gamma}} \prod_{i=1}^{n} x_i^{\gamma-1} = c^{n} \gamma^{n}\, e^{(\gamma-1) \sum_{i=1}^{n} \ln(x_i)}\, e^{-c \sum_{i=1}^{n} x_i^{\gamma}}$$

- which does not have a recognisable standard form
Calculation of Parameter Distribution - Monte Carlo Methods
- Monte Carlo statistical methods such as the Gibbs sampler can be of great help in these situations.
- These methods have found applications in Bayesian statistics, which are briefly mentioned later, but they can also be used in the classical case.
- The details of the Monte Carlo methods are not discussed here, but can be found in Robert, C. P. & Casella, G. (2004).
- A shorter description which is also more relevant to actuarial work can be found in Scollnik, D. P. M. (2000).
- The exact method used depends on the problem; here the generic case is shown.
Calculation of Parameter Distribution - Monte Carlo Methods - Gibbs Sampling
- Let us say we want to simulate from the joint density $f(\theta_1, \theta_2, \ldots, \theta_k)$
- Gibbs Sampling
  - Initial values $\theta_1^{(0)}, \theta_2^{(0)}, \ldots, \theta_k^{(0)}$ are arbitrarily chosen
  - Then the $\theta$'s are simulated from the conditional distributions

    $$\theta_1^{(1)} \sim f(\theta_1 \mid \theta_2^{(0)}, \ldots, \theta_k^{(0)})$$
    $$\theta_2^{(1)} \sim f(\theta_2 \mid \theta_1^{(1)}, \theta_3^{(0)}, \ldots, \theta_k^{(0)})$$
    $$\vdots$$
    $$\theta_k^{(1)} \sim f(\theta_k \mid \theta_1^{(1)}, \ldots, \theta_{k-1}^{(1)})$$

  - This is repeated many times
  - the first few simulated values are usually ignored
- If the conditional distributions cannot be recognised, then a generic sampling algorithm, usually based on variations of the Metropolis algorithm, is used
Calculation of Parameter Distribution - Monte Carlo Methods - Gibbs Sampling
- Example: For the Weibull distribution we have

  $$f(c \mid \gamma) \propto c^{n}\, e^{-c \sum_{i=1}^{n} x_i^{\gamma}}$$

  $$f(\gamma \mid c) \propto \gamma^{n}\, e^{(\gamma-1) \sum_{i=1}^{n} \ln(x_i)}\, e^{-c \sum_{i=1}^{n} x_i^{\gamma}}$$

- $f(c \mid \gamma)$ has the form of a $\text{Gamma}\!\left(n+1,\ \sum_{i=1}^{n} x_i^{\gamma}\right)$
- but $f(\gamma \mid c)$ does not have a standard, easily recognisable form
- Random numbers from $f(\gamma \mid c)$ could be simulated using some version of the Metropolis algorithm
The Metropolis Algorithm
- Used to simulate values (under some conditions) from a distribution which does not have a standard recognisable form
- There are many variations of the algorithm
  - The appropriate variation depends on the nature of the problem and the form of the distribution
- Generate $U_1 \sim \text{Uniform}(0, A)$, where A is sufficiently large
- Generate $U_2 \sim \text{Uniform}(0, 1)$
- If

  $$U_2 < \min\!\left(\frac{f(U_1)}{f(a_{t-1})},\ 1\right)$$

  then $a_t = U_1$
- Else $a_t = a_{t-1}$
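A sketch (my own, not from the slides) of the Gibbs sampler for the Weibull parameters, with the uniform-proposal Metropolis step described above used for gamma; the data, the proposal range A and the chain length are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# illustrative Weibull data with density c*g*x^(g-1)*exp(-c*x^g): X^g ~ Exponential(c)
c0, g0, n = 1.0, 1.5, 30
x = rng.exponential(1 / c0, n) ** (1 / g0)

def log_f_gamma(g, c):
    """log of f(gamma | c), up to a constant (not a standard distribution)."""
    if g <= 0:
        return -np.inf
    return n * np.log(g) + (g - 1) * np.log(x).sum() - c * (x ** g).sum()

A, n_iter, burn_in = 10.0, 20_000, 2_000
c, g = 1.0, 1.0                                   # arbitrary starting values
draws = []

for t in range(n_iter):
    # Gibbs step for c: f(c | gamma) is Gamma(n + 1, rate = sum(x_i^gamma))
    c = rng.gamma(shape=n + 1, scale=1 / (x ** g).sum())

    # Metropolis step for gamma: propose U1 ~ Uniform(0, A), accept with
    # probability min(f(U1 | c) / f(gamma | c), 1)
    u1 = rng.uniform(0, A)
    if rng.uniform() < np.exp(min(log_f_gamma(u1, c) - log_f_gamma(g, c), 0.0)):
        g = u1

    if t >= burn_in:
        draws.append((c, g))

c_draws, g_draws = np.array(draws).T
print(np.percentile(c_draws, [5, 50, 95]).round(3))
print(np.percentile(g_draws, [5, 50, 95]).round(3))
```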
Temporal Parameter Uncertainty

Temporal Parameter Uncertainty
- Changes may be gradual or sudden
- The effect of a change may be different on different risk covers; it may affect severity or frequency or both
- Changes may be estimated using external information or from the data
- A good measure of exposure may provide a better indication of risk changes compared to what can be extracted from some sparse data
- Some special features may not be described sufficiently by any external information; then the loss data may be able to reveal some trends
- When data are used, some non-parametric methods may be useful in extracting trends (see later)
- It is often the case that trends are hidden in randomness and cannot be identified or measured easily
Temporal Parameter Uncertainty - Fooled by Randomness
- Assume the frequency follows a Poisson($\lambda$) distribution. The Poisson parameter changes in a compound way by a factor $k = 1 + g$.
- Experiment: investigate the behaviour of the estimators
- The log-likelihood function is

  $$l(\lambda, k) = -\lambda\left(1 + k + \ldots + k^{n-1}\right) + \left(\sum_{i=1}^{n} n_i\right)\ln(\lambda) + \left(\sum_{i=1}^{n} (i-1) n_i\right)\ln(k)$$

- For $\lambda = 3$, g = 10% and number of years n = 10 (a simulation sketch follows below)
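A sketch (mine) of the experiment: simulate n = 10 years of Poisson counts with lambda = 3 and g = 10%, maximise the log-likelihood above numerically, and look at the scatter of the estimates; scipy's optimiser is used purely for convenience.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
lam0, g0, n_years, n_sims = 3.0, 0.10, 10, 1_000
years = np.arange(n_years)                            # corresponds to (i - 1)

estimates = []
for _ in range(n_sims):
    counts = rng.poisson(lam0 * (1 + g0) ** years)

    def neg_loglik(theta):
        lam, k = theta
        if lam <= 0 or k <= 0:
            return np.inf
        return -(-lam * np.sum(k ** years)
                 + counts.sum() * np.log(lam)
                 + (years * counts).sum() * np.log(k))

    res = minimize(neg_loglik, x0=[counts.mean(), 1.0], method="Nelder-Mead")
    estimates.append(res.x)

lam_hat, k_hat = np.array(estimates).T
g_hat = k_hat - 1
print(np.corrcoef(lam_hat, g_hat)[0, 1])              # lambda and g estimates are negatively correlated
print(np.percentile(g_hat, [5, 50, 95]).round(3))     # wide spread of apparent "growth trends"
```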
Temporal Parameter Uncertainty - Fooled by Randomness
- For $\lambda = 3$, g = 10% and number of years n = 10, based on 5,000 simulations

[Chart: scatter plot of the estimated growth g (from -10% to 40%) against the estimated parameter lambda (from 0 to 8).]

- Random fluctuations in the simulated number of losses can be interpreted as growth trends
Effect of "Unusually" Large Losses

"Unusually" Large Losses and Rules of Thumb
- Large losses are treated separately. What is a large loss?
- Rules of thumb, e.g. the largest loss is 3 times larger than the second largest
- Assume a Pareto with a = 1.2 and follow the above rule of thumb

[Charts: cumulative distributions of the Pareto parameter and of the 95th percentile, using all data v. excluding "unusual" losses.]

- There is a significant difference in the distribution of both the parameter and the 95th percentile if we exclude an "unusually" high loss.
"Unusually" Large Losses and Rules of Thumb
- If $X_{k,n}$ is the kth upper order statistic, i.e. $X_{1,n}$ is the maximum of a sample of n and $X_{n,n}$ is the minimum of a sample of size n, then it can be shown that

  $$\Pr[X_{k,n} > y \mid X_{k+1,n} = x_{k+1}] = \left(\frac{1 - F(y)}{1 - F(x_{k+1})}\right)^{k}, \qquad y > x_{k+1}$$

- Applying this to the Pareto:

  $$\Pr[X_{1,n} > \lambda \cdot x_{2,n} \mid X_{2,n} = x_{2,n}] = \frac{\left(\dfrac{c}{\lambda x_{2,n}}\right)^{a}}{\left(\dfrac{c}{x_{2,n}}\right)^{a}} = \left(\frac{1}{\lambda}\right)^{a}, \qquad \lambda > 1$$

- For a = 1.2 and $\lambda = 3$ there is a probability of 26.75% that the largest loss will be more than 3 times higher than the second largest loss.
- A large loss should not be considered to be "unusual" without careful examination.
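A quick check of the 26.75% figure (my own sketch): 3^(-1.2) is about 0.2675, and a simulation of Pareto samples gives the same proportion of "max more than 3 times the second largest".

```python
import numpy as np

rng = np.random.default_rng(0)
c, a, n, n_sims = 10.0, 1.2, 20, 100_000

print(3 ** -1.2)                                   # ~0.2675

x = c * rng.uniform(size=(n_sims, n)) ** (-1 / a)  # Pareto(c, a) samples
x.sort(axis=1)
ratio = x[:, -1] / x[:, -2]                        # largest / second largest
print((ratio > 3).mean())                          # ~0.2675 as well, whatever the sample size
```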
Model Uncertainty

Model Uncertainty
- In the previous slides the model (distribution) was assumed known and the parameter was estimated
- The model (distribution) could also be estimated from the data
- The likelihood function could also be used
- Different distributions have different numbers of parameters, and several criteria have been proposed for comparing different models
- An example criterion is the Schwarz Bayesian Criterion (SBC), which is

  $$\ln(\text{likelihood}) - r \cdot \ln\!\left(\frac{n}{2\pi}\right)$$

  where r is the number of parameters and n is the sample size
Model Uncertainty: Experiment
- Assume that the true distribution is a Pareto(c = 10, a = 1.2)
- Random samples are generated and the following distributions are fitted: Pareto, Exponential, Weibull, Gamma, LogNormal and LogGamma
- This is repeated 10,000 times (a sketch of one replication follows below)
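A sketch (my own) of one replication of the experiment, fitting some of the candidate distributions with scipy and choosing by the SBC as written on the previous slide; the candidate set here is smaller than the slide's (the LogGamma is omitted) and the treatment of fixed location/scale parameters is my own assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
c, a, n = 10.0, 1.2, 10
data = c * rng.uniform(size=n) ** (-1 / a)            # one Pareto(c, a) sample

candidates = {
    # name: (distribution, keyword args fixing known/location parameters, r = free parameters)
    "pareto":  (stats.pareto,      dict(floc=0, fscale=c), 1),
    "expon":   (stats.expon,       dict(floc=0),           1),
    "weibull": (stats.weibull_min, dict(floc=0),           2),
    "gamma":   (stats.gamma,       dict(floc=0),           2),
    "lognorm": (stats.lognorm,     dict(floc=0),           2),
}

results = {}
for name, (dist, fixed, r) in candidates.items():
    params = dist.fit(data, **fixed)                  # maximum likelihood fit
    loglik = dist.logpdf(data, *params).sum()
    sbc = loglik - r * np.log(n / (2 * np.pi))        # SBC as written on the slide
    results[name] = (sbc, dist.ppf(0.95, *params))    # criterion and fitted 95th percentile

best = max(results, key=lambda k: results[k][0])
true_pct = c * 0.05 ** (-1 / a)
print(best, round(results[best][1], 1), "true 95th percentile:", round(true_pct, 1))
```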
Model Uncertainty: Experiment
- The distributions of the 95th percentile are:

[Chart: cumulative distributions of the estimated 95th percentile when only a Pareto is fitted and when a distribution is selected from the candidate set, with the true 95th percentile marked.]
Model Uncertainty: Experiment
- For a sample size of 10
  - the true distribution (Pareto) is only chosen 27.1% of the time
  - the LogNormal distribution is chosen more often than the true underlying model
  - when a Pareto model is assumed, the 95th percentile is underestimated about 55% of the time, and
  - when a selection of distributions is used, the 95th percentile is underestimated about 65% of the time
- In social sciences the use of particular distributions is often justified by data and general reasoning.
  - E.g. it has been argued that the distribution of the size of cities, the size of earthquakes or the income of people follows a Pareto-type distribution.
- There have not been similar studies for the loss distributions in insurance.
- Results are subject to simulation error
Application 2: Frequency of Aviation Losses
- Number of losses which include at least 1 fatality, from a subset of commercial airlines
Application 2: Frequency of Aviation Losses
- A non-parametric approach is helpful in exploring possible temporal trends in the data.
- The number of losses in year t is assumed to follow a Poisson($\mu_t$) distribution with $\ln(\mu_t) = s(t) + \ln(E_t)$, where $E_t$ is the exposure for year t and $s(t)$ a smoothing spline.

[Charts: fitted smoothing splines s(y1) and s(y1, df = 1) plotted against year.]
Application 2: Frequency of Aviation Losses
- Continuous reduction in the expected number of losses
- Anova analysis is inconclusive as to whether the trend has accelerated after 2001 (WTC)

[Chart: number of losses by year, 1995-2005; red: smoothing spline, blue: straight line.]
Application 2: Frequency of Aviation Losses
- Assume $\ln(\mu_t) = a + bt + \ln(E_t)$, or $\mu_t = e^{a+bt} E_t = \lambda \cdot k^{t-1} \cdot E_t$
- k can be thought of as k = 1 + g, where g is the annual rate of change in the Poisson parameter.
- $\lambda$ and g are to be estimated. The likelihood function is

  $$l(\lambda, g) = \prod_{t=1}^{n} e^{-\lambda E_t k^{t-1}} \frac{\left(\lambda E_t k^{t-1}\right)^{n_t}}{n_t!} \;\propto\; e^{-\lambda \sum_{t=1}^{n} E_t k^{t-1}} \cdot \lambda^{\sum_{t=1}^{n} n_t} \cdot \prod_{t=1}^{n} k^{n_t (t-1)}$$

- Gibbs sampler:

  $$l(\lambda \mid g) \propto \text{Gamma}\!\left(1 + \sum_{t=1}^{n} n_t,\ \sum_{t=1}^{n} E_t k^{t-1}\right)$$

- $l(g \mid \lambda)$ is not in the form of a standard distribution and the Metropolis algorithm can be used (a sketch follows below)
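A sketch (mine) of this Gibbs sampler on simulated data, since the airline data behind the slides are not reproduced here; the exposures, their normalisation and the proposal range for g are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# illustrative data: 15 years of exposures and Poisson counts with a downward trend
n_years = 15
E = np.linspace(100, 140, n_years) / 100.0            # assumed (normalised) exposures
t = np.arange(n_years)                                # t - 1 in the slide's notation
counts = rng.poisson(2.5 * 0.9 ** t * E)              # true lambda = 2.5, g = -10%

def log_post_g(g, lam):
    """log of l(g | lambda), up to a constant."""
    if g <= -1:
        return -np.inf
    k = 1 + g
    return (counts * t * np.log(k)).sum() - lam * (E * k ** t).sum()

lam, g, draws = 1.0, 0.0, []
for it in range(20_000):
    # Gibbs step: lambda | g ~ Gamma(1 + sum(n_t), rate = sum(E_t * k^(t-1)))
    k = 1 + g
    lam = rng.gamma(1 + counts.sum(), 1 / (E * k ** t).sum())

    # Metropolis step for g with a uniform proposal over an assumed range
    g_prop = rng.uniform(-0.5, 0.5)
    if rng.uniform() < np.exp(min(log_post_g(g_prop, lam) - log_post_g(g, lam), 0.0)):
        g = g_prop

    if it >= 2_000:
        draws.append((lam, g))

lam_d, g_d = np.array(draws).T
print(np.percentile(g_d, [5, 50, 95]).round(3))       # distribution of the growth parameter
print(np.percentile(lam_d, [5, 50, 95]).round(3))     # distribution of the Poisson parameter
```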
Application 2: Frequency of Aviation Losses

[Chart: cumulative distribution of the Poisson parameter, ranging from 0 to 5.]
Application 2: Frequency of Aviation Losses
- The growth parameter is negatively correlated with the initial Poisson parameter
- The estimate of the growth parameter is -11.7%; its 5th percentile is -15.8% and its 95th percentile is -7.1%
- The estimate for the number of losses is 1.87 per year, with its 5th and 95th percentiles 0.8 and 3.3 respectively
- The choice of the parameter needs to take into account other qualitative information related to the airline industry
Observations and Comments (1)
- Whether parameter risk can be diversified or not depends on the source of the parameter uncertainty
- The method of estimation/simulation also depends on the source of parameter uncertainty and on whether it can be diversified or not
- Using the wrong method could result in underestimation and poor understanding of the risk
- The correct distribution of the parameter estimator is given by the likelihood function, given the sample
- The distribution of the estimator given its true value can be used for investigations and for validation of methods
- Allowing for parameter uncertainty does not necessarily result in percentile estimates which are close to the true ones
Observations and Comments (2)
- The Normal approximation to the parameter distribution is convenient but not always good
- Monte Carlo methods can be used instead
- Random fluctuations in the simulated number of losses can be interpreted as growth trends
- A large loss should not be considered to be "unusual" without careful examination of the properties of its distribution
- The views expressed in this presentation are my personal views and in particular they should not necessarily be regarded as being those of my employer
References
- AMERICAN ACADEMY OF ACTUARIES - COMMITTEE ON PROPERTY AND LIABILITY FINANCIAL REPORTING (2005) Risk Transfer in P&C Reinsurance - Report to the Casualty Actuarial Task Force of the NAIC
- BINMORE, K. (2009) Rational Decisions, Princeton University Press
- BUHLMANN, H. & GISLER, A. (2005) A Course in Credibility Theory and its Applications, Springer
- CAIRNS, A. J. G. (2000) A Discussion of Parameter and Model Uncertainty in Insurance, Insurance: Mathematics and Economics 27, pp. 313-330
- CASELLA, G. & GEORGE, E. I. (1992) Explaining the Gibbs Sampler, The American Statistician, Vol. 46, No. 3
- EMBRECHTS, P., KLUPPELBERG, C. & MIKOSCH, T. (1997) Modelling Extremal Events, Springer
- FASB (1992) FAS 113 Accounting and Reporting for Reinsurance of Short-Duration and Long-Duration Contracts
- GAZZANIGA, M. (2008) Human: The Science behind What Makes us Unique, Ecco
- GILLIES, D. (2000) Philosophical Theories of Probability, Routledge
- HALL, P. & TAJVIDI, N. (2000) Nonparametric Analysis of Temporal Trend when Fitting Parametric Models to Extreme-Value Data, Statistical Science, Vol. 15, No. 2, pp. 153-167
- KLUGMAN, S., PANJER, H. & WILLMOT, G. (1998) Loss Models, Wiley
- MAJOR, J. (1999) Taking Uncertainty into Account: Bias Issues Arising from Parameter Uncertainty in Risk Models, Casualty Actuarial Society Forum, Summer 1999
- MATA, A. (2000) Parameter Uncertainty for Extreme Value Distributions, GIRO Convention
- MILLNS, R. & WESTON, R. (2005) Parameter Uncertainty and Capital Modelling, GIRO Convention
- PRESS, W. H., FLANNERY, B. P., TEUKOLSKY, S. A. & VETTERLING, W. T. (1994) Numerical Recipes in Pascal, Cambridge University Press
- ROBERT, C. P. & CASELLA, G. (2004) Monte Carlo Statistical Methods, Springer
- ROOTZEN, H. & TAJVIDI, N. (1995) Extreme Value Statistics and Wind Storm Losses: A Case Study, Scandinavian Actuarial Journal
- SCHMOCK, U. (1999) Estimating the Value of the WINCAT Coupons of the Winterthur Insurance Convertible Bond - A Study of the Model Risk, ASTIN Bulletin, Vol. 29, No. 1, pp. 101-163
- SCOLLNIK, D. P. M. (2000) Actuarial Modelling with MCMC and BUGS, North American Actuarial Journal, Vol. 5, No. 2, pp. 96-124
- SORNETTE, D. (2000) Critical Phenomena in Natural Sciences, Springer
- TIERNEY, L. (1994) Markov Chains for Exploring Posterior Distributions, The Annals of Statistics, Vol. 22, No. 4, pp. 1701-1728
- WACEK, M. (2005) Parameter Uncertainty in Loss Ratio Distributions and its Implications, Casualty Actuarial Society Forum, Fall 2005