Download Bootstrapping the triangles

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Bootstrapping the triangles
Prečo, kedy a ako (NE)bootstrapovat’ v trojuholníkoch?
Michal Pešta
Charles University in Prague
Faculty of Mathematics and Physics
Actuarial Seminar, Prague
19 October 2012
Overview
- Idea and goal of bootstrapping
- Bootstrap in reserving triangles
- Residuals
- Diagnostics
- Real data example
• Support:
Czech Science Foundation project “DYME Dynamic
Models in Economics” No. P402/12/G097
Prologue for
Bootstrap
- computationally intensive method popularized in
1980s due to the introduction of computers in
statistical practice
bootstrap
does not replace or add to the original data
- a strong mathematical background
- unfortunately, the name “bootstrap” conveys the
impression of “something for nothing”
resampling from their samples
idly
Reserving Issue
- consider traditional actuarial approach to reserving
-
?
E
!
risk . . . the uncertainty in the outcomes over the
lifetime of the liabilities
bootstrap can be also applied under Solvency II
. . . outstanding liabilities after 1 year
distribution-free methods (e.g., chain ladder) only
provide a standard deviation of the
ultimates/reserves (or claims development
result/run-off result)
another risk measure (e.g., V aR @ 99.5%)
moreover, distributions of ultimate cost of claims
and the associated cash flows (not just a standard
deviation)?
claims reserving technique
applied mechanically and without judgement
Bootstrap
- simple (distribution-independent) resampling
method
- estimate properties (distribution) of an estimator
by sampling from an approximating (e.g., empirical)
distribution
- useful when the theoretical distribution of
a statistic of interest is complicated or unknown
Resampling with
replacement
- random sampling with replacement from the
original dataset
for b = 1, . . . , B resample from
∗ , . . . , X∗
X1 , . . . , Xn with replacement and obtain X1,b
n,b
Bootstrap example
- input data (# of catastrophic claims per year in 10y
history): 35, 34, 13, 33, 27, 30, 19, 31, 10, 33
mean = 26.5, sd = 9.168182
Case sampling
I
..
.
I
bootstrap sample 1 (1st draw with replacement):
30, 27, 35, 35, 13, 35, 33, 34, 35, 33
mean∗1 = 31.0, sd∗1 = 6.847546
bootstrap sample 1000 (1000th draw with
replacement): 19, 19, 31, 19, 33, 34, 31, 34, 34, 10
mean∗1000 = 26.4, sd∗1000 = 8.771165
- mean∗1 , . . . , mean∗1000 provide bootstrap empirical
distribution for mean and sd∗1 , . . . , sd∗1000 provide
bootstrap empirical distribution for sd (REALLY!?)
Terminology
- Xi,j . . . claim amounts in development year j with
accident year i
- Xi,j stands for the incremental claims in accident
year i made in accounting year i + j
- n . . . current year – corresponds to the most
recent accident year and development period
- Our data history consists of
right-angled isosceles triangles Xi,j , where
i = 1, . . . , n and j = 1, . . . , n + 1 − i
Run-off (incremental)
triangle
Accident
year i
1
2
1
X1,1
X2,1
..
.
..
.
n−1
n
Xn−1,1
Xn,1
Development year j
2
···
n−1
X1,2
···
X1,n−1
X2,2
···
X2,n−1
..
.
..
.
Xi,n+1−i
Xn−1,2
n
X1,n
Notation
- Ci,j . . . cumulative payments in origin year i after j
development periods
Ci,j =
j
X
Xi,k
k=1
- Ci,j . . . a random variable of which we have an
observation if i + j ≤ n + 1
- Aim is to estimate the ultimate claims amount Ci,n
and the outstanding claims reserve
Ri = Ci,n − Ci,n+1−i ,
i = 2, . . . , n
by completing the triangle into a square
Run-off (cumulative)
triangle
Accident
year i
1
2
1
C1,1
C2,1
..
.
..
.
n−1
n
Cn−1,1
Cn,1
Development year j
2
···
n−1
C1,2
···
C1,n−1
C2,2
···
C2,n−1
..
.
..
.
Ci,n+1−i
Cn−1,2
n
C1,n
Theory behind the
bootstrap
- validity of bootstrap procedure
- asymptotically distributionally coincide
b∗ − R
bi {Xi,j : i + j ≤ n + 1}
R
i
bi − Ri
and R
- approaching (each other) in distribution
(n)
in probability along Dn = {Xi,j : i + j ≤ n + 1}
D b
∗
b
b
→R
n→∞
Ri − Ri Dn(n) ←
i − Ri in probability,
I
∀ real-valued bounded continuous function f
h i
h i
P
bi∗ − R
bi Dn(n) − E f R
bi − Ri −−−
E f R
−→ 0
n→∞
Residuals
- measure the discrepancy of fit in a model
- can be used to explore the adequacy of fit of
a model
- may also indicate the presence of anomalous values
requiring further investigation
- in regression type problems, it is common to
bootstrap the residuals, rather than bootstrap the
data themselves
- should mimic iid r.v. having zero mean, common
variance, symmetric distribution
Raw residuals
(R) ri,j
bi,j
= Xi,j − X
or
(R) ri,j
bi,j
= Ci,j − C
- in homoscedastic linear regression, CL with α = 0,
GLM with normal distribution and identity link
Pearson residuals
(P ) ri,j
bi,j
Xi,j − X
= q
bi,j )
V (X
- in GLM or GEE
- idea is to standardize raw residuals
- ODP:
(P ) ri,j
=
bi,j
Xi,j − X
q
bi,j
X
(P ) ri,j
=
bi,j
Xi,j − X
bi,j
X
- Gamma GLM:
Anscombe residuals I
(A) ri,j
=
bi,j )
A(Xi,j ) − A(X
q
bi,j ) V (X
bi,j )
A0 ( X
- in GLM or GEE
- disadvantage of Pearson residuals: often markedly
skewed
- idea is to obtain residuals, which are not skewed
(to "normalize" residuals)
Anscombe residuals II
Z
dµ
A(·) =
V
- ODP:
(A) ri,j =
1/3 (µ)
2/3
b 2/3
3 Xi,j − X
i,j
1/6
2
b
X
i,j
- Gamma GLM:
(A) ri,j = 3
1/3
b 1/3
Xi,j − X
i,j
b 1/3
X
i,j
- inverse Gaussian:
(A) ri,j
=
bi,j
log Xi,j − log X
1/2
b
X
i,j
Deviance residuals
(D) ri,j
bi,j )
= sign(Xi,j − X
p
di ,
where di is the deviance for one unit, i.e.,
P
di = D
- in GLM or GEE
- similar advantageous properties like Anscombe
residuals (see Taylor expansion)
- ODP:
(D) ri,j
r b
bi,j ) − Xi,j + X
bi,j
= sign(Xi,j −Xi,j ) 2 Xi,j log(Xi,j /X
Scaled residuals
- scaling parameter φ
(sc)
ri,j
ri,j = q
φb
- to obtain the bootstrap prediction error, it is
necessary to add an estimate of the process
variance . . . bias correction in the bootstrap
estimation variance
- Pearson scaled residuals in ODP for the
bootstrap prediction error:
(sc) 0
(P ) ri,j
2
i,j≤n+1 (P ) ri,j
P
= (P ) ri,j ×
n(n + 1)/2 − (2n + 1)
!−1/2
Adjusted residuals
- alternative to scaled residuals . . . bias correction in
the bootstrap estimation variance
r
n−p
(adj)
ri,j = ri,j ×
n
Diagnostics for
residuals
- residuals and squared residuals
- type of plots:
I accident years
I accounting years
I development years
- no pattern visible
- iid with zero mean, symmetrically distributed,
common variance
Bootstrap procedure I
- Ex: bootstrap in ODP
(1) fit a model
estimates α
bi , βbj , γ
b, φb
(2) fitted (expected) values for triangle
bi,j = exp{b
X
γ+α
bi + βbj }
(3) scaled Pearson residuals
(sc)
(P ) ri,j
=
bi,j
Xi,j − X
q
bi,j
φbX
Bootstrap procedure II
n
o
(sc)
(4) resample residuals (P ) ri,j B-times with
replacement
Bo triangles of bootstrapped
n
(sc) ∗
residuals (P,b) ri,j , 1 ≤ b ≤ B
(5) construct B bootstrap triangles
q
(sc) ∗
∗
bb
b
(b) Xi,j = (P,b) ri,j φXi,j + Xi,j
(6) perform ODP on each bootstrap triangle
bootstrapped estimates (b) α
bi , (b) βbj , (b) γ
b, (b) φb
Bootstrap procedure III
(7) calculate bootstrap reserves
b∗
(b) Ri =
n
X
j=n+2−i
b∗
b + (b) α
bi }
(b) Xi,j = exp{(b) γ
n
X
exp{(b) βbj }
j=n+2−i
I (4)–(7) is a bootstrap loop (repeated B-times)
(8) empirical distribution of size B for the reserves
empirical (estimated) mean, s.e., quantiles, . . .
Taylor and Ashe (1983)
data
- incremental triangle
357 848
352 118
290 507
310 608
443 160
396 132
440 832
359 480
376 686
344 014
766 940
884 021
1 001 799
1 108 250
693 190
937 085
847 631
1 061 648
986 608
227 229
425 046
67 948
610 542
933 894
926 219
776 189
991 983
847 498
1 131 398
1 443 370
482 940
1 183 289
1 016 654
1 562 400
769 488
805 037
1 063 269
527 326
445 745
750 816
272 482
504 851
705 960
- R software, ChainLadder package
574 398
320 996
146 923
352 053
470 639
146 342
527 804
495 992
206 286
139 950
266 172
280 405
Development of claims
3e+06
8
7
3
4
6
2
5
1
4
7
2
3
6
5
4
3
2
6
5
4
3
2
5
1
2
3
4
1
1
1
1
1
1
8
4
9
6
3
7
2
5
1
1e+06
Claims
5e+06
2
2
3
5
7
6
9
8
1
2
0
4
3
2
4
6
Development period
8
10
CL with Mack’s s.e.
Chain ladder developments by origin period
Chain ladder dev.
2
8e+06
4
6
8
Mack's S.E.
10
2
4
6
8
10
1
2
3
4
5
6
7
8
9
10
6e+06
4e+06
Amount
2e+06
0e+00
8e+06
6e+06
4e+06
2e+06
0e+00
2
4
6
8
10
2
4
6
8
10
Development period
2
4
6
8
10
Chain ladder diagnostics
6e+06
1e+06
0e+00
4
5
6
7
8
1
9 10
4
6
8
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
6
8
2
1
3e+06
4e+06
5e+06
●
●
●
●
●
●
●
●
●
●
4
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
Origin period
●
●
●
●
●
●
●
●
4
2e+06
1
1
●
●
●
●
2
●
●
0
●
●
●
●
●
●
0
1
0
●
Standardised residuals
●
●
●
● ●
●
●
Fitted
●
−1
●
●
●●
●
●
●
●
●●
●
●
1e+06
●
●
●
●
●
●
●
●
●
●
●
10
●
●
●
●
●
●
Development period
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
2
●
●
●
●
●
5
7
6
9
8
1
2
0
4
3
2
2
1
Standardised residuals
3
1
1
1
1
−1
2
8
4
9
6
3
7
2
5
1
4
7
2
3
6
5
8
7
3
4
6
2
5
1
1
0
Amount
4e+06
2e+06
●
2
2
3
2
3
4
4
3
2
5
4
3
2
6
5
Standardised residuals
●
−1
●
●
2
●
5e+06
●
●
3e+06
●
Origin period
Standardised residuals
●
●
●
1
−1
●
●
●
●
Value
Chain ladder developments by origin period
2
Forecast
Latest
7e+06
Mack Chain Ladder Results
6
Calendar period
●
8
1
2
3
4
5
6
Development period
7
8
Bootstrap results
ecdf(Total.IBNR)
0.8
Fn(x)
0.0
2.0e+07
3.0e+07
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
2
3
4
5
6
origin period
7
8
9
10
●
●
Latest actual
●
●
1000000
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0
●
●
●
●
●
●
●
●
●
2000000
Latest actual incremental claims
against simulated values
Mean ultimate claim
1
3.0e+07
Simulated ultimate claims cost
●
●
2.0e+07
Total IBNR
●
5.0e+06
1.0e+07
Total IBNR
latest incremental claims
1.5e+07
1.0e+07
ultimate claims costs
0.4
150
0 50
Frequency
250
Histogram of Total.IBNR
●
●
●
1
2
3
4
5
6
origin period
7
8
9
10
Mack CL vs bootstrap
GLM (ODP)
Accident
year
1
2
3
4
5
6
7
8
9
10
Total
Ultimate
3 901 463
5 433 719
5 378 826
5 297 906
4 858 200
5 111 171
5 660 771
6 784 799
5 642 266
4 969 825
53 038 946
Chain Ladder
IBNR
0
94 634
469 511
709 638
984 889
1 419 459
2 177 641
3 920 301
4 278 972
4 625 811
18 680 856
S.E.
0
75 535
121 699
133 549
261 406
411 010
558 317
875 328
971 258
1 363 155
2 447 095
Ultimate
3 901 463
5 434 680
5 396 815
5 315 089
4 875 837
5 113 745
5 686 423
6 790 462
5 675 167
5 148 456
53 338 139
Bootstrap
IBNR
0
95 595
487 500
726 821
1 002 526
1 422 033
2 203 293
3 925 964
4 311 873
4 804 442
18 980 049
S.E.
0
106 313
222 001
265 696
313 015
377 703
487 891
789 329
1 034 465
2 091 629
3 096 767
Comparison of
distributional properties
- why to bootstrap ?
- moment characteristics (mean, s.e., . . . ) does not
provide full information about the reserves’
distribution
- additional assumption required in the classical
approach
- 99.5% quantile necessary for VaR
I
I
I
assuming normally distributed reserves
. . . 24 984 154
assuming log-normally distributed reserves
. . . 25 919 050
bootstrap . . . 28 201 572
Conclusions
- mean and variance do not contain full information
about the distribution
quantities like VaR
cannot provide
- assumption of log-normally distributed claims <
log-normally distributed reserves (far more
restrictive)
- various types of residuals
- bootstrap (simulated) distribution
mimics the unknown distribution of reserves
(a mathematical proof necessary)
- R software provides a free sufficient actuarial
environment for reserving
References
England, P. D. and Verrall, R. J. (1999)
Analytic and bootstrap estimates of prediction
errors in claims reserving.
Insurance: Mathematics and Economics, 25.
England, P. D. and Verrall, R. J. (2006)
Predictive distributions of outstanding claims in
general insurance.
Annals of Actuarial Science, 1 (2).
Pesta, M. (2012)
Total least squares and bootstrapping with
application in calibration.
Statistics: A Journal of Theoretical and Applied
Statistics.
Thank you !
[email protected]