Introduction to Bootstrapping
James Guszcza, FCAS, MAAA
CAS Predictive Modeling Seminar, Chicago, September 2005
© Deloitte Consulting, 2005

What's it all about?
Actuaries compute point estimates of statistics all the time:
• Loss ratio / claim frequency for a population
• Outstanding losses
• Correlation between variables
• GLM parameter estimates
• …
A point estimate tells us what the data indicates. But how can we measure our confidence in this indication?

More Concisely…
The point estimate answers "what do you think?"; the variability of the point estimate answers "how sure are you?"
Traditional approaches:
• Credibility theory
• Use distributional assumptions to construct confidence intervals
Is there an easier – and more flexible – way?

Enter the Bootstrap
In the late 1970s the statistician Brad Efron made an ingenious suggestion. Most (sometimes all) of what we know about the "true" probability distribution comes from the data. So let's treat the data as a proxy for the true distribution: we draw multiple samples from this proxy – this is called "resampling" – and compute the statistic of interest on each of the resulting pseudo-datasets.

Philosophy
"[Bootstrapping] requires very little in the way of modeling, assumptions, or analysis, and can be applied in an automatic way to any situation, no matter how complicated."
"An important theme is the substitution of raw computing power for theoretical analysis."
– Efron and Gong, 1983
Bootstrapping fits very nicely into the "data mining" paradigm.
The Basic Idea: Theoretical Picture
• Any actual sample of data was drawn from the unknown "true" distribution – the "true" distribution in the sky.
• We use the actual data to make inferences about the true parameters (μ).
• Each sample that "might have been" – Sample 1: Y11, …, Y1k; Sample 2: Y21, …, Y2k; …; Sample N: YN1, …, YNk – yields its own estimate Ȳ1, …, ȲN.
• The distribution of our estimator (Ȳ) depends on both the true distribution and the size (k) of our sample.

The Basic Idea: The Bootstrapping Process
• Treat the actual sample – Y1, Y2, …, Yk – as a proxy for the true distribution.
• Sample with replacement from your actual distribution N times.
• Compute the statistic of interest on each "re-sample" – Re-sample 1: Y*11, …, Y*1k; …; Re-sample N: Y*N1, …, Y*Nk – yielding Ȳ*1, …, Ȳ*N.
• {Ȳ*} constitutes an estimate of the distribution of Ȳ.

Sampling With Replacement
In fact, there is a chance of (1 − 1/500)^500 ≈ 1/e ≈ .368 that any one of the original data points won't appear at all if we sample with replacement 500 times; so any data point is included with probability ≈ .632. Intuitively, we treat the original sample as the "true population in the sky". Each resample simulates the process of taking a sample from the "true" distribution.

Theoretical vs. Empirical
• Graph on left: Ȳ calculated from an ∞ number of samples from the "true distribution".
• Graph on right: {Ȳ*} calculated in each of 1000 resamples from the empirical distribution.
• Analogy: μ is to Ȳ as Ȳ is to Ȳ*.
[Figure: true distribution of Ȳ (left) and bootstrap distribution of Ȳ* (right)]

Summary
• The empirical distribution – your data – serves as a proxy for the "true" distribution.
• "Resampling" means (repeatedly) sampling with replacement.
• Resampling the data is analogous to the process of drawing the data from the "true distribution".
• We can resample multiple times, compute the statistic of interest T on each resample, and get an estimate of the distribution of T.

Motivating Example
Let's look at a simple case where we all know the answer in advance.
• Pull 500 draws from the n(5000,100) distribution.
• The sample mean ≈ 5000 is a point estimate of the "true" mean μ. But how sure are we of this estimate?
• From theory, we know that s.d.(X̄) = s.d.(X)/√N = 100/√500 ≈ 4.47.

Raw data: #obs 500 | mean 4995.79 | sd 98.78 | 2.5%ile 4812.30 | 97.5%ile 5195.58

Visualizing the Raw Data
500 draws from n(5000,100). Look at the summary statistics, histogram, probability density estimate, and QQ-plot – it looks pretty normal.
[Figure: histogram and normal QQ-plot of the n(5000,100) data]

Sampling With Replacement
Now let's use resampling to estimate the s.d. of the sample mean (≈ 4.47):
• Draw a data point at random from the data set. Then throw it back in.
• Draw a second data point. Then throw it back in.
• Keep going until we've got 500 data points.
You might call this a "pseudo" data set. This is not merely re-sorting the data: some of the original data points will appear more than once; others won't appear at all.
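The .368/.632 arithmetic above is easy to check directly in R. A minimal sketch (the 500-point sample size is from the slide; the simulation itself is illustrative):

```r
# Probability that a given original point is excluded from one
# resample of size 500, drawn with replacement:
p.excluded <- (1 - 1/500)^500   # ≈ exp(-1) ≈ 0.368
p.included <- 1 - p.excluded    # ≈ 0.632

# Simulate: how often does data point #1 appear in a resample?
set.seed(1)
hits <- replicate(10000, 1 %in% sample(1:500, 500, replace = TRUE))
mean(hits)                      # close to 0.632
```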
Resampling
• Sample with replacement 500 data points from the original dataset S. Call this S*1.
• Now do this 999 more times: S*1, S*2, …, S*1000.
• Compute X̄ on each of these 1000 samples.

R Code
norm.data <- rnorm(500, mean=5000, sd=100)
boots <- function(data, R){
  b.avg <<- c(); b.sd <<- c()
  for(b in 1:R) {
    ystar <- sample(data, length(data), replace=T)
    b.avg <<- c(b.avg, mean(ystar))
    b.sd  <<- c(b.sd, sd(ystar))
  }
}
boots(norm.data, 1000)

Results
From theory we know that X̄ ~ n(5000, 4.47). Bootstrapping estimates this pretty well – and we get an estimate of the whole distribution, not just a confidence interval.

X̄:         theory    bootstrap
N           1,000     1,000
mean        5000.00   4995.98
sd          4.47      4.43
2.5%ile     4991.23   4987.60
97.5%ile    5008.77   5004.82

[Figure: histogram and normal QQ-plot of the bootstrap X̄ values]

Two Ways of Looking at a Confidence Interval
• Approximate normality assumption: X̄ ± 2·(bootstrap dist s.d.)
• Percentile method: just take the desired percentiles of the bootstrap histogram. This is more reliable in cases of asymmetric bootstrap histograms.
> mean(norm.data) - 2 * sd(b.avg)
[1] 4986.926
> mean(norm.data) + 2 * sd(b.avg)
[1] 5004.661

And a Bonus
Note that we can calculate both the mean and standard deviation of each pseudo-dataset. This enables us to estimate the correlation between the mean and s.d. The normal distribution is not skew, so the mean and s.d. are uncorrelated – and our bootstrapping experiment confirms this.
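The loop above works as written; a self-contained variant using replicate (an idiomatic alternative, not the presentation's code) captures both confidence-interval flavors and the mean/s.d. correlation in one pass:

```r
set.seed(42)
norm.data <- rnorm(500, mean = 5000, sd = 100)

# One bootstrap replicate: resample with replacement, return mean and sd
one.rep <- function(data) {
  ystar <- sample(data, length(data), replace = TRUE)
  c(avg = mean(ystar), sd = sd(ystar))
}
reps  <- replicate(1000, one.rep(norm.data))
b.avg <- reps["avg", ]; b.sd <- reps["sd", ]

# Normal-approximation interval vs. percentile interval
mean(norm.data) + c(-2, 2) * sd(b.avg)
quantile(b.avg, c(.025, .975))

# Bonus: correlation between resampled mean and sd (near 0 for normal data)
cor(b.avg, b.sd)
```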
[Scatter plot: bootstrap sample s.d. vs. sample mean]

More Interesting Examples
We've seen that bootstrapping replicates a result we know to be true from theory. Often in the real world we either don't know the "true" distributional properties of a random variable… or are too busy to find out. This is when bootstrapping really comes in handy.

Severity Data
2700 size-of-loss data points. Mean = 3052, Median = 1136.
Quantiles: 0%: 51.84 | 25%: 482.42 | 50%: 1136.10 | 75%: 3094.09 | 100%: 48346.82
Let's estimate the distributions of the sample mean and the 75th %ile. Gamma? Lognormal? We don't need to know.
[Figure: severity distribution density]

Bootstrapping the Sample Avg, 75th %ile
[Figures: bootstrap distributions and normal QQ-plots of the severity sample average and 75th percentile]

What about the 90th %ile?
So far so good – bootstrapping shows that many of our sample statistics – even average severity! – are approximately normally distributed. But this breaks down if our statistic is not a "smooth" function of the data. Often in loss reserving we want to focus our attention way out in the tail; the 90th %ile is an example.
[Figure: bootstrap distribution and normal QQ-plot of the severity 90th percentile]

Variance Related to the Mean
As with the normal example, we can calculate both the sample average and s.d. on each pseudo-dataset. This time (as one would expect) the variance is a function of the mean.
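The same resampling recipe works for percentiles. The slide's severity data isn't reproduced here, so a lognormal sample (parameters are assumptions) stands in for a skewed size-of-loss distribution:

```r
set.seed(7)
# Hypothetical skewed severity data standing in for the slide's 2700 losses
sev <- rlnorm(2700, meanlog = 7, sdlog = 1.4)

# Bootstrap any statistic: resample with replacement, apply stat, repeat
boot.stat <- function(data, stat, R = 1000)
  replicate(R, stat(sample(data, length(data), replace = TRUE)))

b.mean <- boot.stat(sev, mean)                          # smooth statistic
b.p90  <- boot.stat(sev, function(x) quantile(x, .9))   # tail statistic

# The mean's bootstrap distribution is near-normal even for skewed data;
# the 90th percentile's is lumpier, jumping between order statistics.
quantile(b.mean, c(.025, .975))
quantile(b.p90,  c(.025, .975))
```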
[Scatter plot: bootstrap sample s.d. vs. sample mean for the severity data]

Bootstrapping a Correlation Coefficient #1
• About 700 data points: age and credit score.
• Credit is on a scale of 1–100; 1 is worst, 100 is best.
• Age and credit are linearly related (see plot): R² ≈ .08, ρ ≈ .28. Older people tend to have better credit.
• What is the confidence interval around ρ?
[Figure: scatter plot of age vs. credit]

Bootstrapping a Correlation Coefficient #1 (continued)
ρ appears normally distributed: ρ ≈ .28, s.d.(ρ) ≈ .028. Both confidence interval calculations agree fairly well:
> quantile(boot.avg, probs=c(.025,.975))
     2.5%     97.5%
0.2247719 0.3334889
> rho - 2*sd(boot.avg); rho + 2*sd(boot.avg)
0.2250254 0.3354617
[Figure: bootstrap distribution and normal QQ-plot of the correlation coefficient]

Bootstrapping a Correlation Coefficient #2
Let's try a different example.
• ≈1300 zip-code-level data points.
• Variables: population density, median #vehicles/HH.
• R² ≈ .50; ρ ≈ −.70.
[Figure: median #vehicles vs. population density, with regression and loess lines]

Bootstrapping a Correlation Coefficient #2 (continued)
Here ρ is more skew: ρ ≈ −.70, with a 95% conf interval of (−.75, −.67) – not symmetric around ρ. The effect becomes more pronounced the larger ρ is in absolute value.
[Figure: bootstrap distribution and normal QQ-plot of the correlation coefficient]

Bootstrapping Loss Ratio
Now for what we've all been waiting for…
• The total loss ratio of a segment of business is our favorite point estimate.
• Its variability depends on many things: size of book, loss distribution, accuracy of the rating plan, consistency of underwriting…
• How could we hope to write down the true probability distribution?
Bootstrapping to the rescue…

Bootstrapping Loss Ratio & Frequency
• ≈50,000 insurance policies; severity distribution from the previous example.
• LR = .79; claim frequency = .08.
• Let's build confidence intervals around these two point estimates: resample the data 500 times, compute the total LR and frequency on each sample, and plot the histograms.

Results: Distribution of Total LR
A little skew, but somewhat close to normal: LR ≈ .79, s.d.(LR) ≈ .05, conf interval ≈ ±0.1. The two confidence interval calculations disagree a bit:
> quantile(boot.avg, probs=c(.025,.975))
     2.5%     97.5%
0.6974607 0.8829664
> lr - 2*sd(boot.avg); lr + 2*sd(boot.avg)
0.6897653 0.8888983
[Figure: bootstrap distribution and normal QQ-plot of total LR]

Dependence on Sample Size
Let's take a sub-sample of 10,000 policies. How does this affect the variability of LR? Again we re-sample 500 times. Skewness and variance increase considerably: LR goes from .79 to .78, and s.d.(LR) from .05 to .13.
[Figure: bootstrap distribution and normal QQ-plot of total LR for the sub-sample]

Distribution of Capped LR
• A capped LR is analogous to the trimmed mean from robust statistics: it removes the leverage of a few large data points.
• Here we cap policy-level losses at $30,000, which affects 50 out of 2700 claims.
• The result is closer to the frequency distribution: less skew (close to normal), and the s.d. is cut in half!
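The capped-LR idea is a one-line change to the resampling loop. A sketch on hypothetical policy-level data (the book of business below is simulated, not the presentation's):

```r
set.seed(11)
# Hypothetical book: flat $1,000 premiums, ~8% of policies have a loss
n <- 10000
premium <- rep(1000, n)
loss <- ifelse(runif(n) < 0.08, rlnorm(n, meanlog = 8, sdlog = 1.3), 0)

# Bootstrap the total loss ratio, optionally capping policy-level losses
boot.lr <- function(loss, premium, cap = Inf, R = 500) {
  replicate(R, {
    idx <- sample(n, n, replace = TRUE)          # resample policies
    sum(pmin(loss[idx], cap)) / sum(premium[idx])
  })
}
lr.raw    <- boot.lr(loss, premium)
lr.capped <- boot.lr(loss, premium, cap = 30000) # cap losses at $30K

sd(lr.raw); sd(lr.capped)  # capping shrinks the LR's sampling variability
```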
[Figure: bootstrap distribution and normal QQ-plot of LR with losses capped at $30K]

Results: Distribution of Frequency
Much less variance than LR, and very close to normal: freq ≈ .08, s.d.(freq) ≈ .0017. The two confidence interval calculations match very well:
> quantile(boot.avg, probs=c(.025,.975))
      2.5%      97.5%
0.07734336 0.08391072
> freq - 2*sd(boot.avg); freq + 2*sd(boot.avg)
0.07719618 0.08388898
[Figure: bootstrap distribution and normal QQ-plot of total frequency]

When are LRs statistically different?
Example: divide our 50,000 policies into two sub-segments, {clean drivers, other}.
• LRtot = .79; LRclean = .58; LRother = .84
• LRRclean = −27%; LRRother = +6%
Clean drivers appear to have a ≈30% lower LR than non-clean drivers. How sure are we of this indication? Let's use bootstrapping.

Bootstrapping the Difference in LRs
• Simultaneously re-sample the two segments 500 times.
• At each iteration, calculate LRc*, LRo*, (LRc* − LRo*), and (LRc* / LRo*).
• Analyze the resulting empirical distributions: What is the average difference in loss ratios? What percent of the time is the difference in loss ratios greater than x%?
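The simultaneous-resampling step can be sketched as follows. The two segments below are simulated stand-ins (all parameters are assumptions, chosen only so "other" runs a higher LR than "clean"):

```r
set.seed(3)
# Hypothetical segment-level data standing in for {clean, other}
make.seg <- function(n, freq, mu) {
  data.frame(premium = rep(1000, n),
             loss = ifelse(runif(n) < freq, rlnorm(n, mu, 1.3), 0))
}
clean <- make.seg(20000, 0.06, 8)
other <- make.seg(30000, 0.09, 8)

# Loss ratio of a segment for a given resample index
lr <- function(d, idx) sum(d$loss[idx]) / sum(d$premium[idx])

# Resample both segments at each iteration and record the LR difference
diffs <- replicate(500, {
  lr(other, sample(nrow(other), replace = TRUE)) -
  lr(clean, sample(nrow(clean), replace = TRUE))
})
mean(diffs)         # average difference in loss ratios
mean(diffs > 0.10)  # how often the gap exceeds 10 points
```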
LR distributions of the sub-populations
[Figures: bootstrap LR distributions and normal QQ-plots for the clean-record and non-clean-record sub-populations]

LRR distributions of the sub-populations
[Figures: bootstrap LRR distributions and normal QQ-plots for the clean-record and non-clean-record sub-populations]

Distribution of LRR Differences
[Figures: bootstrap distributions and normal QQ-plots of LRR_other − LRR_clean and LRR_other / LRR_clean]

Final Example: Loss Reserve Variability
• A major issue in the loss reserving community is reserve variability: the predictive variance of your estimate of outstanding losses.
• It is hard to find an analytic formula for the variability of outstanding losses, so bootstrapping is a natural way to tackle this problem.
• The approach here: bootstrap cases, not residuals.

Bootstrapping Reserves
• S = a database of 5000 claims.
• Sample with replacement all policies in S. Call this S*1 (same size as S).
• Now do this 499 more times: S*1, S*2, …, S*500.
• Estimate o/s reserves on each sample to get a distribution of reserve estimates.

Simulated Loss Data
• Simulate a database of 5000 claims: 500 claims/year over 10 years.
• Each of the 5000 claims was drawn from a lognormal distribution with parameters μ = 8, σ = 1.3.
• Build in loss development patterns: L(j+1) = L(j) × (link + ε), where ε is a random error term.
• See the CLRS presentation (2005) for more details.
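The simulation recipe above can be sketched in a few lines. Only μ = 8, σ = 1.3, and the L(j+1) = L(j) × (link + ε) recursion come from the slide; the link ratios and error scale below are hypothetical:

```r
set.seed(5)
n.years <- 10
# Hypothetical age-to-age link ratios (not from the presentation)
links <- c(2.0, 1.5, 1.3, 1.15, 1.1, 1.05, 1.03, 1.02, 1.01)

# Develop one claim: lognormal(8, 1.3) initial loss, then L[j+1] = L[j]*(link + eps)
develop <- function() {
  L <- numeric(n.years)
  L[1] <- rlnorm(1, meanlog = 8, sdlog = 1.3)
  for (j in 1:(n.years - 1))
    L[j + 1] <- L[j] * (links[j] + rnorm(1, 0, 0.05))  # link + random error
  L
}
one.claim <- develop()  # ten periods of developed losses for one claim
```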
Bootstrapping Reserves (continued)
• Compute our reserve estimate on each S*k.
• These 500 reserve estimates constitute an estimate of the distribution of outstanding losses.
• Notice that we did this by resampling our original dataset S of claims. This bootstrapping method differs from other analyses, which bootstrap the residuals of a model; those methods rely on the assumption that your model is correct.

Distribution of Outstanding Losses
[Figure: total reserves, all 10 years – blue bars: the bootstrapped distribution; dotted line: kernel density estimate; pink line: superimposed normal]

Distribution of Outstanding Losses (continued)
The simulated distribution of outstanding losses appears ≈ normal:
• Mean: $21.751M; Median: $21.746M
• σ: $0.982M; σ/μ ≈ 4.5%
• 95% confidence interval: ($19.8M, $23.7M)
Note: the 2.5 and 97.5 %iles of the bootstrapping distribution roughly agree with $21.75M ± 2σ.

Distribution of Outstanding Losses (continued)
We can examine a QQ plot to verify that the distribution of o/s losses is approximately normal; the tails, however, are somewhat heavier than normal. Remember – this is just simulated data, but real-life results have been consistent with these results.
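The case-resampling step itself is the same pattern as every example above. In this sketch, `claims` and `reserve.estimate` are hypothetical placeholders (a toy reserve function, not the presentation's reserving method):

```r
set.seed(9)
# Hypothetical claim database standing in for S (5000 claims)
claims <- data.frame(paid = rlnorm(5000, meanlog = 8, sdlog = 1.3),
                     report.lag = sample(1:10, 5000, replace = TRUE))

# Toy stand-in for whatever reserving method you actually apply
reserve.estimate <- function(d)
  sum(d$paid) * mean(d$report.lag) / 10

# Resample whole claim records (cases, not residuals) 500 times
boot.reserves <- replicate(500, {
  star <- claims[sample(nrow(claims), replace = TRUE), ]
  reserve.estimate(star)
})
quantile(boot.reserves, c(.025, .975))  # reserve confidence interval
```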
[Figure: normal QQ-plot of total reserves, all 10 years]

References
• Davison and Hinkley, Bootstrap Methods and their Application.
• Efron and Tibshirani, An Introduction to the Bootstrap.
• Efron and Gong, "A Leisurely Look at the Bootstrap," American Statistician, 1983.
• Efron and Tibshirani, "Bootstrap Methods for Standard Errors," Statistical Science, 1986.
• Derrig, Ostaszewski, and Rempala, "Applications of Resampling Methods in Actuarial Practice," PCAS, 2000.