Survey
watsonwyatt.com

CAS 2008 Spring Meeting
Joint Meeting CIA/SOA/CAS
A Survey of P&C Predictive Modeling Applications
Gaétan Veilleux, FCAS, MAAA
June 18, 2008

Copyright © Watson Wyatt Worldwide. All rights reserved.

What is Predictive Modeling?
A statistical process which estimates the value of an observed item (dependent variable) based upon the values of other explanatory variables.

P&C Predictive Modeling applications
Generalized Linear Models (GLM)
Data mining and other methods
– Artificial neural networks
– Classification and regression trees (CART)
– Multivariate adaptive regression splines (MARS)
– Cluster analysis
– Principal components analysis / factor analysis

Generalized linear models
$E[Y] = \mu = g^{-1}(X\beta + \xi)$
$\mathrm{Var}[Y] = \phi \, V(\mu) / \omega$
Consider all factors simultaneously
Allow for nature of random process
Provides diagnostics
Robust and transparent
Increasingly a global standard

Insurance applications of GLMs
Ratemaking
Underwriting
Marketing
Retention
Expense analysis
Claims management
Risk management / reinsurance
Sales channel
Reserving

Applications
Ratemaking
– Revise existing rating factor relativities with multivariate analysis
– Introduce new rating variables or underwriting tiers
– Re-define territorial boundaries
– Re-define vehicle classifications
– Unbundle homeowners by-peril
– Understand effect of proposed rate changes at renewal (including moderator algorithms)
– Define rating plan that optimizes profit while retaining required volume

Ratemaking objective
[Diagram: Age, Sex, Vehicle, Area, Claim Limit -> Rating Plan -> Premium]

Modeling the cost of claims
[Diagram: Age, Sex, Vehicle, Area, Claim Limit -> Model -> Expected cost of claims]
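The flow on the "Modeling the cost of claims" slide, combined with the GLM mean structure E[Y] = g^-1(X.beta + xi), can be illustrated with a minimal sketch. It assumes a log link and an offset equal to the log of exposure, which is a common choice for claim frequency but is not stated in the slides. The coefficient names and values in BETA are hypothetical and are not taken from the presentation.

```python
import math

# Hypothetical fitted coefficients on the log scale (illustrative values only,
# not from the presentation). With a log link they act as multipliers.
BETA = {
    "intercept": math.log(0.10),   # assumed base claim frequency per year
    "age:17-21": math.log(1.50),   # assumed relativity for young drivers
    "area:urban": math.log(1.20),  # assumed relativity for urban areas
}

def expected_claims(levels, exposure_years):
    """Return mu = g^-1(X.beta + xi) for a log link g.

    levels: factor levels that apply to the risk (rows of X with value 1).
    exposure_years: exposure, entering as the offset xi = log(exposure).
    """
    linear_predictor = BETA["intercept"] + sum(BETA.get(level, 0.0) for level in levels)
    offset = math.log(exposure_years)
    return math.exp(linear_predictor + offset)

young_urban = expected_claims(["age:17-21", "area:urban"], exposure_years=1.0)
base_two_years = expected_claims([], exposure_years=2.0)
```

Because the link is logarithmic, the coefficients combine multiplicatively, which is why GLM output is usually read as a table of relativities.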

Modeling the cost of claims
BI:  Freq x Amt = Cost 1
PD:  Freq x Amt = Cost 2
MED: Freq x Amt = Cost 3
COL: Freq x Amt = Cost 4
OTC: Freq x Amt = Cost 5

GLM output (significant factor)
[Figure: log of multiplier and exposure (years) by vehicle symbol, 1 to 13; P value = 0.0%. Series: oneway relativities, approx 95% confidence interval, parameter estimate.]

Age - sex interaction
[Figure: Example job, Run 5 Model 3 - Small interaction - Third party material damage, Numbers. Log of multiplier and exposure by age of driver (17-21, 22-24, 25-29, 30-34, 35-39, 40-49, 50-59, 60-69, 70+) and sex of driver; P level = 0.0%, Rank 6/6. Series for Female and Male: unsmoothed estimate, smoothed estimate, approx 2 SEs from estimate.]

Impact analysis
[Figure: Example job, age of driver. Count of records by age band and loss ratio (Claims / Earned prem) against the ratio Risk Premium / Current premium tariff, from 0.450 to 2.450.]
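The frequency-times-amount decomposition by coverage, and the Risk Premium / Current premium ratio used on the impact analysis chart, can be sketched as follows. All frequencies, claim amounts, and the current premium are hypothetical values chosen for illustration; the slides give no numbers for them.

```python
# Hypothetical expected frequency and average claim amount per coverage.
# Keys follow the coverage labels on the slide; values are illustrative only.
COVERAGES = {
    "BI": (0.010, 15000.0),
    "PD": (0.040, 3000.0),
    "MED": (0.005, 4000.0),
    "COL": (0.060, 2500.0),
    "OTC": (0.030, 1000.0),
}

def cost_by_coverage(coverages):
    """Cost = Freq x Amt, computed separately for each coverage."""
    return {name: freq * amt for name, (freq, amt) in coverages.items()}

def risk_premium(coverages):
    """Total expected cost of claims: the sum of the per-coverage costs."""
    return sum(cost_by_coverage(coverages).values())

def impact_ratio(coverages, current_premium):
    """Ratio of modeled risk premium to the premium currently charged."""
    return risk_premium(coverages) / current_premium

total = risk_premium(COVERAGES)
ratio = impact_ratio(COVERAGES, current_premium=500.0)
```

Modeling frequency and amount separately for each coverage lets each component have its own set of significant factors before the pieces are recombined.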

Applications
Underwriting
– Provide guidelines on debits/credits
– Produce scorecards to automate some elements of risk selection
Marketing
– Improve direct mail conversion rate for most profitable risks

Scoring
[Figure: Distribution of score. Number of policies and actual loss ratio by score based on expected loss ratio, 0 to 100.]

Applications
Retention
– Understand effect of capping rate changes at renewal
– Develop lifetime customer value model
Expense analysis
– Vary acquisition costs by other criteria
Claims management
– Develop fraud scorecard
– Advise how TPAs affect claim costs
– Analyze the drivers of claim cost and hence loss control

Applications
Risk management / reinsurance
– Determine which risks to cede
Sales channel
– Align compensation with expected profitability
Reserving
– Provide additional method to assist reserving actuaries with ultimate projections
– Identify predictors of “serious” claims

P&C Predictive Modeling applications
Generalized Linear Models (GLM)
Data mining and other methods
– Artificial neural networks
– Classification and regression trees (CART)
– Multivariate adaptive regression splines (MARS)
– Cluster analysis
– Principal components analysis / factor analysis

Data Mining
aka Knowledge Discovery in Databases (KDD)
Broad range of methods
Good at discovery, weak at estimation
Many (most) are not being applied to P&C insurance
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
– Evolutionary spectral clustering by incorporating temporal smoothness
– Making generative classifiers robust to selection bias
– Nonlinear adaptive distance metric learning for clustering

Data Mining – 5 Common Techniques
Artificial neural networks
– Non-linear predictive models that learn through training
– Resemble biological neural networks in structure
Decision trees
– Tree-shaped structures that represent sets of decisions
– These decisions generate rules for the classification of a dataset

Data Mining – 5 Common Techniques (2)
Genetic algorithms
– Optimization techniques
– Genetic combination, mutation, and natural selection
Nearest neighbor
– Classification of each record based on a combination of the classes of the k record(s) most similar to it in a historical dataset
Rule induction
– Extraction of useful if-then rules from data based on statistical significance

Artificial Neural Networks
Identify structural components for a GLM
– Variables
– Binning
– Interactions
Fraud detection
– Staged accidents
– Other PM techniques
[Diagram: network layers Input, Hidden, Output]
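The nearest neighbor technique listed among the five common techniques can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a simple majority vote as the way of combining the classes of the k most similar records; the slides do not specify either choice. The records and the labels "staged" and "genuine" are hypothetical.

```python
import math
from collections import Counter

def knn_classify(history, record, k=3):
    """Classify a record by majority vote among its k nearest historical records.

    history: list of (features, label) pairs, features being numeric tuples.
    record: numeric tuple with the same length as the historical features.
    """
    # Sort historical records by Euclidean distance to the new record.
    nearest = sorted(history, key=lambda row: math.dist(row[0], record))[:k]
    # Combine the classes of the k neighbors with a simple majority vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical claims described by two already-scaled features.
HISTORY = [
    ((1.0, 1.0), "staged"),
    ((1.2, 0.9), "staged"),
    ((5.0, 5.0), "genuine"),
    ((5.5, 4.5), "genuine"),
    ((4.8, 5.2), "genuine"),
]

label_near_staged = knn_classify(HISTORY, (1.1, 1.0), k=3)
label_near_genuine = knn_classify(HISTORY, (5.0, 5.0), k=3)
```

In practice the features would need to be put on comparable scales first, since raw distance is dominated by whichever variable has the largest range.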

Classification and Regression Trees - CART
Decision tree based method
Binary recursive partitioning
Brute force non-parametric method
Response is discontinuous
Doesn’t capture strong linear relationships well
Applications
– Variable selection
– Binning
– Identify predictors of “serious” claims
[Tree diagram]
N = 100,000
  Area = {1, 2, 3}: N = 41,127
  Area = {others}: N = 58,873
    Density <50: N = 11,245
    Density 50-100: N = 44,885
    Density >100: N = 2,743

Multivariate Adaptive Regression Splines - MARS
Multivariate non-parametric regression procedure
Brute force
Response is continuous
Piece-wise linear segments to describe non-linear relationships
Applications
– Variable selection
– Binning

Cluster Analysis
Seek to identify homogeneous subgroups
Average linkage or centroid methods
No good literature explaining which is best
Minimize within-group variation and maximize between-group variation
Applications
– Vehicle symbols
– Segmenting/Tiering
– Fraud detection

Principal Components/Factor Analysis
Reduce number of variables
Detect structure
Consecutive factors are independent of (orthogonal to) each other
Applications
– Economic models such as trend
– Transform/reduce variables

ISO Innovative Analytics - Risk Analyzer
Modeling Techniques Employed
Variable Selection – univariate analysis, transformations, known relationship to loss
Sampling
Regression / general linear modeling
Sub models/data reduction – neural nets, splines, principal component analysis, variable clustering
Spatial Smoothing – with parameters related to auto insurance loss patterns
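The binary recursive partitioning behind CART, listed earlier in this section, repeats one step: search every candidate threshold on a variable and keep the one that best separates the response. A minimal sketch of that single step follows, assuming a numeric response and a sum-of-squared-errors criterion, which is one common choice and not something the slides specify. The density and claim values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Brute-force search for the threshold on xs that minimizes within-group SSE.

    Returns (threshold, cost); records with x <= threshold go to the left node.
    Returns None if xs has fewer than two distinct values.
    """
    best = None
    # The largest value is excluded because it would leave the right node empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best

# Hypothetical data: population density and observed claim counts.
DENSITY = [10, 20, 30, 40, 50, 60]
CLAIMS = [1, 1, 1, 5, 5, 5]
split = best_split(DENSITY, CLAIMS)
```

A full tree applies the same search to every variable, splits on the best one, and recurses into each child node until a stopping rule is met. The resulting thresholds are what make the method useful for binning.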

Quotes
“Prediction is very difficult, especially if it's about the future.”
- Niels Bohr, Nobel laureate in Physics
"I have seen the future and it is very much like the present, only longer."
- Kehlog Albran, The Profit
"A good forecaster is not smarter than everyone else, he merely has his ignorance better organized."
- Anonymous