Does Credit Score Really Help Explain Insurance Losses?
Cheng-Sheng Peter Wu, FCAS, ASA, MAAA
Jim Guszcza, ACAS, MAAA, Ph.D.

Themes
- The History
- What Does the Question Mean?
- Simpson’s Paradox - Need for Multivariate Analysis
- What Has Been Done So Far?
- Our Large-Scale Data Mining Experience
- Going Beyond Credit
- Conclusions

The History
Pricing/Class Plans
- Few factors before World War II
- Explosion of class plan factors after the War
- Current class plans (Auto): territory, driver, vehicle, loss and violation, others, tiers/company, etc.
- Actuarial techniques: Minimum Bias & GLM

The History
Credit
- First important factor identified over the past 2 decades
- Composite multivariate score vs. raw credit information
- Introduced in late 80’s and early 90’s
- Viewed at first as a “secret weapon”
- Currently almost everyone is using it
- Industry scores vs. proprietary scores
- Quiet, confidential, controversial, black-box, etc.

What Does the Question Mean?
Can Credit Score Really “Explain” Insurance Losses?
- “X explains Y”
  - Weaker than claiming that X causes Y
  - Stronger than merely reporting that X is correlated with Y

What Does the Question Mean?
Working Definition
- We say that “X helps explain Y” if:
  - X is correlated with Y
  - The correlation does not go away when other available, measurable information is introduced

What Does the Question Mean?
Intuition Behind the Definition
- It might be okay for X to be a proxy for a “true” cause of Y
  - Testosterone level might be a true cause of auto losses… but it’s not available
  - Age/Gender is a reasonable proxy
- It might not be okay for X to be a proxy for other available predictive information

What Does the Question Mean?
Applying the Definition
- Suppose we see that credit score plays an important role in a multivariate regression equation that predicts loss ratio
- Then it is fair to say that credit helps explain insurance losses
- A multivariate study is needed

Simpson’s Paradox - Need for Multivariate Analysis
- Statistics can lie
- Illustrates how a univariate association can lead to a spurious conclusion
- The “true” explanatory factor is masked by the spurious correlation
- Famous example: 1973 Berkeley admissions data

Simpson’s Paradox - Need for Multivariate Analysis
The Berkeley Example (stylized)
- 2200 people applied for admission
- 1100 men; 1100 women
- 210 men, 120 women were accepted
- Clear-cut case of gender discrimination… or is it?

Simpson’s Paradox - Need for Multivariate Analysis

          # Applicants          # Accepted           % Accepted
          Arts   Eng   Total    Arts   Eng   Total   Arts   Eng   Total
Female    1000   100   1100     100    20    120     10%    20%   11%
Male       100  1000   1100      10   200    210     10%    20%   19%

Simpson’s Paradox - Need for Multivariate Analysis
REGRESSION RESULTS

Gender only:
             Beta     T-Score
Intercept    0.109    10.2
Gender       0.082     5.1

Gender and school:
             Beta     T-Score
Intercept    0.10      9.20
Gender       0.00      0.00
School       0.10      3.80

Simpson’s Paradox - Need for Multivariate Analysis

               # Policies                # Policies w/Claims        Frequency
               Adult  Youthful  Total    Adult  Youthful  Total     Adult  Youthful  Total
Good Credit    1000   100       1100     100    20        120       10%    20%       11%
Bad Credit      100   1000      1100      10    200       210       10%    20%       19%

What Has Been Done So Far
- We (actuaries) have been quiet
- Few published actuarial studies/opinions
  - NAIC/Tillinghast (1997)
  - Monaghan’s Study (2000)
- Recent/related studies
  - Virginia State Study (1999)
  - CAS Sub-Committee (2002)
  - Washington State Study (2003)
  - University of Texas Study (2003)

What Has Been Done So Far
Relevant Actuarial/Statistical Principles
- Pure premium vs. loss ratio
  - Loss ratio studies go beyond existing rating plans, and are implicitly multivariate
- Independence vs. correlation
  - Most insurance variables are correlated
- Univariate vs. multivariate
  - Correlated variables call for multivariate studies for true answers (Simpson’s Paradox)
- Credibility vs. homogeneity
  - Studies need to be credible and representative

What Has Been Done So Far
The Tillinghast Study
- 9 companies’ data, seems representative
- Loss ratio study
- No other predictive variables included in the study
- No detailed information given about the data
- Strong correlation with loss ratio, seems credible
- This is true, but it doesn’t answer our question and doesn’t quiet the critics

What Has Been Done So Far
Tillinghast Study of 9 Companies' Data
Loss Ratio Relativity of the Best and Worst 20% of Credit Score

             Co1    Co2    Co3    Co4    Co5    Co6    Co7    Co8    Co9    Avg
Best 20%     -38%   -29%   -19%   -15%   -14%   -34%   -22%   -22%   -36%   -25%
Worst 20%     48%    20%    32%    30%    46%    59%    20%    22%    95%    41%

What Has Been Done So Far
Monaghan’s Study
- Loss ratio study
- Large amount of data - credible analysis
- Analyzes individual credit variables as well as score
- Multivariate analysis - limited to score + 1 traditional rating variable at a time
- Shows that strong correlations with loss ratio do not go away in the presence of other variables
- Another good step, but we can go further

Our Large-Scale Data Mining Experience
Our Work
- Loss ratio studies
- Multiple studies - representative
- Large amounts of data - credible
- Hundreds of variables tested along with credit - truly multivariate
  - Policy, driver, vehicle, coverages, billing, agency, external data, synthetic, etc.
- Sound actuarial and statistical model design
- Disciplined data mining process

Our Large-Scale Data Mining Experience
What Have We Found Out?
- Credit score is always one of the top variables selected for the multivariate models
- Credit score has among the strongest parameters and statistical measurements (t-score)
  - Credit’s predictive power does not go away in the truly multivariate context
- Removing credit score dampens the predictive power of the models

Our Large-Scale Data Mining Experience
What Do We Conclude?
- We conclude that credit score bears an unambiguous relationship to insurance losses, and is not a mere proxy for other kinds of information available to insurance companies.
- This does not mean that credit score is the “cause” of insurance losses

Our Large-Scale Data Mining Experience
Why Is Credit Score Correlated with Insurance Losses?
- Beyond the scope of our work
  - Our emphasis is not on causation
- Plausible speculations include
  - Stress/planning & organization
  - Risk-seeking behavior
  - ??
- Analogy: Age/Gender might be a proxy for testosterone

Going Beyond Credit
Can We Do Well Without Credit?
- YES: non-credit predictive models are
  - A valuable alternative to credit scores
  - Flexible
  - Tailored to individual companies
  - Comparable in predictive power to credit scores
- Also possible to build mixed credit/non-credit models

Going Beyond Credit
Keys to Building Successful Non-Credit Models:
- Fully utilize all sources of information
  - Leverage company’s internal data sources
  - Enrich with other external data sources
- Use large amounts of data
- Employ disciplined analytical process
- Utilize state-of-the-art modeling tools
- Apply multivariate methodology

Going Beyond Credit
Advantages of Going Beyond Credit
- Next generation of competitive advantage
- More variables, more predictive power
- Leverages company’s internal data sources
- More flexibility
- Addresses regulatory issues and public concerns
- Expense savings
- Everyone gets a score (less of a “no hit” problem)
- More customized - less “plain vanilla” than credit score

Conclusions
- Credit works… even in a fully multivariate setting
- But non-credit models can work well too!
- What it means to us: beginning of a new era
  - Advances in computer technology
  - Advances in predictive modeling techniques
  - Large scale multivariate studies now practical
  - More external and internal info, anything else out there?
  - Other ways to go beyond credit?

Conclusions
Future Work on This Topic
- Multivariate pure premium analysis would provide more insights
- Further study of public policy issues
  - WA, VA came to opposite conclusions
- Comparison of various existing scoring models
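
Worked example: the stylized Berkeley numbers

The stylized Berkeley admissions counts from the Simpson's Paradox slides can be checked numerically. The sketch below is illustrative only. It assumes the regression on the slides is an ordinary least-squares fit of a 0/1 "accepted" indicator on 0/1 indicators for gender (male = 1) and school (Eng = 1); the slides do not state the model form. The names cells, acceptance_rate, solve and ols are hypothetical helpers introduced here, and the t-scores are not reproduced.

```python
# Stylized counts from the slides: (gender, school) -> (applicants, accepted).
cells = {
    ("female", "arts"): (1000, 100),
    ("female", "eng"): (100, 20),
    ("male", "arts"): (100, 10),
    ("male", "eng"): (1000, 200),
}


def acceptance_rate(gender=None, school=None):
    """Acceptance rate over all cells matching the given filters."""
    applied = accepted = 0
    for (g, s), (n, k) in cells.items():
        if gender in (None, g) and school in (None, s):
            applied += n
            accepted += k
    return accepted / applied


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design):
    """Least-squares coefficients for the 0/1 accepted indicator.

    `design` maps a (gender, school) cell to its feature row. Each cell
    stands for n identical rows, k of which have response 1, so the
    normal equations can be accumulated from the grouped counts.
    """
    p = len(design(("female", "arts")))
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for cell, (n, k) in cells.items():
        x = design(cell)
        for i in range(p):
            xty[i] += x[i] * k
            for j in range(p):
                xtx[i][j] += x[i] * x[j] * n
    return solve(xtx, xty)


# Assumed coding: male = 1, Eng = 1 (not stated on the slides).
univariate = ols(lambda c: [1.0, float(c[0] == "male")])
multivariate = ols(lambda c: [1.0, float(c[0] == "male"), float(c[1] == "eng")])

print("female rate:", round(acceptance_rate(gender="female"), 3))
print("male rate:", round(acceptance_rate(gender="male"), 3))
print("gender only [intercept, gender]:", univariate)
print("gender + school [intercept, gender, school]:", multivariate)
```

Under these assumptions the gender-only fit should give an intercept near 0.109 and a gender coefficient near 0.082, in line with the first regression on the slides. Once school is added, the gender coefficient should fall to essentially zero and school picks up 0.10. The same arithmetic applies to the good/bad credit table if gender is relabeled as credit and school as adult/youthful.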