New Applications of Statistics and Data Mining Techniques to Classification and Fraud Detection in Insurance

Richard A. Derrig, Ph.D.
OPAL Consulting LLC
Visiting Scholar, Wharton School, University of Pennsylvania
Bogotá, Colombia
November 3, 2005

Insurance Fraud Bureau of Massachusetts
You can steal more with a ball point than by gun point.
Insurance fraud is known as a high reward, low risk crime.

GENERAL INSURANCE PROBLEMS
WHAT: Product Design
WHERE: Market Characteristics
WHO: Classification & Sale
HOW: Claims Paid
WHEN: Forecasting
WHY: Profit (Expected)

TRADITIONAL MATHEMATICAL TECHNIQUES
Arithmetic (Spreadsheets)
Probability & Statistics (Range of Outcomes)
Curve Fitting (Interpolation & Extrapolation)
Model Building (Equations for Processes)
Valuation (Risk, Investments, Catastrophes)
Numerical Method (Analytic Solution Rare)

NON-TRADITIONAL MATHEMATICS
Fuzzy Sets & Fuzzy Logic
  Elements: “in/out/partially both”
  Logic: “true/false/maybe”
  Decisions: “incompatible criteria”
Artificial Intelligence: “data mining”
Neural Networks: “learning algorithms”

CLASSIFICATION
Segmentation: A major exercise for insurance underwriting and claims
Underwriting: Find profitable risks from among the available market
Claims: Sort claims into easy pay and claims needing investigation

Fuzzy Logic Compared with Probability
Probability: Measures randomness; measures whether or not an event occurs; randomness dissipates over time or with further knowledge.
Fuzziness: Measures vagueness in language; measures the extent to which an event occurs; vagueness does not dissipate with time or further knowledge.

Fuzzy Logic Clusters
The field of Pattern Recognition is a search for structure in data.
Old View: Given N objects, divide them into 2 < C < N clusters of homogeneous or similar types. Similarity can be based upon multiple features or criteria, but each object is in one and only one cluster.
New View: Objects can be members of one or several clusters with varying strengths of membership; i.e., fuzzy clusters are Fuzzy Sets of clusters.
Example A: Classification of Individual Risks
Example B: Classification of Injury Claims

FUZZY SETS TOWN RATING CLASSIFICATION
When is one Town near another for Auto Insurance Rating?
Geographic Proximity (Traditional)
Overall Cost Index (Massachusetts)
Geographically close Towns do not have the same Expected Losses.
Clusters by Cost Produce Border Problems: Towns between Territories.
Fuzzy Clusters acknowledge the Borders.
Are Overall Clusters correct for each Insurance Coverage?
Fuzzy Clustering on Five Auto Coverage Indices is better and demonstrates a weakness in Overall Crisp Clustering.

[Figure: Fuzzy Clustering of Fraud Study Claims by Assessment Data, Membership Value Cut at 0.2 (Alpha = .2). Axes: Final Clusters versus Claim Feature Vector ID. Legend: Full Member, Partial Member. Suspicion Centers (A, C, Is, Ij, T, LN): (7,8,7,8,8,0), (1,7,0,7,7,0), (1,4,0,4,6,0), (0,1,0,1,3,0), Valid (0,0,0,0,0,0). Cluster labels: Planned Fraud, Opportunistic Fraud, Build-up (Inj. Sus. Level equal to or > 5), Build-up (Inj. Sus. Level < 5), Valid.]

FRAUD
The Major Questions
What Is Fraud?
How Much Fraud is There?
What Companies Do about Fraud?
How Can We Identify a Fraudulent Claim?

FRAUD DEFINITION
Principles
  Clear and willful act
  Proscribed by law
  Obtaining money or value
  Under false pretenses
Abuse: Fails one or more Principles

Fraud Definition
COSTS
Fraud (Criminal, Hard): Small; Mass. Auto & WC < 1%
Abuse (Not Criminal, Soft Fraud): BIG Bucks, Depends on Line
“Abuse” is (legally) a gray area, unethical behavior
“Abuse” Containment is a Matter for Company/Industry/Regulator

HOW MUCH CLAIM FRAUD?
10% Fraud

FRAUD TYPES
Insurer Fraud
  Fraudulent Company
  Fraudulent Management
Agent Fraud
  No Policy
  False Premium
Company Fraud
  Embezzlement
  Inside/Outside Arrangements
Claim Fraud
  Claimant/Insured
  Providers/Rings

CLAIM FRAUD INDICATORS VALIDATION PROCEDURES
Canadian Coalition Against Insurance Fraud (1997): 305 Fraud Indicators (45 vehicle theft)
“No one indicator by itself is necessarily suspicious.”
Problem: How to validate the systematic use of Fraud Indicators?

AIB FRAUD INDICATORS 1989 Examples
Accident Characteristics (19)
  No report by police officer at scene
  No witnesses to accident
Claimant Characteristics (11)
  Retained an attorney very quickly
  Had a history of previous claims
Insured Driver Characteristics (8)
  Had a history of previous claims
  Gave address as hotel or P.O. Box
Injury Characteristics (12)
  Injury consisted of strain/sprain only
  No objective evidence of injury
Treatment Characteristics (9)
  Large number of visits to a chiropractor
  DC provided 3 or more modalities on most visits
Lost Wages Characteristics (6)
  Claimant worked for self or family member
  Employer wage differs from claimed wage

REAL PROBLEM
Classify all claims
Identify valid classes
Pay the claim
No hassle
Visa Example
Identify (possible) fraud
Investigation needed
Identify “gray” classes
Minimize with “learning” algorithms

FRAUDULENT CLAIM IDENTIFICATION
Experience and Judgment
Artificial Intelligence Systems
Regression Models
Fuzzy Clusters
Neural Networks
Expert Systems
Genetic Algorithms
All of the Above

DM
Databases
Scoring Functions
Graded Output
Non-Suspicious Claims
Routine Claims
Suspicious Claims
Complicated Claims

POTENTIAL VALUE OF AN ARTIFICIAL INTELLIGENCE SCORING SYSTEM
Screening to Detect Fraud Early
Auditing of Closed Claims to Measure Fraud
Sorting to Select Efficiently among Special Investigative Unit Referrals
Providing Evidence to Support a Denial
Protecting against Bad-Faith

Using Kohonen’s Self-Organizing Feature Map to Uncover
Automobile Bodily Injury Claims Fraud

PATRICK L. BROCKETT
Gus S. Wortham Chaired Prof. of Risk Management, University of Texas at Austin
XIAOHUA XIA
University of Texas at Austin
RICHARD A. DERRIG
Senior Vice President, Automobile Insurers Bureau of Massachusetts
Vice President of Research, Insurance Fraud Bureau of Massachusetts
JOURNAL OF RISK AND INSURANCE, 65:2, 245-274, 1998

NEURAL NETWORKS
Self-Organizing Feature Maps
T. Kohonen 1982-1990 (Cybernetics)
Reference vectors map to OUTPUT format in a topologically faithful way.
Example: Map onto 40x40 2-dimensional square.
Iterative Process Adjusts All Reference Vectors in a “Neighborhood” of the Nearest One.
Neighborhood Size Shrinks over Iterations

[Figure: MAPPING: PATTERNS-TO-UNITS]

[Figure: FEATURE MAP SUSPICION LEVELS. Legend bands: 0-1, 1-2, 2-3, 3-4, 4-5.]

[Figure: FEATURE MAP SIMILARITY OF A CLAIM. Legend bands: 0-1, 1-2, 2-3, 3-4, 4-5.]

DATA MODELING EXAMPLE: CLUSTERING
Data on 16,000 Medicaid providers analyzed by unsupervised neural net
Neural network clustered Medicaid providers based on 100+ features
Investigators validated a small set of known fraudulent providers
Visualization tool displays clustering, showing known fraud and abuse
Subset of 100 providers with similar patterns investigated: Hit rate > 70%
Cube size proportional to annual Medicaid revenues
© 1999 Intelligent Technologies Corporation

Modeling Hidden Exposures in Claim Severity via the EM Algorithm
Grzegorz A. Rempala, Department of Mathematics, University of Louisville
and Richard A.
Derrig, OPAL Consulting LLC & Wharton School, University of Pennsylvania

Claim Cost Distribution: Empirical versus Fitted

Hidden Exposures - Overview
Modeling hidden risk exposures as additional dimension(s) of the loss severity distribution
Considering mixtures of probability distributions as the model for losses affected by hidden exposures, with some parameters of the mixtures considered missing (i.e., unobservable in practice)
The approach is feasible due to advancements in computer-driven methodologies dealing with partially hidden or incomplete data models
Empirical data imputation has become more sophisticated, and the availability of ever faster computing power has made it increasingly possible to solve these problems via iterative algorithms

Figure 1: Overall distribution of the 348 BI medical bill amounts from Appendix B compared with that submitted by provider A.
Left panel: frequency histograms (provider A’s histogram in filled bars).
Right panel: density estimators (provider A’s density in dashed line).
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, pg. 9, 11/18/02

Figure 2: EM Fit
Left panel: mixture of normal distributions fitted via the EM algorithm to BI data.
Right panel: Three normal components of the mixture.
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, pg. 13, 11/18/02

Figure 3: Latent risk in BI data modeled by the EM Algorithm with m = 3.
Left panel: set of responsibilities δ_{j3}.
Right panel: the third component of the normal mixture compared with the distribution of provider A’s claims (“A” claims density estimator is a solid curve).
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, pg. 14, 11/18/02

Fraud Classification Using Principal Component Analysis of RIDITs

PATRICK L. BROCKETT
Gus S. Wortham Chaired Prof.
of Risk Management, University of Texas at Austin
RICHARD A. DERRIG
Senior Vice President, Automobile Insurers Bureau of Massachusetts
Vice President of Research, Insurance Fraud Bureau of Massachusetts
LINDA L. GOLDEN
Marlene & Morton Meyerson Centennial Professor in Business, University of Texas, Austin, Texas
ARNOLD LEVINE
Professor Emeritus, Department of Mathematics, Tulane University, New Orleans LA
MARK ALPERT
Professor of Marketing, University of Texas, Austin, Texas
JOURNAL OF RISK AND INSURANCE, 69:3, SEPT. 2002

THE PROBLEM
Data: Features have no natural metric-scale
Model: Stochastic process has no parametric form
Classification: Inverse image of one dimensional scoring function and decision rule
Feature Value: Identify which features are “important”

PRIDIT METHOD OVERVIEW
1. DATA: N Claims, T Features, K_t Responses, Monotone In “Fraud”
2. RIDIT score each possible response: proportion below minus proportion above, score centered at zero.
3. RESPONSE WEIGHTS: Principal Component of Claims x Features with RIDIT in Cells.
4. SCORE: Sum weights x claim ridit score.
5. PARTITION: above and below zero.

TABLE 1 Computation of PRIDIT Scores
Label  Proportion of “Yes”  B_t1 (“Yes”)  B_t2 (“No”)  Variable
TRT1   44%                  -.56          .44          Large # of visits to chiropractor
TRT2   12%                  -.88          .12          Chiropractor provided 3 or more modalities on most visits
TRT3   8%                   -.92          .08          Large # of visits to a physical therapist
TRT4   20%                  -.80          .20          MRI or CT scan but no inpatient hospital charges
TRT5   31%                  -.69          .31          Use of “high volume” medical provider
TRT6   9%                   -.91          .09          Significant gaps in course of treatment
TRT7   24%                  -.76          .24          Treatment was unusually prolonged (> 6 months)
TRT8   11%                  -.89          .11          Indep. medical examiner questioned extent of treatment
TRT9   4%                   -.96          .04          Medical audit raised questions about charges

TABLE 2 Weights for Treatment Variables
Variable  PRIDIT Weights W(∞)  Regression Weights
TRT1      .30                  .32***
TRT2      .19                  .19***
TRT3      .53                  .22***
TRT4      .38                  .07
TRT5      .02                  .08*
TRT6      .70                  -.01
TRT7      .82                  .03
TRT8      .37                  .18***
TRT9      -.13                 .24**
Regression significance shown at 1% (***), 5% (**) or 10% (*) levels.

TABLE 3 PRIDIT Transformed Indicators, Scores and Classes
Claim  TRT1   TRT2   TRT3   TRT4   TRT5   TRT6   TRT7   TRT8   TRT9   Score  Class
1      0.44   0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .07    2
2      0.44   0.12   0.08   0.2    -0.69  0.09   0.24   0.11   0.04   .07    2
3      0.44   -0.88  -0.92  0.2    0.31   -0.91  -0.76  0.11   0.04   -.25   1
4      -0.56  0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .04    2
5      -0.56  -0.88  0.08   0.2    0.31   0.09   0.24   0.11   0.04   .02    2
6      0.44   0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .07    2
7      -0.56  0.12   0.08   0.2    0.31   0.09   -0.76  -0.89  0.04   -.10   1
8      -0.44  0.12   0.08   0.2    -0.69  0.09   0.24   0.11   0.04   .02    2
9      -0.56  -0.88  0.08   -0.8   0.31   0.09   0.24   0.11   -0.96  .05    2
10     -0.56  0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .04    2

TABLE 7 AIB Fraud and Suspicion Score Data: Top 10 Fraud Indicators by Weight
Columns: PRIDIT; Adj. Reg. Score; Inv. Reg. Score
[Column layout not recoverable; indicator codes in source order:] ACC3, ACC1, ACC11, ACC4, ACC15, CLT11, ACC9, ACC10, ACC19, CLT4, CLT7, CLT11, INJ1, INJ2, CLT11, INS6, INJ1, INJ3, INJ5, INJ2, INJ8, INJ6, INS8, TRT1, INJ9, TRT1, LW6, INJ11, TRT1, TRT9

TABLE 10 AIB Fraud Indicator and Suspicious Score Classes: Fraud/Non-fraud Classifications
            Adjuster Regression Score
PRIDIT      Fraud   Nonfraud   All
Fraud       30      5          35
Nonfraud    32      60         92
All         62      65         127
(odds ratio = 11.3) [4.0, 31.8]

REFERENCES
Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Arnold and Alpert, Mark, (2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.
Brockett, Patrick L., Xia, Xiaohua and Derrig, Richard A., (1998), Using Kohonen’s Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65:2, 245-274.
Derrig, R.A. and H.I. Weisberg, (2004), Determinants of Total Compensation for Auto Bodily Injury Liability Under No-Fault: Investigation, Negotiation and the Suspicion of Fraud, Insurance and Risk Management, Volume 71, (4), pp. 633-662.
Derrig, R.A., H.I. Weisberg and Xiu Chen, (1994), Behavioral Factors and Lotteries Under No-Fault with a Monetary Threshold: A Study of Massachusetts Automobile Claims, Journal of Risk and Insurance, 61:2, 245-275.
Rempala, G. and R.A. Derrig, (2005), Modeling Hidden Exposures in Claim Severity via the EM Algorithm, North American Actuarial Journal, 9:2, 108-128.
Viaene, Stijn, Derrig, Richard A., Dedene, Guido, (2004), A Case Study of Applying Boosting Naïve Bayes to Claim Fraud Diagnosis, IEEE Transactions on Knowledge and Data Engineering, 16:5, May.
Viaene, Stijn, Derrig, Richard A., Baesens, Bart, and Dedene, Guido, (2002), A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Fraud Detection, Journal of Risk and Insurance, 69:3, 373-423.

Fuzzy References
Insurance Related Material
Brockett, P.L., Cooper, W.W., Golden, L.L. and Pitaktong, V. (1994), A neural network method for obtaining an early warning of insurer solvency, Journal of Risk and Insurance, 61, pp. 402-424.
Cummins, J.D. and Derrig, R.A. (1997), Fuzzy financial pricing of property-liability insurance, North American Actuarial Journal, 1:4, pp. 21-44.
Cummins, J.D. and Derrig, R.A. (1993), Fuzzy trends in property-liability insurance claim costs, Journal of Risk and Insurance, September, 60, pp. 429-465.
Derrig, R.A. and Ostaszewski, K.M. (1995), Fuzzy techniques of pattern recognition in risk and claim classification, Journal of Risk and Insurance, 62:3, pp. 447-482.
Derrig, R.A. and Ostaszewski, K.M. (1997), Managing the tax liability of a property-liability insurance company, Journal of Risk and Insurance, 64:4, pp. 695-711.
DeWit, G.W. (1982), Underwriting and uncertainty, Insurance: Mathematics and Economics, 1, pp. 277-285.
Jablonowski, M. (1993), An expert system for retention selection, CPCU Journal, pp. 214-221.
Lemaire, Jean (1990), Fuzzy insurance, Astin Bulletin, 20(1), pp. 33-55.
Ostaszewski, K.M. (1993), An Investigation into Possible Applications of Fuzzy Sets Methods in Actuarial Science, Society of Actuaries, Schaumburg, Illinois.
Young, V. (1994), Application of fuzzy sets to group health underwriting, Transactions of the Society of Actuaries, 45, pp. 551-590.
Young, V. (1996), Rate changing: A fuzzy logic approach, Journal of Risk and Insurance, 63:3, pp. 461-484.
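
SUPPLEMENTARY SKETCH: RIDIT SCORING
Step 2 of the PRIDIT METHOD OVERVIEW scores each response as the proportion of claims below it minus the proportion above it. The short Python sketch below illustrates that rule for the binary TRT1 indicator in TABLE 1 (44% “Yes”). The function name ridit_scores and the convention that “Yes” is ranked first are assumptions made for illustration only; this is not the authors' code, and it does not attempt the principal-component weighting of step 3.

```python
def ridit_scores(proportions):
    """Return one score per ordered response category.

    Each score is (share of claims in earlier categories) minus
    (share of claims in later categories). `proportions` must sum to 1.
    Illustrative helper; the name and signature are hypothetical.
    """
    scores = []
    below = 0.0
    for p in proportions:
        above = 1.0 - below - p      # share ranked after this category
        scores.append(below - above)
        below += p                   # this category now counts as "below"
    return scores


# TRT1 from TABLE 1: 44% of claims answered "Yes".
# Assumption: "Yes" (the more suspicious response) is ranked first.
TRT1_YES_SHARE = 0.44
b_yes, b_no = ridit_scores([TRT1_YES_SHARE, 1 - TRT1_YES_SHARE])
print(round(b_yes, 2), round(b_no, 2))  # -0.56 0.44
```

The two values agree with the B_t1 and B_t2 entries for TRT1 in TABLE 1, and their claim-weighted average is zero, which is the "score centered at zero" property stated in step 2.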