Applications of Unsupervised Learning in Property and Casualty Insurance
with emphasis on fraud analysis

Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.
www.data-mines.com

Objectives
- Review classic unsupervised learning techniques
- Introduce 2 new unsupervised learning techniques
  - RandomForest
  - PRIDIT
- Apply the techniques to insurance data
  - Automobile Fraud data set
  - A publicly available automobile insurance database

Motivation for Topic
- New book: Predictive Modeling in Actuarial Science
  - An introduction to predictive modeling for actuaries and other insurance professionals
- Publisher: Cambridge University Press
- Hope to Publish: Fall 2012
- Chapter on Unsupervised Learning: Li Yang and Louise Francis
  - Li Yang: variable grouping (PCA)
  - Louise Francis: record grouping (clustering)

Book Project
- Predictive Modeling 2 Volume Book Project
- A joint project leading to a two-volume pair of books on Predictive Modeling in Actuarial Science.
- Volume 1 would be on Theory and Methods and Volume 2 would be on Property and Casualty Applications.
- The first volume will be introductory, with basic concepts and a wide range of techniques designed to acquaint actuaries with this sector of problem-solving techniques.
- The second volume would be a collection of applications to P&C problems, written by authors who are well aware of the advantages and disadvantages of the first-volume techniques but who can explore relevant applications in detail with positive results.

The Fraud Study Data
- 1993 AIB closed PIP claims
- Dependent Variables
  - Suspicion Score
  - Expert assessment of likelihood of fraud or abuse
- Predictor Variables
  - Red flag indicators
  - Claim file variables

7/5/2012 Francis Analytics and Actuarial Data Mining, Inc.

The Fraud Problem
from: www.agentinsure.com
The Fraud Problem (2)
from Coalition Against Insurance Fraud

Fraud and Abuse
- Planned fraud
  - Staged accidents
- Abuse
  - Opportunistic
  - Exaggerate claim

The Fraud Red Flags
- Binary variables that capture characteristics of claims associated with fraud and abuse
- Accident variables (acc01 - acc19)
- Injury variables (inj01 - inj12)
- Claimant variables (ch01 - ch11)
- Insured variables (ins01 - ins06)
- Treatment variables (trt01 - trt09)
- Lost wages variables (lw01 - lw07)

The Red Flag Variables

Subject      Indicator  Description
-----------  ---------  -----------------------------------------------
Accident     ACC01      No report by police officer at scene
             ACC04      Single vehicle accident
             ACC09      No plausible explanation for accident
             ACC10      Claimant in old, low valued vehicle
             ACC11      Rental vehicle involved in accident
             ACC14      Property damage was inconsistent with accident
             ACC15      Very minor impact collision
             ACC16      Claimant vehicle stopped short
             ACC19      Insured felt set up, denied fault
Claimant     CLT02      Had a history of previous claims
             CLT04      Was an out of state accident
             CLT07      Was one of three or more claimants in vehicle
Injury       INJ01      Injury consisted of strain or sprain only
             INJ02      No objective evidence of injury
             INJ03      Police report showed no injury or pain
             INJ05      No emergency treatment was given
             INJ06      Non-emergency treatment was delayed
             INJ11      Unusual injury for auto accident
Insured      INS01      Had history of previous claims
             INS03      Readily accepted fault for accident
             INS06      Was difficult to contact/uncooperative
             INS07      Accident occurred soon after effective date
Lost Wages   LW01       Claimant worked for self or a family member
             LW03       Claimant recently started employment
Dependent Variable Problem
- Insurance companies frequently do not collect information as to whether a claim is suspected of fraud or abuse
- Even when claims are referred for special investigation
- Solution: unsupervised learning

Dimension Reduction

The CAARP Data
- This assigned risk automobile data was made available to researchers in 2005 for the purpose of studying the effect of a change in regulation on territorial variables
- The data contain exposure information (car counts, premium) and claim and loss information (Bodily Injury (BI) counts, BI ultimate losses, Property Damage (PD) claim counts, PD ultimate losses)
- Each record is a zip code
- Good example of using unsupervised learning for territory construction

R Cluster Library
- The "cluster" library from R is used
- Many of the functions in the library are described in Kaufman and Rousseeuw's (1990) classic book on clustering
  - Finding Groups in Data

Grouping Records

RF Similarity
- Varies between 0 and 1
- Proximity matrix is an output of RF
  - After a tree is fit, all records are run through the model
  - If 2 records are in the same terminal node, their proximity is increased by 1
  - 1 - proximity forms a distance
- Can be used as an input to clustering and other unsupervised learning procedures
- See "Unsupervised Learning with Random Forest Predictors" by Shi and Horvath

Clustering
- Hierarchical clustering
- K-Means clustering
- This analysis uses k-means

K-means Clustering
- An iterative procedure is used to assign each record in the data to one of the k clusters.
- The iteration begins with the initial centers or medoids for k groups.
- A dissimilarity measure is used to assign records to a group and to iterate to a final grouping.
- The number of clusters, k, is specified by the user.

R Cluster Output

Cluster Plot

Silhouette Plot

Silhouette Plot - RF Proximity

Silhouette Plot - Euclidean Distance Clustering

Testing using Expert Scores: Fit a Tree to Suspicion Score for Importance Ranking

Importance Ranking of the Clusters

Fit Tree to Binary Fraud Indicator

Importance Ranking (2)

RF Ranking of the "Predictors": Top 10 of 44

PRIDITS of Accident Flags

Fit Tree with PRIDITS for Each Type of Flag

Importance Ranking of Pridits

Importance Ranking of Factors

Add RF and Euclid Clusters to PRIDIT Factors
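
The RF Similarity and K-means Clustering slides describe two steps that are easy to sketch: count how often two records share a terminal node, turn 1 - proximity into a distance, and then iterate from initial medoids to a final grouping. The following is a minimal illustration, not the code behind these slides. The input layout `leaf_ids`, the toy data, and the function names are assumptions made for the example, and because only a dissimilarity matrix is available it uses a simple medoid-based iteration rather than the routines in the R cluster library.

```python
def rf_proximity(leaf_ids):
    """Proximity matrix from terminal-node ids.

    leaf_ids[t][i] is the terminal node that record i falls into in tree t
    (a hypothetical layout; real RF packages expose this differently).
    """
    n_trees = len(leaf_ids)
    n = len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    prox[i][j] += 1.0  # same terminal node: proximity + 1
    # Divide by the number of trees so every proximity lies between 0 and 1.
    return [[v / n_trees for v in row] for row in prox]


def to_distance(prox):
    """1 - proximity forms the distance."""
    return [[1.0 - v for v in row] for row in prox]


def assign(dist, medoids):
    """Label each record with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids


# Toy data (made up): 3 trees, 6 records.
leaf_ids = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 2, 3, 3, 3],
    [5, 5, 5, 5, 7, 7],
]
prox = rf_proximity(leaf_ids)
dist = to_distance(prox)
labels, medoids = k_medoids(dist, medoids=[0, 3])  # k = 2, initial medoids chosen by the user
print(labels, medoids)  # [0, 0, 0, 1, 1, 1] [0, 4]
```

Records 0 and 1 share a terminal node in every tree, so their proximity is 1 and their distance is 0. The same distance matrix could equally be passed to other dissimilarity-based procedures, which is the point made on the RF Similarity slide.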
Use Salford RF MDS
- Top variable in importance (acc10) used as binary dependent variable
- Run Random Forest with 1,000 trees
- Output proximities and MDS
- Use the MDS scales as input to clustering (k=3)
- Run Tree to get importance ranking

MDS Graph

Rank of cluster procedures to Tree Prediction

Labeling Clusters

Relation Between PRIDIT Factor and Suspicion

Next Steps
- Add claim file variables
  - Rerun clusters
  - Rerun PRIDITS
- Do Random Forest proximities on the RIDITS
- Apply the procedures to other fraud databases

PRIDIT REFERENCES

Ai, J., Brockett, Patrick L., and Golden, Linda L., (2009), Assessing Consumer Fraud Risk in Insurance Claims with Discrete and Continuous Data, North American Actuarial Journal, 13:438-458.

Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Albert and Alpert, Mark, (2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.

Brockett, Patrick L., Xia, Xiaohua and Derrig, Richard A., (1998), Using Kohonen's Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65:245-274.

Bross, Irwin D.J., (1958), How To Use RIDIT Analysis, Biometrics, 14:18-38.

Chipman, H., George, E.I. and McCulloch, R.E., (2006), Bayesian Ensemble Learning, Neural Information Processing Systems.

Lieberthal, Robert D., (2008), Hospital Quality: A PRIDIT Approach, Health Services Research, 43:3, 988-1005.

Questions?