If They Cheat, Can We Catch Them With Predictive Modeling?
Richard A. Derrig, PhD, CFE
President, Opal Consulting, LLC
Senior Consultant, Insurance Fraud Bureau of Massachusetts
CAS Predictive Modeling
October 11-12, 2007
Insurance Fraud – The Problem
 ISO/IRC 2001 Study: Auto and Workers Compensation Fraud seen as a Big Problem by 27% of Insurers.
 CAIF: Estimation (too large)
 Mass IFB: 1,500 referrals annually for Auto, WC, and Other P-L lines.
Fraud Definition
PRINCIPLES
 Clear and willful act
 Proscribed by law
 Obtaining money or value
 Under false pretenses
Abuse, Unethical: Fails one or more Principles
HOW MUCH CLAIM FRAUD?
(CRIMINAL or CIVIL?)
[Chart: 10% Fraud]
REAL PROBLEM-CLAIM FRAUD
 Classify all claims
 Identify valid classes
 Pay the claim
 No hassle
 Visa Example
 Identify (possible) fraud
 Investigation needed
 Identify “gray” classes
 Minimize with “learning” algorithms
Company Automation - Data Mining
 Data Mining/Predictive Modeling
Automates Record Reviews
 No Data Mining without Good Clean Data
(90% of the solution)
 Insurance Policy and Claim Data;
Business and Demographic Data
 Data Warehouse/Data Mart
 Data Manipulation – Simple First;
Complex Algorithms When Needed
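Since good clean data is most of the solution, the sketch below (assuming pandas; the tables and column names are hypothetical) illustrates the simple-first data manipulation: join policy and claim extracts, drop duplicates, and fill obviously missing amounts before any modeling is attempted.

```python
import pandas as pd

# Hypothetical policy and claim extracts; column names are illustrative only.
policies = pd.DataFrame({"policy_id": [1, 2, 3], "territory": ["01", "02", "02"]})
claims = pd.DataFrame({"claim_id": [10, 11, 12], "policy_id": [1, 2, 2],
                       "paid": [1200.0, None, 950.0]})

# Simple-first manipulation: join the sources, then fix the obvious problems
# (duplicates, missing paid amounts) before applying any complex algorithm.
data = claims.merge(policies, on="policy_id", how="left")
data = data.drop_duplicates(subset="claim_id")
data["paid"] = data["paid"].fillna(data["paid"].median())
print(data)
```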
DATA
[Chart: data volumes grow as computers advance]
FRAUD IDENTIFICATION
 Experience and Judgment
 Artificial Intelligence Systems
 Regression & Tree Models
 Fuzzy Clusters
 Neural Networks
 Expert Systems
 Genetic Algorithms
 All of the Above
MATHEMATICAL MODELS
 Databases
 Vector Spaces
 Topological Spaces
 Stochastic Processes
 Scoring
 Mappings to R
 Linear Functionals
DM FLOW (CLAIMS): Databases → Scoring Functions → Graded Output → Non-Suspicious Claims (Routine Claims) or Suspicious Claims (Complicated Claims)
DM FLOW (UNDERWRITING): Databases → Scoring Functions → Graded Output → Non-Suspicious Risks (Routine Underwriting) or Suspicious Risks (Non-Routine Underwriting)
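The flow above amounts to a scoring function: a mapping from claim (or risk) features into the real line, with cutoffs that grade the output for routing. Below is a minimal sketch of that idea; the weights, threshold, and feature layout are illustrative assumptions, not values from the presentation.

```python
import numpy as np

# Minimal sketch of a claim-scoring functional (illustrative weights/cutoff).
WEIGHTS = np.array([0.8, 0.5, 1.2, 0.3])   # hypothetical weight per red-flag indicator
CUTOFF = 1.0                                # hypothetical grading threshold

def score(claim_features: np.ndarray) -> float:
    """Linear functional: feature vector -> real-valued suspicion score."""
    return float(WEIGHTS @ claim_features)

def grade(s: float) -> str:
    """Graded output: route the claim based on its score."""
    return "suspicious / complicated handling" if s >= CUTOFF else "non-suspicious / routine handling"

claim = np.array([1, 0, 1, 0])              # toy claim with two indicators present
print(grade(score(claim)))
```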
POTENTIAL VALUE OF A PREDICTIVE
MODELING SCORING SYSTEM
 Screening to Detect Fraud Early
 Auditing of Closed Claims to
Measure Fraud, Both Kinds
 Sorting to Select Efficiently among
Special Investigative Unit Referrals
 Providing Evidence to Support a
Denial
 Protecting against Bad-Faith
PREDICTIVE MODELING
SOME PUBLIC TECHNIQUES
 Fuzzy Logic and Controllers
 Regression Scoring Systems
 Unsupervised Techniques: Kohonen
and PRIDIT
 EM Algorithm (Medical Bills)
 Tree-based Methods
FUZZY SETS COMPARED WITH PROBABILITY
 Probability:
 Measures randomness;
 Measures whether or not an event occurs; and
 Randomness dissipates over time or with further knowledge.
 Fuzziness:
 Measures vagueness in language;
 Measures the extent to which an event occurs; and
 Vagueness does not dissipate with time or further knowledge.
Fuzzy Clustering & Detection:
k-Means Clustering
 Fuzzy Logic: True, False, Uncertain
 Fuzzy Numbers: Membership Value
 Fuzzy Clusters: Partial Membership
 App1: Suspicion of Fraud
 App2: Town (Zip) Rating Classes
 REF: Derrig-Ostaszewski 1995
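A minimal sketch of fuzzy c-means, the fuzzy analogue of k-means used in the applications above: every claim (or town) receives a partial membership in each cluster rather than a single hard label, and a membership cut such as alpha = 0.2 flags partial members. The data, cluster count, and fuzzifier m are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    """Basic fuzzy c-means: returns cluster centers and membership matrix U (n x c).

    Unlike crisp k-means, each point belongs partially to every cluster, which is
    how borderline ("gray") claims or border towns can be recognized."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)                 # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Toy example: points near two "towns" plus one in between (partial member of both).
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [4.9, 5.0], [2.5, 2.5]])
centers, U = fuzzy_c_means(X, c=2)
print(np.round(U, 2))   # the row for [2.5, 2.5] is split across both clusters
```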
FUZZY SETS
TOWN RATING CLASSIFICATION
 When is one Town near another for Auto Insurance
Rating?
- Geographic Proximity (Traditional)
- Overall Cost Index (Massachusetts)
- Geo-Smoothing (Experimental)
 Geographically close Towns do not have the same
Expected Losses.
 Clusters by Cost Produce Border Problems: Nearby Towns Fall in Different Rating Territories. Fuzzy Clusters Acknowledge the Borders.
 Are all coverage clusters correct for each Insurance
Coverage?
 Fuzzy Clustering on Five Auto Coverage Indices is
better and demonstrates a weakness in Overall
Crisp Clustering.
[Figure: Fuzzy Clustering of Fraud Study Claims by Assessment Data, membership value cut at alpha = 0.2. X-axis: claim feature vector ID (0-130); Y-axis: suspicion-level cluster (0-6), with markers distinguishing full members from partial members. Six final clusters; recovered cluster centers in (A, C, Is, Ij, T, LN) coordinates include (0,0,0,0,0,0) Valid, (0,1,0,1,3,0), (1,4,0,4,6,0), (1,7,0,7,7,0), and (7,8,7,8,8,0); cluster interpretations include Valid, Build-up (Inj. Sus. Level < 5), Build-up (Inj. Sus. Level ≥ 5), Opportunistic Fraud, and Planned Fraud.]
AIB FRAUD INDICATORS
 Accident Characteristics (19)
 No report by police officer at scene
 No witnesses to accident
 Claimant Characteristics (11)
 Retained an attorney very quickly
 Had a history of previous claims
 Insured Driver Characteristics (8)
 Had a history of previous claims
 Gave address as hotel or P.O. Box
Supervised Models
Regression: Fraud Indicators
 Fraud Indicators Serve as
Independent Dummy Variables
 Expert Evaluation Categories Serve
as Dependent Target
 Regression Scoring Systems
 REF1: Weisberg-Derrig, 1998
 REF2: Viaene et al., 2002
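A hedged sketch of such a regression scoring system, assuming scikit-learn and synthetic data: fraud indicators enter as 0/1 dummy variables and a collapsed expert evaluation serves as the target, with the fitted probabilities acting as suspicion scores. The indicator layout and coefficients are illustrative, not the published AIB results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for AIB-style data: each column is a 0/1 fraud indicator
# (e.g., "no police report", "attorney retained quickly"); the target is the
# expert evaluation collapsed to suspicious (1) vs. not suspicious (0).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 6))                 # indicator dummy variables
logit = X @ np.array([1.2, 0.8, 0.0, 0.5, -0.2, 1.5]) - 1.5
y = (rng.random(500) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]                 # regression scoring system
print(np.round(model.coef_, 2))                       # fitted weight per indicator
print(np.round(scores[:5], 2))                        # suspicion scores for first claims
```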
Unsupervised Models
Kohonen Self-Organizing Features
 Fraud Indicators Serve as Independent “Features”
 Expert Evaluation Categories Can Serve as Dependent Target in Second Phase
 Self-Organizing Feature Maps
 T. Kohonen 1982-1990 (Cybernetics)
 Reference vectors map to OUTPUT format in a topologically faithful way. Example: map onto a 40x40 2-dimensional square.
 Iterative Process Adjusts All Reference Vectors in a “Neighborhood” of the Nearest One. Neighborhood Size Shrinks over Iterations.
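A minimal, self-contained sketch of Kohonen's training loop under the description above: reference vectors sit on a 2-dimensional grid, and each presented claim pulls its nearest unit and a shrinking neighborhood toward it. The grid size, learning schedule, and random binary data are illustrative choices, not the presentation's settings.

```python
import numpy as np

def train_som(X, grid=(10, 10), n_iter=2000, seed=0):
    """Tiny Kohonen SOM: reference vectors on a 2-D grid are pulled toward the
    data so that nearby units end up representing similar claims."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows, cols, X.shape[1]))                   # reference vectors
    coords = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    sigma0, lr0 = max(grid) / 2.0, 0.5
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        # best-matching unit = nearest reference vector
        bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(axis=2)), grid)
        # neighborhood size and learning rate shrink over iterations
        sigma = sigma0 * np.exp(-t / n_iter)
        lr = lr0 * np.exp(-t / n_iter)
        dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-dist2 / (2 * sigma ** 2))[:, :, None]
        W += lr * h * (x - W)                                   # adjust the whole neighborhood
    return W

# Toy usage with random binary "fraud indicator" vectors.
X = np.random.default_rng(1).integers(0, 2, size=(200, 12)).astype(float)
W = train_som(X, grid=(8, 8))
```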
MAPPING: PATTERNS-TO-UNITS
[Figure: claim patterns mapped to units of the Kohonen feature map.]
KOHONEN FEATURE MAP – SUSPICION LEVELS
[Figure: surface over the map units (S1-S16) showing suspicion level per unit, in bands 0-1 through 4-5.]
FEATURE MAP – SIMILARITY OF A CLAIM
[Figure: surface over the map units (S1-S16) showing the similarity of a single claim to each unit, in bands 0-1 through 4-5.]
DATA MODELING EXAMPLE: CLUSTERING
 Data on 16,000 Medicaid providers analyzed by unsupervised neural net
 Neural network clustered Medicaid providers based on 100+ features
 Investigators validated a small set of known fraudulent providers
 Visualization tool displays clustering, showing known fraud and abuse
 Subset of 100 providers with similar patterns investigated: Hit rate > 70%
(© 1999 Intelligent Technologies Corporation; cube size proportional to annual Medicaid revenues)
SELF ORGANIZING MAP
 Binary Features
 A claim with feature set c maps to its nearest reference vector: c → m_c
 Suspicion Level = d(m_c, 0), the distance of m_c from the all-zero (no-indicator) vector
 “Guilt by Association”
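A short sketch of that suspicion score, assuming a map W already trained as in the earlier SOM sketch (a random stand-in is created here so the snippet runs on its own): the claim's feature vector c is mapped to its nearest reference vector m_c, and the distance d(m_c, 0) from the all-zero profile is read as the suspicion level.

```python
import numpy as np

W = np.random.default_rng(1).random((8, 8, 12))   # stand-in for a trained map

def suspicion_level(claim, W):
    """Map claim c to its nearest reference vector m_c on the trained map W,
    then take d(m_c, 0): distance from the all-zero (no red flags) profile."""
    flatW = W.reshape(-1, W.shape[-1])
    m_c = flatW[np.argmin(((flatW - claim) ** 2).sum(axis=1))]
    return float(np.linalg.norm(m_c))              # "guilt by association" score

claim = np.zeros(12); claim[[0, 3, 7]] = 1.0       # toy claim with three indicators set
print(round(suspicion_level(claim, W), 3))
```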
PRIDIT Unique Embedded Score
 Data: Features have no natural metric-scale
 Model: Stochastic process has no parametric
form
 Classification: Inverse image of one
dimensional scoring function and decision rule
 Feature Value: Identify which features are
“important”
PRIDIT METHOD OVERVIEW
 1. DATA: N Claims, T Features, K_t Responses per Feature, Monotone in “Fraud”
 2. RIDIT score each possible response: proportion below
minus proportion above, score centered at zero.
 3. RESPONSE WEIGHTS: Principal Component of Claims x
Features with RIDIT in Cells (SPSS, SAS or S Plus software)
 4. SCORE: Sum weights x claim Ridit score.
 5. PARTITION: above and below zero.
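A compact sketch of steps 2-4 in Python rather than SPSS, SAS, or S-Plus: RIDIT-score each feature's responses, take the first principal component of the claims-by-features RIDIT matrix as the response weights, and sum weighted RIDIT scores into a claim score partitioned at zero. The toy data and the sign convention for the component are assumptions.

```python
import numpy as np

def ridit_scores(col):
    """RIDIT-score one feature: for each response category (ordered so larger =
    more suspicious), score = proportion of claims below minus proportion above."""
    vals, counts = np.unique(col, return_counts=True)
    p = counts / counts.sum()
    below = np.concatenate(([0.0], np.cumsum(p)[:-1]))
    above = 1.0 - below - p
    table = dict(zip(vals, below - above))             # scores centered at zero
    return np.array([table[v] for v in col])

def pridit(X):
    """PRIDIT: RIDIT-score each feature, weight features by the first principal
    component of the claims-by-features RIDIT matrix, then sum into a score."""
    F = np.column_stack([ridit_scores(X[:, j]) for j in range(X.shape[1])])
    _, _, Vt = np.linalg.svd(F, full_matrices=False)
    w = Vt[0]
    if w.sum() < 0:                                     # sign of a PC is arbitrary
        w = -w
    return F @ w, w                                     # partition claims at score = 0

# Toy data: 8 claims, 3 features with ordered responses (0 = least suspicious).
X = np.array([[0,0,0],[0,1,0],[1,0,0],[2,1,1],[0,0,1],[2,2,1],[1,2,0],[0,0,0]])
scores, weights = pridit(X)
print(np.round(weights, 2), np.round(scores, 2))
```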
PRIDIT METHOD RESULTS
 1. DATA: N Claims Clustered in Varying Degrees of “Fraud”
 2. FEATURES: Each Feature Has Importance Weight
 3. CLUSTERS: Size Can Change With Experience
 4. SCORE: Can Be Checked On New Database
 5. DECISIONS: Scores Near Zero = Too Little Information.
TABLE 2
Weights for Treatment Variables

Variable   PRIDIT Weights W(∞)   Regression Weights
TRT1       .30                   .32***
TRT2       .19                   .19***
TRT3       .53                   .22***
TRT4       .38                   .07
TRT5       .02                   .08*
TRT6       .70                   -.01
TRT7       .82                   .03
TRT8       .37                   .18***
TRT9       -.13                  .24**

Regression significance shown at the 1% (***), 5% (**), or 10% (*) level.
TABLE 3
PRIDIT Transformed Indicators, Scores and Classes

Claim  TRT1   TRT2   TRT3   TRT4   TRT5   TRT6   TRT7   TRT8   TRT9   Score  Class
1       0.44   0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .07   2
2       0.44   0.12   0.08   0.2   -0.69   0.09   0.24   0.11   0.04   .07   2
3       0.44  -0.88  -0.92   0.2    0.31  -0.91  -0.76   0.11   0.04  -.25   1
4      -0.56   0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .04   2
5      -0.56  -0.88   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .02   2
6       0.44   0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .07   2
7      -0.56   0.12   0.08   0.2    0.31   0.09  -0.76  -0.89   0.04  -.10   1
8      -0.44   0.12   0.08   0.2   -0.69   0.09   0.24   0.11   0.04   .02   2
9      -0.56  -0.88   0.08  -0.8    0.31   0.09   0.24   0.11  -0.96   .05   2
10     -0.56   0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .04   2
TABLE 7
AIB Fraud and Suspicion Score Data
Top 10 Fraud Indicators by Weight

Rank  PRIDIT   Adj. Reg. Score   Inv. Reg. Score
1     ACC3     ACC1              ACC11
2     ACC4     ACC9              CLT4
3     ACC15    ACC10             CLT7
4     CLT11    ACC19             CLT11
5     INJ1     CLT11             INJ1
6     INJ2     INS6              INJ3
7     INJ5     INJ2              INJ8
8     INJ6     INJ9              INJ11
9     INS8     TRT1              TRT1
10    TRT1     LW6               TRT9
EM Algorithm
Hidden Exposures - Overview
 Modeling hidden risk exposures as additional
dimension(s) of the loss severity distribution via EM,
Expectation-Maximization, Algorithm
 Considering the mixtures of probability distributions as
the model for losses affected by hidden exposures with
some parameters of the mixtures considered missing
(i.e., unobservable in practice)
 Approach is feasible due to advancements in the
computer driven methodologies dealing with partially
hidden or incomplete data models
 Empirical data imputation has become more sophisticated, and the availability of ever-faster computing power has made it increasingly possible to solve these problems via iterative algorithms
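A bare-bones EM sketch for a mixture of normal distributions fitted to claim severities, in the spirit of the approach above: the E-step computes the posterior probability that each bill belongs to each (hidden) component, and the M-step re-estimates the mixing weights, means, and standard deviations. The synthetic bill amounts and component count are illustrative assumptions, not the paper's data.

```python
import numpy as np

def em_normal_mixture(x, k=2, n_iter=200, seed=0):
    """Plain EM for a k-component normal mixture fitted to claim severities."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, k)
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each bill (the hidden label)
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibility-weighted data
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        pi = nk / len(x)
    return pi, mu, sigma

# Toy "BI medical bill" severities: a routine-bill component plus an inflated one.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(800, 150, 300), rng.normal(2500, 400, 60)])
print(em_normal_mixture(x))
```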
Figure 1: Overall distribution of the 348 BI medical bill amounts from Appendix B compared with that submitted by provider A. Left panel: frequency histograms (provider A’s histogram in filled bars). Right panel: density estimators (provider A’s density in dashed line).
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, p. 9, 11/18/02
Figure 2: EM Fit. Left panel: mixture of normal distributions fitted via the EM algorithm to BI data. Right panel: three normal components of the mixture.
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala, Richard A. Derrig, p. 13, 11/18/02
Decision Trees
 In decision theory (for example, risk management), a decision tree is a graph of decisions and their possible consequences (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed in order to help with making decisions. A decision tree is a special form of tree structure.
 www.wikipedia.org
Different Kinds of Decision
Trees
 Single Trees (CART, CHAID)
 Ensemble Trees, a more recent development
(TREENET, RANDOM FOREST)
 A composite or weighted average of many trees
(perhaps 100 or more)
 There are many methods to fit the trees and
prevent overfitting
 Boosting: Iminer Ensemble and Treenet
 Bagging: Random Forest
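A hedged sketch contrasting a single tree with bagged and boosted ensembles, assuming scikit-learn as a stand-in for the commercial packages named in these slides (RandomForestClassifier for bagging, GradientBoostingClassifier for TREENET-style boosting) and synthetic data in place of the IME/SIU targets.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic claim data standing in for the IME / SIU decision targets.
rng = np.random.default_rng(0)
X = rng.random((2000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 2000) > 1.0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "single tree (CART-style)": DecisionTreeClassifier(max_depth=5),
    "bagging (Random Forest)": RandomForestClassifier(n_estimators=200),
    "boosting (TREENET-like)": GradientBoostingClassifier(),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auc:.3f}")
```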
The Methods and Software Evaluated
1) TREENET
2) Iminer Tree
3) SPLUS Tree
4) CART
5) Iminer Ensemble
6) Random Forest
7) Naïve Bayes (Baseline)
8) Logistic (Baseline)
Ensemble Prediction of IME Requested
[Chart: ensemble-predicted probability of an IME (y-axis, roughly 0.30 to 0.90) versus Provider 2 bill amount (x-axis).]
TREENET ROC Curve – IME
AUROC = 0.701
Ranking of Methods/Software – 1st Two Surrogates
Ranking of Methods by AUROC – Decision

Method               SIU AUROC   SIU Rank   IME Rank   IME AUROC
Random Forest        0.645       1          1          0.703
TREENET              0.643       2          2          0.701
S-PLUS Tree          0.616       3          3          0.688
Iminer Naïve Bayes   0.615       4          5          0.676
Logistic             0.612       5          4          0.677
CART Tree            0.607       6          6          0.669
Iminer Tree          0.565       7          8          0.629
Iminer Ensemble      0.539       8          7          0.649
Implementation Outline Included at End
NON-CRIMINAL FRAUD?
Non-Criminal Fraud
 General Deterrence – Ins. System,
Ins Dept + DIA, Med, and
Other Government Oversight
 Specific Deterrence – Company SIU,
Auditor, Data, Predictive Modeling for
Claims and Underwriting.
FRAUD INDICATORS
VALIDATION PROCEDURES
 Canadian Coalition Against Insurance Fraud (1997): 305 Fraud Indicators (45 for vehicle theft)
 “No one indicator by itself is necessarily suspicious.”
 Problem: How to validate the systematic use of Fraud Indicators?
 Solution 1: Kohonen Self-Organizing Feature Map (Brockett et al., 1998)
 Solution 2: Logistic Regression (Viaene et al., 2002)
 Solution 3: PRIDIT Method (Brockett et al., 2002)
 Solution 4: Regression Tree Methods (Derrig and Francis, 2007)
REFERENCES
 Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Albert, and Alpert, Mark, (2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.
 Brockett, Patrick L., Xia, Xiaohua, and Derrig, Richard A., (1998), Using Kohonen’s Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65:245-274.
 Derrig, Richard A., (2002), Insurance Fraud, Journal of Risk and Insurance, 69:3, 271-289.
 Derrig, Richard A., and Ostaszewski, K., (1995), Fuzzy Techniques of Pattern Recognition in Risk and Claim Classification, Journal of Risk and Insurance, 62:3, 147-182.
 Francis, Louise, and Derrig, Richard A., (2007), Distinguishing the Forest from the TREES, Working Paper.
 Rempala, G.A., and Derrig, Richard A., (2003), Modeling Hidden Exposures in Claim Severity via the EM Algorithm, North American Actuarial Journal, 9(2), 108-128.
 Viaene, S., Derrig, Richard A., et al., (2002), A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Fraud Detection, Journal of Risk and Insurance, 69:3, 373-423.
 Weisberg, H.I., and Derrig, R.A., (1998), Quantitative Methods for Detecting Fraudulent Automobile Bodily Injury Claims, RISQUES, vol. 35, pp. 75-101, July-September (in French; English available).
Claim Fraud Detection Plan
 STEP 1: SAMPLE: Systematic benchmark of a random sample of claims.
 STEP 2: FEATURES: Isolate red flags and other sorting characteristics.
 STEP 3: FEATURE SELECTION: Separate features into objective and subjective, early, middle and late arriving, acquisition cost levels, and other practical considerations.
 STEP 4: CLUSTER: Apply unsupervised algorithms (Kohonen, PRIDIT, Fuzzy) to cluster claims; examine for needed homogeneity.
Claim Fraud Detection Plan
 STEP 5: ASSESSMENT: Externally classify claims according to objectives for sorting.
 STEP 6: MODEL: Supervised models relating selected features to objectives (logistic regression, Naïve Bayes, Neural Networks, CART, MARS).
 STEP 7: STATIC TESTING: Model output versus expert assessment, model output versus cluster homogeneity (PRIDIT scores) on one or more samples.
 STEP 8: DYNAMIC TESTING: Real-time operation of acceptable model; record outcomes; repeat steps 1-7 as needed to fine-tune model and parameters. Use PRIDIT to show gain or loss of feature power and changing data patterns; tune investigative proportions to optimize detection and deterrence of fraud and abuse.
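A minimal end-to-end sketch of how Steps 4 through 7 might be wired together, assuming scikit-learn and synthetic stand-in data: KMeans stands in for the unsupervised clustering step, logistic regression for the supervised model, and AUROC against the external assessment serves as the static test.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.random((1000, 8))                      # STEPS 2-3: selected claim features (toy)
y = (X[:, 0] + X[:, 1] > 1.1).astype(int)      # STEP 5: external assessment (toy stand-in)

# STEP 4: unsupervised clustering, checked for homogeneity (here just cluster sizes).
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(clusters))

# STEP 6: supervised model relating features to the assessment.
model = LogisticRegression().fit(X, y)

# STEP 7: static test of model output against the expert assessment.
print("AUROC:", round(roc_auc_score(y, model.predict_proba(X)[:, 1]), 3))
```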