New Applications of Statistics and Data Mining
Techniques to Classification
and Fraud Detection in Insurance.
Richard A. Derrig Ph. D.
OPAL Consulting LLC
Visiting Scholar, Wharton School
University of Pennsylvania
Bogotá, Colombia
November 3, 2005
Insurance Fraud Bureau
of Massachusetts
You can steal more
with a ball point than
by gun point.
Insurance fraud is
known as a high
reward, low risk crime.
GENERAL INSURANCE
PROBLEMS






WHAT: Product Design
WHERE: Market Characteristics
WHO: Classification & Sale
HOW: Claims Paid
WHEN: Forecasting
WHY: Profit (Expected)
TRADITIONAL
MATHEMATICAL TECHNIQUES






Arithmetic (Spreadsheets)
Probability & Statistics (Range of Outcomes)
Curve Fitting (Interpolation & Extrapolation)
Model Building (Equations for Processes)
Valuation (Risk, Investments, Catastrophes)
Numerical Method (Analytic Solution Rare)
NON-TRADITIONAL
MATHEMATICS

Fuzzy Sets & Fuzzy Logic
Elements: “in/out/partially both”
Logic: “true/false/maybe”
Decisions: “incompatible criteria”


Artificial Intelligence: “data mining”
Neural Networks: “learning algorithms”
CLASSIFICATION



Segmentation: A major exercise for
insurance underwriting and claims
Underwriting: Find profitable risks from
among the available market
Claims: Sort claims into easy pay and
claims needing investigation
Fuzzy Logic Compared with
Probability


Probability:
Measures randomness;
Measures whether or not event occurs;
and
Randomness dissipates over time or with
further knowledge.
Fuzziness:
Measures vagueness in language;
Measures extent to which event occurs;
and
Vagueness does not dissipate with time or
further knowledge.
Fuzzy Logic
Clusters





The field of Pattern Recognition is a search for structure
in data.
Old View: Given N objects, divide them into 2 < C < N
clusters of homogeneous or similar types. Similarity can
be based upon multiple features or criteria but each
object is in one and only one cluster.
New View: Objects can be members of one or several
clusters with varying strengths of membership; i.e. fuzzy
clusters are Fuzzy Sets of clusters.
Example A: Classification of Individual Risks
Example B: Classification of Injury Claims
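The "new view" above can be illustrated with a minimal fuzzy c-means sketch. This is one common fuzzy clustering algorithm, shown here as an assumption; the slides do not say which algorithm was used. The town cost indices, the fuzzifier m = 2, and the function names are hypothetical. The alpha-cut at 0.2 mirrors the membership cut used later in the deck.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Return (centers, memberships) for points given as equal-length tuples."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means (weights u^m).
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            tot = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / tot
                for d in range(dim)))
        # Memberships fall as the relative distance to a center grows.
        for i in range(n):
            dist = [max(1e-12, sum((points[i][d] - centers[k][d]) ** 2
                                   for d in range(dim)) ** 0.5)
                    for k in range(c)]
            for k in range(c):
                u[i][k] = 1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return centers, u

def alpha_cut(memberships, alpha=0.2):
    """Clusters each object belongs to with membership of at least alpha."""
    return [[k for k, v in enumerate(row) if v >= alpha] for row in memberships]

# Made-up one-dimensional cost indices for seven towns (illustration only).
towns = [(1.0,), (1.2,), (0.9,), (3.0,), (5.0,), (5.1,), (4.8,)]
centers, u = fuzzy_c_means(towns, c=2)
cuts = alpha_cut(u, alpha=0.2)
```

In this toy data the town with index 3.0 sits between the two groups, so it passes the 0.2 cut for both clusters, while the other towns belong to a single cluster each.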
FUZZY SETS
TOWN RATING CLASSIFICATION





When is one Town near another for Auto
Insurance Rating?
Geographic Proximity (Traditional)
Overall Cost Index (Massachusetts)
Geographically close Towns do not have the
same Expected Losses.
Clusters by Cost Produce Border Problems:
Towns between Territories. Fuzzy Clusters
acknowledge the Borders.
Are Overall Clusters correct for each Insurance
Coverage?
Fuzzy Clustering on Five Auto Coverage
Indices is better and demonstrates a weakness
in Overall Crisp Clustering.
Fuzzy Clustering of Fraud Study Claims by Assessment Data
Membership Value Cut at 0.2
[Figure: final clusters plotted by CLAIM FEATURE VECTOR ID (horizontal axis, 0 to 130) against Suspicion Centers (vertical axis, 0 to 6).
Final cluster centers, as (A, C, Is, Ij, T, LN): (7,8,7,8,8,0); (1,7,0,7,7,0); (1,4,0,4,6,0); (0,1,0,1,3,0); (0,0,0,0,0,0).
Cluster labels: Build-up (Inj. Sus. Level equal to or >5); Build-up (Inj. Sus. Level <5); Valid; Opportunistic Fraud; Planned Fraud.
Legend: Full Member; Partial Member (Alpha = .2).]
FRAUD
The Major Questions
What Is Fraud?
How Much Fraud is There?
What Companies Do about Fraud?
How Can We Identify a Fraudulent Claim?
FRAUD DEFINITION
Principles




Clear and willful act
Proscribed by law
Obtaining money or value
Under false pretenses
Abuse: Fails one or more Principles
Fraud Definition
COSTS




Fraud (Criminal, Hard) Small
Mass. Auto & WC < 1%
Abuse (Not Criminal, Soft Fraud),
BIG Bucks, Depends on Line
“Abuse” is (legally) a gray area, unethical
behavior
“Abuse” Containment is a Matter for
Company/Industry/Regulator
HOW MUCH CLAIM FRAUD?
10%
Fraud
FRAUD TYPES




Insurer Fraud
Fraudulent Company
Fraudulent Management
Agent Fraud
No Policy
False Premium
Company Fraud
Embezzlement
Inside/Outside Arrangements
Claim Fraud
Claimant/Insured
Providers/Rings
CLAIM FRAUD INDICATORS
VALIDATION PROCEDURES



Canadian Coalition Against Insurance Fraud
(1997) 305 Fraud Indicators (45 vehicle theft)
“No one indicator by itself is necessarily
suspicious”.
Problem: How to validate the systematic use
of Fraud Indicators?
AIB FRAUD INDICATORS
1989 Examples

Accident Characteristics (19)
No report by police officer at scene
No witnesses to accident

Claimant Characteristics (11)
Retained an attorney very quickly
Had a history of previous claims

Insured Driver Characteristics (8)
Had a history of previous claims
Gave address as hotel or P.O. Box
AIB FRAUD INDICATORS
1989 Examples

Injury Characteristics (12)
Injury consisted of strain/sprain only
No objective evidence of injury

Treatment Characteristics (9)
Large number of visits to a chiropractor
DC provided 3 or more modalities on most
visits

Lost Wages Characteristics (6)
Claimant worked for self or family member
Employer wage differs from claimed wage
REAL PROBLEM


Classify all claims
Identify valid classes
Pay the claim
No hassle
Visa Example

Identify (possible) fraud
Investigation needed

Identify “gray” classes
Minimize with “learning” algorithms
FRAUDULENT CLAIM
IDENTIFICATION


Experience and Judgment
Artificial Intelligence Systems
Regression Models
Fuzzy Clusters
Neural Networks
Expert Systems
Genetic Algorithms
All of the Above
DM
Databases
Scoring Functions
Graded Output
Non-Suspicious Claims
Routine Claims
Suspicious Claims
Complicated Claims
POTENTIAL VALUE OF AN ARTIFICIAL
INTELLIGENCE SCORING SYSTEM





Screening to Detect Fraud Early
Auditing of Closed Claims to Measure
Fraud
Sorting to Select Efficiently among
Special Investigative Unit Referrals
Providing Evidence to Support a Denial
Protecting against Bad-Faith
Using Kohonen’s Self-Organizing Feature Map to
Uncover Automobile Bodily Injury Claims Fraud
PATRICK L. BROCKETT
Gus S. Wortham Chaired Prof. of Risk Management
University of Texas at Austin
XIAOHUA XIA
University of Texas at Austin
RICHARD A. DERRIG
Senior Vice President
Automobile Insurers Bureau of Massachusetts
Vice President of Research
Insurance Fraud Bureau of Massachusetts
JOURNAL OF RISK AND INSURANCE, 65:2, 245-274, 1998
NEURAL NETWORKS

Self-Organizing Feature Maps
T. Kohonen 1982-1990 (Cybernetics)
Reference vectors map to OUTPUT
format in topologically faithful way.
Example: Map onto 40x40 2-dimensional
square.
Iterative Process Adjusts All Reference
Vectors in a “Neighborhood” of the
Nearest One. Neighborhood Size
Shrinks over Iterations
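The iterative process described above can be sketched as follows. This is a minimal illustration, not the configuration used in the cited paper: the 4x4 grid, the linear decay of the learning rate and neighborhood radius, the Gaussian neighborhood weighting, and the toy claim patterns are all assumptions chosen to keep the example small.

```python
import math
import random

def best_unit(grid, x):
    """Grid coordinates of the reference vector nearest to pattern x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(patterns, rows=4, cols=4, iters=500, seed=0):
    """Train a small Kohonen map; returns a rows x cols grid of reference vectors."""
    rng = random.Random(seed)
    dim = len(patterns[0])
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    for t in range(iters):
        frac = t / iters
        lr = 0.5 * (1.0 - frac)                    # learning rate decays
        radius = max(0.5, radius0 * (1.0 - frac))  # neighborhood shrinks
        x = rng.choice(patterns)
        br, bc = best_unit(grid, x)
        # Adjust every reference vector in the neighborhood of the nearest one.
        for r in range(rows):
            for c in range(cols):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                if d2 <= radius ** 2:
                    h = math.exp(-d2 / (2.0 * radius ** 2))
                    grid[r][c] = [w + lr * h * (xi - w)
                                  for w, xi in zip(grid[r][c], x)]
    return grid

# Hypothetical claim patterns: three indicator scores per claim, scaled to [0, 1].
low = [[0.1, 0.0, 0.1], [0.0, 0.1, 0.2], [0.2, 0.1, 0.0]]
high = [[0.9, 1.0, 0.8], [1.0, 0.9, 0.9], [0.8, 0.8, 1.0]]
grid = train_som(low + high)
low_unit = best_unit(grid, low[0])
high_unit = best_unit(grid, high[0])
```

After training, dissimilar patterns land on different units of the map, which is what allows suspicion levels to be read off regions of the grid.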
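MAPPING: PATTERNS-TO-UNITS
[Figure: input patterns mapped to units of the feature map.]
FEATURE MAP
SUSPICION LEVELS
[Figure: chart over the feature map grid (axes 1 to 16 and S1 to S16), with suspicion levels shown in bands 0-1, 1-2, 2-3, 3-4, 4-5.]
FEATURE MAP
SIMILARITY OF A CLAIM
[Figure: chart over the feature map grid (axes 1 to 17 and S1 to S16), with values shown in bands 0-1, 1-2, 2-3, 3-4, 4-5.]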
DATA MODELING EXAMPLE: CLUSTERING

Data on 16,000 Medicaid providers analyzed by unsupervised neural net
Neural network clustered Medicaid providers based on 100+ features
Investigators validated a small set of known fraudulent providers
Visualization tool displays clustering, showing known fraud and abuse
Subset of 100 providers with similar patterns investigated: Hit rate > 70%
[Figure: cluster visualization. Cube size proportional to annual Medicaid revenues. © 1999 Intelligent Technologies Corporation]
Modeling Hidden Exposures in Claim
Severity via the EM Algorithm
Grzegorz A. Rempala
Department of Mathematics
University of Louisville
and
Richard A. Derrig
OPAL Consulting LLC &
Wharton School,
University of Pennsylvania
Claim Cost Distribution: Empirical versus Fitted
Hidden Exposures - Overview

Modeling hidden risk exposures as additional
dimension(s) of the loss severity distribution

Considering the mixtures of probability distributions as
the model for losses affected by hidden exposures with
some parameters of the mixtures considered missing
(i.e., unobservable in practice)

Approach is feasible due to advancements in the
computer driven methodologies dealing with partially
hidden or incomplete data models

Empirical data imputation has become more
sophisticated, and the availability of ever faster
computing power has made it increasingly possible to
solve these problems via iterative algorithms
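A minimal sketch of the kind of iterative algorithm meant here: EM for a one-dimensional mixture of normal distributions, where the component that generated each loss is the missing data. The two-component setup, the starting values, and the synthetic data are assumptions for illustration; they are not the BI data or the three-component fit shown in the figures.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_normal_mixture(data, mus, sigmas, weights, iters=200):
    """EM for a 1-D mixture of normals; returns (weights, mus, sigmas, resp)."""
    m, n = len(mus), len(data)
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    resp = []
    for _ in range(iters):
        # E-step: responsibility of component j for each observation.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(m)]
            tot = sum(p)
            resp.append([v / tot for v in p])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(m):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return weights, mus, sigmas, resp

# Synthetic losses from two hidden sources (illustration only).
rng = random.Random(1)
data = ([rng.gauss(2.0, 0.5) for _ in range(200)]
        + [rng.gauss(6.0, 0.7) for _ in range(100)])
weights, mus, sigmas, resp = em_normal_mixture(
    data, mus=[1.0, 5.0], sigmas=[1.0, 1.0], weights=[0.5, 0.5])
```

The responsibilities in `resp` play the role of the hidden exposure: each row gives the estimated probability that a loss came from each component.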
Figure 1: Overall distribution of the 348 BI medical bill amounts from Appendix B
compared with that submitted by provider A.
Left panel: frequency histograms (provider A’s histogram in filled bars).
Right panel: density estimators (provider A’s density in dashed line).
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm,
Grzegorz A. Rempala, Richard A. Derrig, pg. 9, 11/18/02
Figure 2: EM Fit
Left panel: mixture of normal distributions fitted via the EM algorithm to BI data.
Right panel: three normal components of the mixture.
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm,
Grzegorz A. Rempala, Richard A. Derrig, pg. 13, 11/18/02
Figure 3: Latent risk in BI data modeled by the EM Algorithm with m = 3.
Left panel: set of responsibilities δ_{j3}.
Right panel: the third component of the normal mixture compared with the
distribution of provider A’s claims (“A” claims density estimator is a solid curve).
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm,
Grzegorz A. Rempala, Richard A. Derrig, pg. 14, 11/18/02
Fraud Classification Using Principal Component
Analysis of RIDITs
PATRICK L. BROCKETT
Gus S. Wortham Chaired Prof. of Risk Management
University of Texas at Austin
RICHARD A. DERRIG
Senior Vice President
Automobile Insurers Bureau of Massachusetts
Vice President of Research
Insurance Fraud Bureau of Massachusetts
LINDA L. GOLDEN
Marlene & Morton Meyerson Centennial Professor in Business
University of Texas
Austin, Texas
ARNOLD LEVINE
Professor Emeritus
Department of Mathematics
Tulane University
New Orleans LA
MARK ALPERT
Professor of Marketing
University of Texas
Austin, Texas
JOURNAL OF RISK AND INSURANCE, 69:3, SEPT. 2002
THE PROBLEM




Data: Features have no natural metric-scale
Model: Stochastic process has no parametric
form
Classification: Inverse image of one dimensional
scoring function and decision rule
Feature Value: Identify which features are
“important”
PRIDIT METHOD OVERVIEW

1. DATA: N claims, T features, K_t ordered responses for feature t,
monotone in “fraud”.

2. RIDIT: score each possible response as proportion below
minus proportion above; scores are centered at zero.

3. RESPONSE WEIGHTS: principal component of the claims x
features matrix with RIDIT scores in the cells.

4. SCORE: sum of weights x claim RIDIT scores.

5. PARTITION: above and below zero.
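The five steps above can be sketched in a few lines. This is an illustrative reading of the overview, not the authors' implementation: the weights are taken as the first principal component of F'F (found here by power iteration), the response coding (category 0 = "Yes") and the eight toy claims are assumptions, and the function names are hypothetical.

```python
def ridit_scores(proportions):
    """B_i = (proportion in categories below i) - (proportion above i)."""
    return [sum(proportions[:i]) - sum(proportions[i + 1:])
            for i in range(len(proportions))]

def pridit(responses, n_categories):
    """responses[i][t] is the category index of claim i on feature t."""
    n, T = len(responses), len(responses[0])
    # Step 2: RIDIT-score every response of every feature.
    F = [[0.0] * T for _ in range(n)]
    for t in range(T):
        counts = [0] * n_categories[t]
        for row in responses:
            counts[row[t]] += 1
        b = ridit_scores([cnt / n for cnt in counts])
        for i, row in enumerate(responses):
            F[i][t] = b[row[t]]
    # Step 3: weights = first principal component of F'F (power iteration).
    G = [[sum(row[a] * row[c] for row in F) for c in range(T)] for a in range(T)]
    w = [1.0] * T
    for _ in range(500):
        v = [sum(G[a][c] * w[c] for c in range(T)) for a in range(T)]
        norm = sum(x * x for x in v) ** 0.5
        w = [x / norm for x in v]
    # Step 4: claim score = weighted sum of its RIDIT scores.
    scores = [sum(F[i][t] * w[t] for t in range(T)) for i in range(n)]
    # Step 5: partition at zero (negative = more "Yes" answers under this coding).
    classes = [1 if s < 0 else 2 for s in scores]
    return F, w, scores, classes

# A binary indicator answered "Yes" by 44% of claims.
b_yes, b_no = ridit_scores([0.44, 0.56])

# Hypothetical data: 8 claims, 3 binary indicators, 0 = "Yes", 1 = "No".
claims = [[0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 1],
          [1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
F, w, scores, classes = pridit(claims, n_categories=[2, 2, 2])
```

For the binary indicator, `ridit_scores([0.44, 0.56])` gives -0.56 for "Yes" and 0.44 for "No", which are the values listed for TRT1 in Table 1.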
TABLE 1
Computation of PRIDIT Scores

Variable                                                 Variable Label  Proportion of "Yes"  B_t1 ("Yes")  B_t2 ("No")
Large # of visits to chiropractor                        TRT1            44%                  -.56          .44
Chiropractor provided 3 or more modalities on most visits  TRT2          12%                  -.88          .12
Large # of visits to a physical therapist                TRT3            8%                   -.92          .08
MRI or CT scan but no inpatient hospital charges         TRT4            20%                  -.80          .20
Use of “high volume” medical provider                    TRT5            31%                  -.69          .31
Significant gaps in course of treatment                  TRT6            9%                   -.91          .09
Treatment was unusually prolonged (> 6 months)           TRT7            24%                  -.76          .24
Indep. medical examiner questioned extent of treatment   TRT8            11%                  -.89          .11
Medical audit raised questions about charges             TRT9            4%                   -.96          .04
TABLE 2
Weights for Treatment Variables

Variable  PRIDIT Weights W(∞)  Regression Weights
TRT1      .30                  .32***
TRT2      .19                  .19***
TRT3      .53                  .22***
TRT4      .38                  .07
TRT5      .02                  .08*
TRT6      .70                  -.01
TRT7      .82                  .03
TRT8      .37                  .18***
TRT9      -.13                 .24**

Regression significance shown at 1% (***), 5% (**) or 10% (*) levels.
TABLE 3
PRIDIT Transformed Indicators, Scores and Classes

Claim  TRT1   TRT2   TRT3   TRT4   TRT5   TRT6   TRT7   TRT8   TRT9   Score  Class
1      0.44   0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .07    2
2      0.44   0.12   0.08   0.2    -0.69  0.09   0.24   0.11   0.04   .07    2
3      0.44   -0.88  -0.92  0.2    0.31   -0.91  -0.76  0.11   0.04   -.25   1
4      -0.56  0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .04    2
5      -0.56  -0.88  0.08   0.2    0.31   0.09   0.24   0.11   0.04   .02    2
6      0.44   0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .07    2
7      -0.56  0.12   0.08   0.2    0.31   0.09   -0.76  -0.89  0.04   -.10   1
8      -0.44  0.12   0.08   0.2    -0.69  0.09   0.24   0.11   0.04   .02    2
9      -0.56  -0.88  0.08   -0.8   0.31   0.09   0.24   0.11   -0.96  .05    2
10     -0.56  0.12   0.08   0.2    0.31   0.09   0.24   0.11   0.04   .04    2
TABLE 7
AIB Fraud and Suspicion Score Data
Top 10 Fraud Indicators by Weight

PRIDIT  Adj. Reg. Score  Inv. Reg. Score
ACC3    ACC1             ACC11
ACC4    ACC9             CLT4
ACC15   ACC10            CLT7
CLT11   ACC19            CLT11
INJ1    CLT11            INJ1
INJ2    INS6             INJ3
INJ5    INJ2             INJ8
INJ6    INJ9             INJ11
INS8    TRT1             TRT1
TRT1    LW6              TRT9
TABLE 10
AIB Fraud Indicator and Suspicious Score Classes
Fraud/Non-fraud Classifications

            Adjuster Regression Score
PRIDIT      Fraud  Nonfraud  All
Fraud       30     5         35
Nonfraud    32     60        92
All         62     65        127

(odds ratio = 11.3) [4.0, 31.8]
REFERENCES
Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Arnold and Alpert, Mark,
(2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal
of Risk and Insurance, 69:3, 341-373.
Brockett, Patrick L., Xia, Xiaohua and Derrig, Richard A., (1998), Using Kohonen’s
Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud,
Journal of Risk and Insurance, 65:2, 245-274.
Derrig, R.A. and H.I. Weisberg, [2004], Determinants of Total Compensation for Auto
Bodily Injury Liability Under No-Fault: Investigation, Negotiation and the Suspicion
of Fraud, Insurance and Risk Management, Volume 71, (4), pp. 633-662.
Derrig, R.A., H.I. Weisberg and Xiu Chen, [1994], Behavioral Factors and Lotteries Under
No-Fault with a Monetary Threshold: A Study of Massachusetts Automobile
Claims, Journal of Risk and Insurance, 61:2, 245-275.
Rempala, G. and R.A. Derrig, (2005), Modeling Hidden Exposures in Claim Severity via the
EM Algorithm, NAAJ, v9,n2,108-128
Viaene, Stijn, Derrig, Richard A., Dedene, Guido, (2004), A Case Study of Applying
Boosting Naïve Bayes to Claim Fraud Diagnosis, IEEE Transactions on Knowledge
and Data Engineering, v18,n5, May
Viaene, Stijn, Derrig, Richard A., Baesens, Bart, and Dedene, Guido, (2002), A
Comparison of State-of-the-Art Classification Techniques for Expert Automobile
Insurance Fraud Detection, Journal of Risk and Insurance, 69:3, 373-423.
Fuzzy References
Insurance Related Material
Brockett, P.L., Cooper, W.W., Golden, L.L. and Pitaktong, V. (1994), A neural network method for
obtaining an early warning of insurer solvency, Journal of Risk and Insurance 61, pp. 402-424.
Cummins, J.D. and Derrig, R.A. (1997), Fuzzy financial pricing of property-liability insurance, North
American Actuarial Journal, 1:4, pp. 21-44.
Cummins, J.D. and Derrig, R.A. (1993), Fuzzy trends in property-liability insurance claim costs,
Journal of Risk and Insurance, September, 60, pp. 429-465.
Derrig, R.A. and Ostaszewski, K.M. (1995), Fuzzy techniques of pattern recognition in risk and claim
classification, Journal of Risk and Insurance, 62:3 pp.447-482.
Derrig, R.A. and Ostaszewski, K.M. (1997), Managing the tax liability of a property-liability insurance
company, Journal of Risk and Insurance, 64:4, pp. 695-711.
DeWit, G.W. (1982), Underwriting and uncertainty, Insurance: Mathematics and Economics 1, pp. 277-285.
Jablonowski, M. (1993), An expert system for retention selection, CPCU Journal, pp. 214-221.
Lemaire, Jean (1990), Fuzzy insurance, Astin Bulletin 20(1), pp. 33-55.
Ostaszewski, K.M. (1993), An Investigation into Possible Applications of Fuzzy Sets Methods in
Actuarial Science, Society of Actuaries, Schaumburg, Illinois.
Young, V. (1994), Application of fuzzy sets to group health underwriting, Transactions of the Society of
Actuaries 45, pp. 551-590.
Young, V. (1996), Rate changing: A fuzzy logic approach, Journal of Risk and Insurance, 63:3, pp. 461-484.