Data Warehousing/Data Mining:
A Crop Insurance Application
Presented By
Ashley Lovell
Director of Agricultural Programs &
Professor of Agricultural Economics
At the National Risk Management
Conference
D-FW Airport
March 26, 2003
1
CAE Staff
Center for Agribusiness Excellence
Tarleton State University
The Texas A&M University System
2
Overview
•ARPA 2000
•Data Warehouse
•Data Mining Research
•CAE Participants
•Overview of Activities & Research
•Cost-Benefit Analysis
•Conclusion
3
Agricultural Risk Protection Act of 2000
• Crop Insurance Coverage
• Program Integrity
• Research and Pilot Programs
• Education and Risk Management Assistance
4
5
6
RMA Customer Distribution
2001
Agricultural Risk Protection Act of
2000
• Improving Program Integrity – By Reducing
Fraud, Waste and Abuse to Build a Stronger
Crop Insurance Program and Lower Producer
Costs
– RMA and FSA to reconcile producer information*
– RMA to establish methods to identify agents and
adjusters who may be abusing the program
– Center for Agribusiness Excellence (CAE) established
*Spot Check Lists
7
Agricultural Risk Protection Act of
2000
• Center for Agribusiness Excellence (CAE)
– Mission: To conduct research using a single
data warehouse and associated data mining
tools to enhance the integrity of the Federal
Crop Insurance Program
– Began operating in January 2001
8
Data Warehouse
Description & Contents
9
Data Warehouse
• Very Large Relational Database (Multi-Gigabyte
to Terabytes)
• Generally Many Variables (Columns)
• Usually > 1 Million Observations (Rows)
• Multiple Tables (e.g., Data Tables)
• Consistent Representation (Dates, Units, Etc.)
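The "consistent representation" point can be illustrated with a minimal sketch. The field names, source date formats, and unit table below are hypothetical and are not taken from the CAE data model; the only idea shown is that every feed is converted to one date format and one unit before loading.

```python
from datetime import datetime

# Hypothetical source date formats; a real warehouse would document one per feed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")
# Illustrative conversion factors to acres.
TO_ACRES = {"acres": 1.0, "hectares": 2.47105}


def normalize_date(text):
    """Return an ISO 8601 date string, trying each known source format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize_area(value, unit):
    """Convert an area to acres, rounded to two decimals."""
    return round(value * TO_ACRES[unit.lower()], 2)


# Two made-up records arriving in different representations.
records = [
    {"loss_date": "03/26/2003", "area": 100.0, "unit": "acres"},
    {"loss_date": "20030326", "area": 40.0, "unit": "Hectares"},
]
clean = [
    {"loss_date": normalize_date(r["loss_date"]),
     "acres": normalize_area(r["area"], r["unit"])}
    for r in records
]
```

After normalization both records carry the same date string and the same unit, so they can be compared or joined directly.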
10
Completed Tasks
CAE Data Warehouse
Contents
• > 800 Million Records
• Includes:
– RMA Insurance Data 1991-2003
– NOAA Weather Data
11
Tasks in Progress
CAE Data Warehouse
Contents
• GIS Linkage of Weather Station Data
• Integration of Soil Data
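The slides do not say how weather stations are linked to insured locations. One simple possibility, shown here only as an assumption, is to attach each location to its nearest station by great-circle distance. The station IDs and coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return the (station_id, distance_km) pair closest to the given point."""
    return min(
        ((sid, haversine_km(lat, lon, slat, slon))
         for sid, (slat, slon) in stations.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical station IDs and coordinates.
stations = {"STN_A": (32.22, -98.20), "STN_B": (32.90, -97.04)}
station_id, dist_km = nearest_station(32.25, -98.25, stations)
```

A production linkage would more likely use a GIS spatial join, possibly weighting several nearby stations rather than picking one.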
12
CAE Data Warehouse
(Other Data Bases to Be Loaded)
• Remote Sensing Data
-Collaboratively with Spatial Sciences Lab
(SSL), Texas A&M University
• Climatological Data
-Collaboratively with University of Nebraska-Lincoln,
USDA National Drought Mitigation Center (NDMC/UN-L)
13
CAE Data Warehouse
(Other Data Bases to Be Loaded)
• Economic (e.g., Cash and Futures Market
Data)
• Soil Series Data
-Collaboratively with USDA NRCS National
Cartography Laboratory, SSL/TAMU &
NDMC/UN-L
14
Data Mining
Research
15
Overview of Data Mining
[Diagram: graphical breakdown of Data Mining into three branches]
• Discovery
– Conditional Logic
– Affinities and Associations
– Trends and Variations
• Predictive Modeling
– Outcome Prediction
– Forecasting
• Forensic Analysis
– Deviation Detection
– Link Analysis
16
Modeling Methodology
• Linear Regression
• Logistic Regression
• Neural Networks
• Cluster Analysis
• Classification Trees
• Link Analysis
• Genetic Algorithms
17
Center for Agribusiness
Excellence
18
CAE’s Partners
• Tarleton State University and Planning
Systems Inc. (PSI) Are Partners in the Data
Warehouse and Data Mining
• USDA Risk Management Agency Research
Project
– Cooperative Agreement Signed on
December 14, 2000
– Competitive Contract Awarded July 24,
2002, Effective September 1, 2002
19
CAE’s Partner Contributions
• PSI has Expertise in Data Warehouse
Development and Implementation
• RMA provides the data base and program
operational experience
• Tarleton has Expertise in Agriculture and
CIS and is the Project Contractor &
Coordinator
20
Overview of CAE Activities
January 2001-March 2003
21
CAE Activities
• University Personnel Assigned: Jan 2001
• Data Model Finished: Jun 2001
• RY 2000 Data “Readied”: Jun 2001
• Producer Watch List: Jun 2001
• Data Warehouse Loaded 1991-2000: Sep 2001
• ARPA 150% Delivered: Oct 2001
• Updated 1998-2000 Data, 2001 Data: Nov 2001 – Dec 2001
• Growing Season Spotcheck Lists: Mar 2002
• NASS Data Integrated: May 2002
• Web Interface Operational: Aug 2002
• 49 Completed Projects: Sept 2002 – Jan 2003
• RMA Spot Check List 2002 Delivered: Feb 2003
• Last Delivery & Loading of Data: Mar 2003
22
CAE Research Drivers
• Legislation
• Work Orders
• Scenarios
23
CAE Research Drivers
Legislation, specifically
• ARPA of 2000 “…The Secretary shall
establish procedures under which the
Corporation will be able to identify the
following: …
24
CAE Research Drivers
• Any person performing loss adjustment services
relative to coverage offered under this title where
such loss adjustments performed by the person
result in accepted or denied claims equal to or
greater than 150 percent … of the mean for
accepted or denied claims (as applicable) for all
other persons performing loss adjustment services
in the same area, as determined by the
Corporation….”
• In addition to loss adjusters, the ARPA provision
also covers crop insurance agents.
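The 150 percent test quoted above can be sketched as a leave-one-out comparison: each adjuster's claim count is compared with the mean for all other adjusters in the same area. This is only an illustration of the statutory wording. The adjuster IDs, areas, and counts are invented, and how the Corporation defines "area" or counts claims is not specified here.

```python
from collections import defaultdict


def flag_150_percent(claims_by_adjuster, area_of, threshold=1.5):
    """Flag adjusters whose claim count is >= threshold times the mean
    of all OTHER adjusters in the same area (a leave-one-out mean)."""
    by_area = defaultdict(list)
    for adjuster in claims_by_adjuster:
        by_area[area_of[adjuster]].append(adjuster)

    flagged = []
    for adjusters in by_area.values():
        if len(adjusters) < 2:
            continue  # no peers in the area to compare against
        total = sum(claims_by_adjuster[a] for a in adjusters)
        for a in adjusters:
            own = claims_by_adjuster[a]
            others_mean = (total - own) / (len(adjusters) - 1)
            if others_mean > 0 and own >= threshold * others_mean:
                flagged.append(a)
    return sorted(flagged)


# Hypothetical accepted-claim counts per adjuster and their areas.
claims = {"A1": 10, "A2": 12, "A3": 30, "B1": 5, "B2": 6}
areas = {"A1": "X", "A2": "X", "A3": "X", "B1": "Y", "B2": "Y"}
flagged = flag_150_percent(claims, areas)
```

In this made-up data only A3 is flagged: the other two adjusters in area X average 11 claims, and 30 exceeds 1.5 × 11 = 16.5.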
25
CAE Research Drivers
• Work Orders - RMA Personnel Routinely
Submit Requests (That Result in Work Orders)
Which Focus the Research Resources of CAE
• Scenarios*
– Over sixty scenarios/sub-scenarios
– Initiated scenario development early in 2001
*Indicators of Fraud, Waste, and Abuse
26
Spot Check List: 2002 Data for
ARPA Requirement
Scenarios for Spot Check:
•Triplets
•Frequent Filers
•Yield Switching
•Prevented Planting Frequent Filers
•Producers Associated With All or Nothing Agents
•Crop Units With Excessive Yields
•Under Reported Harvested Production
•Rare Big Losers
27
Rare Big Losers
• Identify Rare Multi-year Losers, Using the Probability
of Loss
• Local Yield Variability Considered
• Cluster and Factor Analysis Show the Importance of
Local Conditions
• A Producer’s Loss Ratios Strongly Related to Insurance
Plan and Coverage Level
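One way to read "using the probability of loss" is as a binomial tail test: given a local per-year loss probability, how likely is it that a producer would have at least the observed number of loss years by chance? The sketch below assumes independent years and a single fixed local probability, neither of which is stated in the slides. The default cutoff matches the 0.0001 level reported in the results, but the function names and inputs are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_rare_loser(loss_years, insured_years, local_loss_prob, alpha=0.0001):
    """True when the observed loss frequency is improbable under local conditions."""
    return prob_at_least(loss_years, insured_years, local_loss_prob) < alpha


# Hypothetical producers in an area where 20% of comparable policies pay a loss.
every_year = is_rare_loser(loss_years=6, insured_years=6, local_loss_prob=0.2)
half_the_time = is_rare_loser(loss_years=3, insured_years=6, local_loss_prob=0.2)
```

Because the local probability would differ by region, plan, and coverage level, the same loss record can be unremarkable in one area and highly improbable in another.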
28
Iowa & Oklahoma Are Different!
[Figure: Claims by Insurance Plan and Coverage Level. Vertical axis: Policies with Claims (0.00 to 0.70). Horizontal axis: Coverage Levels (CAT, 50, 55, 60, 65, 70, 75, 80, 85). Series: Iowa APH claims, Oklahoma APH claims, Iowa Revenue claims, Oklahoma Revenue claims.]
Region, Insurance Plan, & Coverage Level
29
Rare Big Losers
Average Indemnity of $98,664
30
Rare Big Losers Results*
• 350 Unique Producers Accounting for
$34,532,565 of Indemnity in 2002
• They Were Flagged at the 0.0001 Level
• Average Indemnity of $98,664
• 72.5 Percent of Their Policies Resulted in a
Significant Loss
*Indicators of Fraud, Waste, and Abuse
31
All or Nothing
• Producers Who Are Associated With All or
Nothing Agents
• All or Nothing Agents are those Agents
Who Have Disproportionate Numbers of
Crop Policies With Total Losses Compared
to Other Agents Within Same Area
• Associated Producers Have Total Loss
Claims
• Associated Producers Who Were
Indemnified in More Than One Year
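The definition above can be sketched in two steps: find agents whose share of total-loss policies is out of line with other agents in the same area, then list their producers with total-loss claims in more than one year. The 2x cutoff, the record layout, and all IDs are assumptions for illustration; the slides do not give the actual threshold.

```python
def all_or_nothing_agents(policies, ratio_cutoff=2.0):
    """Agents whose total-loss share is >= ratio_cutoff times the share
    among all other agents in the same area."""
    counts = {}  # (area, agent) -> [total-loss policies, all policies]
    for p in policies:
        c = counts.setdefault((p["area"], p["agent"]), [0, 0])
        c[0] += int(p["total_loss"])
        c[1] += 1

    flagged = set()
    for (area, agent), (losses, n) in counts.items():
        peers = [c for (a, g), c in counts.items() if a == area and g != agent]
        peer_losses = sum(c[0] for c in peers)
        peer_n = sum(c[1] for c in peers)
        if peer_n == 0 or losses == 0:
            continue  # no peers to compare with, or nothing to flag
        if losses / n >= ratio_cutoff * (peer_losses / peer_n):
            flagged.add(agent)
    return flagged


def associated_producers(policies, agents):
    """Producers with total-loss claims via flagged agents in more than one year."""
    years = {}
    for p in policies:
        if p["agent"] in agents and p["total_loss"]:
            years.setdefault(p["producer"], set()).add(p["year"])
    return sorted(prod for prod, ys in years.items() if len(ys) > 1)


def policy(agent, producer, year, total_loss, area="X"):
    return {"agent": agent, "producer": producer, "year": year,
            "total_loss": total_loss, "area": area}


# Hypothetical policies for three agents in one area.
policies = [
    policy("G1", "P1", 2001, True), policy("G1", "P1", 2002, True),
    policy("G1", "P2", 2001, True), policy("G1", "P3", 2002, False),
    policy("G2", "P4", 2001, True), policy("G2", "P5", 2001, False),
    policy("G2", "P6", 2002, False), policy("G2", "P7", 2002, False),
    policy("G3", "P8", 2001, False), policy("G3", "P9", 2002, False),
]
agents = all_or_nothing_agents(policies)
producers = associated_producers(policies, agents)
```

A real screen would also need a minimum policy count per agent, since a share computed from a handful of policies is unstable.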
32
All or Nothing Producers
$12,150,707 Indemnity for 236 Producers
33
2002 Spot Check List Summary
Scenario*                      Indemnity       Producers
Triplets                       $4,332,310      99
Frequent Filers                $21,718,632     328
Yield Switching                $15,486,631     285
Prevented Planting FF          $7,011,644      60
All or Nothing                 $12,150,707     236
Excessive Yield                $36,201,574     389
Under Reported Harvest Prod    $23,502,812     225
Rare Big Losers                $32,817,867     323
Unduplicated Totals            $137,678,258    1,808
*Indicators of Fraud, Waste, and Abuse
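The "unduplicated" totals are smaller than the column sums because a producer can be flagged under more than one scenario. The short check below adds up the per-scenario figures from the slide and compares them with the reported totals. The final lines use made-up producer IDs to show how a set union removes the double counting.

```python
# Per-scenario figures as listed on the slide (totals row excluded).
indemnities = [4_332_310, 21_718_632, 15_486_631, 7_011_644,
               12_150_707, 36_201_574, 23_502_812, 32_817_867]
producer_counts = [99, 328, 285, 60, 236, 389, 225, 323]

reported_indemnity = 137_678_258
reported_producers = 1_808

raw_indemnity = sum(indemnities)
raw_producers = sum(producer_counts)

# How much of each column sum is attributable to overlap between scenarios.
overlap_indemnity = raw_indemnity - reported_indemnity
overlap_producers = raw_producers - reported_producers

# Toy illustration with hypothetical producer IDs: P2 appears in two scenarios
# but is counted once in the unduplicated set.
flags = {"Triplets": {"P1", "P2"}, "Yield Switching": {"P2", "P3"}}
unduplicated = set().union(*flags.values())
```

The column sums come to 1,945 producer listings and $153,222,177, so 137 listings and $15,543,919 of indemnity reflect producers counted under more than one scenario.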
34
Total 2002 Spot Check List
$137,678,258 Indemnity for 1,808 Insureds
35
Data Mining Activities
Publicized in Weekly Newsletter
Newsletter Volume 1, No. 1
Week of February 7, 2003
This week, the development of the 2003 Spot Check List is a
continuing major research activity and includes all CAE staff
members. The following scenarios are the basis for the Spot
Check List (SCL) that will be finalized for delivery to RMA
early in March.
36
Cost-Benefit Analysis
Data Mining Pays Off
37
Cost-Benefit Analysis Examples
• Data Mining in Texas, Similar to CAE’s,
Identified Areas of Tax Underpayment
• In FY 2000, The State of Texas Comptroller
Collected An Additional $43 Million in Taxes
From Areas of Underpayment Identified
Through Data Mining
38
Cost-Benefit Analysis
• Texas Blue Cross-Blue Shield Developed
a Medical Insurance Data Warehouse
• In the First Three Months, Data Mining
Identified Enough Medical Fraud to Pay
for the Data Warehouse & Mining
39
Conclusions
• Data Mining Can Detect Patterns of Waste, Fraud,
and Abuse
• Millions of Taxpayer and Insurance Provider Dollars
Can Be Saved Through Data Mining Using Forensic
Analysis Techniques
• This Research Provides USDA with Analysis Tools
Previously Unavailable
40
Conclusions
• Crop Insurance Is Vulnerable to Multiple
Methods of Fraud, Waste and Abuse
• A Small Number of Agents, Adjusters and
Producers Are Linked to Anomalous Behavior
41