Machine Learning: SPSS
Neural Networks, KNN and Bayesian Methods
Application Area: Insurance, Fraud Detection
Data Mining Task: Classification
Number of Instances: 15,900
Number of Attributes: 31
Abstract:
A large number of problems in data mining are related to fraud detection. Fraud is a common problem in
auto insurance claims, health insurance claims, credit card transactions, financial transactions, and so on.
The data in this case comes from an actual auto insurance company. Each record represents an insurance
claim. The last column in the table tells whether the claim was fraudulent. A number of people
have used this dataset; here are some of their observations:
• “This is an interesting data because the rules that most tools are coming up with do not make any intuitive sense. I think a lot of the tools are overfitting the data set.”
• “The other systems are producing low error rates but the rules generated make no sense.”
• “It is OK to have a higher overall error rate with simple human understandable rules for a business use case like this.”
There are two datasets (Excel files):
1. Insurance Fraud – TRAIN-3000, and
2. Insurance Fraud – TEST-12900.
Train with the two neural network methods (multilayer perceptron and radial basis function) and
with the KNN and Bayesian network methods, varying parameters such as the number of hidden nodes in each layer,
with Fraud Prediction – Yes/No (Flag) as the target variable.
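The exercise above is done in SPSS; for readers without SPSS, a minimal scikit-learn sketch of the same experiment might look like the following. The synthetic data stands in for the Excel files, and the hidden-layer sizes and k value are illustrative assumptions, not values from the original study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the insurance claims data (31 attributes, ~6% fraud).
X, y = make_classification(n_samples=3000, n_features=31,
                           weights=[0.94, 0.06], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Multilayer perceptron: vary the hidden-layer sizes, as the exercise asks.
for hidden in [(10,), (20,), (20, 10)]:
    mlp = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500, random_state=0)
    mlp.fit(X_train, y_train)
    print(hidden, round(mlp.score(X_test, y_test), 3))

# KNN for comparison; k is a tunable parameter like the hidden-node counts.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("knn", round(knn.score(X_test, y_test), 3))
```

Note that plain accuracy is a weak yardstick here: with ~94% non-fraud cases, a model that predicts "No" for everything already scores about 0.94, which is why the cost-sensitive discussion below matters.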
This is an imbalanced dataset: fraud cases are rare and hard to separate from non-fraud cases. So
I also built one or more decision trees with modified misclassification costs to obtain better accuracy. Accuracy is
calculated as per the formula below.
                            PREDICTED CLASS
                            Class=Yes      Class=No
  ACTUAL     Class=Yes      a (TP)         b (FN)
  CLASS      Class=No       c (FP)         d (TN)

  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
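In code, the accuracy formula reduces to a one-liner over the four confusion-matrix cells; the counts used below are hypothetical, for illustration only:

```python
def accuracy(tp, fn, fp, tn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts, for illustration only.
print(accuracy(tp=120, fn=80, fp=40, tn=760))  # → 0.88
```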
Recommendations:
After comparing all the models, I got the highest accuracy rate of 96.14% for the radial basis function network.
[Tree diagram not reproduced in this transcript]
The top predictors were Accident Area and Policy Report Filed Year, which had the highest predictor importance.
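The cost-modified decision trees mentioned above can be sketched in scikit-learn via class weights. The synthetic data and the 10:1 cost ratio are assumptions for illustration; SPSS exposes the same idea through its misclassification-cost settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Synthetic imbalanced data standing in for the insurance claims.
X, y = make_classification(n_samples=3000, n_features=31,
                           weights=[0.94, 0.06], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Penalize a missed fraud (class 1) ten times more than a false alarm.
# A shallow tree keeps the rules human-readable, per the quoted comments.
tree = DecisionTreeClassifier(class_weight={0: 1, 1: 10},
                              max_depth=4, random_state=0)
tree.fit(X_train, y_train)

tn, fp, fn, tp = confusion_matrix(y_test, tree.predict(X_test)).ravel()
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```

Raising the cost on the fraud class trades some overall accuracy for more fraud cases caught, which matches the observation that a higher error rate with understandable rules can be acceptable for this business use case.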