Paper SDA-04
Detecting Medicaid Data Anomalies Using Data Mining Techniques
Shenjun Zhu, Qiling Shi, Aran Canes
AdvanceMed Corporation, Nashville, TN
ABSTRACT
The purpose of this study is to use statistical and data mining techniques in Base SAS(R) and SAS(R) Enterprise Miner(TM) to
proactively reduce the number of false positives caused by data anomalies in Medicaid pharmacy claim data when employing
a rule-based approach to identify overpayments. Rule-based techniques typically apply formulas derived from specific state Medicaid laws
and policies to detect and identify overcharged payments. A false positive is defined as an identified
overpayment on a claim that was in fact paid correctly; the erroneous flag arises from data anomalies or unknown factors. False
positives substantially increase the amount of time and resources spent by auditors. The specific objective of the study is
to detect and reduce data anomalies by examining the relationships among key variables such as Medicaid amount paid
(MAP), average wholesale price (AWP) and quantity of service in Medicaid pharmacy claim data.
Pharmacy claim data were simulated and the overpayment was calculated by a rule-based approach developed by
AdvanceMed Corporation. Different data mining techniques, namely the studentized residual, leverage, Cook's distance,
DFFITS and clustering, were used to capture the abnormal claims and reduce the number of false positives. The results of
this analysis indicate that clustering is the best approach for detecting these kinds of data anomalies,
followed by the DFFITS method.
INTRODUCTION
AdvanceMed specializes in helping healthcare organizations evaluate and assess the integrity of their health and pharmacy
benefit programs. AdvanceMed conducts sophisticated data analysis to detect potential fraud cases from both the pre- and
post-payment perspectives, using rule violations, statistical outliers and similar indicators to identify health care fraud and abuse. AdvanceMed
aligns itself with cutting-edge resources, developments, and capabilities, which allow for progressive healthcare integrity in
today's fluid environment. Through these efforts, AdvanceMed brings forth all the necessary elements to provide the client with
the means to successfully meet its missions [1].
METHODOLOGY
A simulation was conducted based on Medicaid data. Abnormal claims were added to the simulated data to test different
data mining techniques for detecting data anomalies. Below is the rule-based calculation methodology used by AdvanceMed
to detect overpayments in pharmacy claim data. This rule-based algorithm identifies overpayments where state
Medicaid paid for more drug units than state policy allowed.
If quantity of service (QOS) is greater than the maximum units (max_units) permitted by the state, AdvanceMed calculates
the overpayment by the following formula:
Overpayment = MAP - (AWP * discount_rate * max_units + dispense_fee).    (1)
The discount rate and dispensing fee are constants for a specific state. Hence, by (1), abnormal MAP or AWP values on
claims will produce many false positives among the identified overpayments.
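As an illustration only, rule (1) can be sketched in a few lines of Python (this is not the SAS code used in the study). The discount rate, dispensing fee and claim values below are hypothetical, chosen only to show how a mis-recorded AWP changes the calculated overpayment on an otherwise identical claim.

```python
def overpayment(map_paid, awp, qos, max_units, discount_rate, dispense_fee):
    """Rule (1): overpayment on a claim whose quantity exceeds the state maximum.

    Returns 0.0 when the quantity of service is within the permitted units.
    """
    if qos <= max_units:
        return 0.0
    allowed = awp * discount_rate * max_units + dispense_fee
    return map_paid - allowed


# Hypothetical state constants (assumptions, not values from the paper).
DISCOUNT_RATE = 0.85
DISPENSE_FEE = 2.50

# A claim for 40 units against a 30-unit limit, paid according to formula (2).
awp, qos, max_units = 1.20, 40, 30
map_paid = awp * qos * DISCOUNT_RATE + DISPENSE_FEE

clean = overpayment(map_paid, awp, qos, max_units, DISCOUNT_RATE, DISPENSE_FEE)

# The same claim with the AWP recorded at one tenth of its true value.
distorted = overpayment(map_paid, awp / 10, qos, max_units, DISCOUNT_RATE, DISPENSE_FEE)

print(round(clean, 2), round(distorted, 2))
```

With these assumed numbers the anomalous AWP inflates the calculated overpayment several times over, which is the kind of distortion the diagnostics in this paper are meant to catch before auditors review the claim.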
With the exception of strikeouts and errors, MAP should be calculated by a formula using AWP and QOS for each prescription.
Below is the formula AdvanceMed uses to define the relationship between MAP and QOS when no other third-party payment exists.
MAP = AWP * QOS * discount_rate + dispense_fee.    (2)
The discount rate and dispensing fee are constant for any prescription. We can infer from this equation that there
is a linear relationship between MAP and the product of AWP and QOS. Hence we define a new variable called “AQ” and let
AQ = AWP * QOS. We then perform a bivariate association analysis, computing the Pearson correlation coefficient between
MAP and AQ. In the simulated data the Pearson coefficient equals 0.91, which indicates a strong positive linear
relationship between MAP and AQ. We then perform a regression analysis predicting MAP from AQ.
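The construction of AQ and the Pearson coefficient can be sketched as follows. This is a Python illustration under stated assumptions, not the SAS procedure used in the study: the claims, the discount rate (0.85) and the dispensing fee (2.50) are hypothetical, and the last claim carries an AWP recorded ten times too high.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical claims as (AWP, QOS, MAP); MAP follows formula (2) with
# an assumed discount rate of 0.85 and dispensing fee of 2.50.
claims = [
    (1.2, 30, 33.10),
    (1.2, 60, 63.70),
    (0.5, 90, 40.75),
    (3.0, 10, 28.00),
    (30.0, 10, 28.00),  # anomalous AWP: recorded as 30.0 instead of 3.0
]

aq = [awp * qos for awp, qos, _ in claims]
map_paid = [m for _, _, m in claims]

r_clean = pearson(aq[:4], map_paid[:4])  # claims without the anomaly
r_all = pearson(aq, map_paid)            # all claims, anomaly included

print(round(r_clean, 3), round(r_all, 3))
```

On the clean claims the relationship is exactly linear, so the coefficient is 1; a single anomalous AWP is enough to pull it far below that, which is why the strength of the MAP and AQ relationship is a useful first check.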
Consider the linear regression model MAP = β1 + β2 * AQ + ε, where the errors ε are independent and all have the same
variance. Observations which have an extreme studentized residual or leverage for the fitted regression model can be
identified as outliers. Cook's distance measures the influence of the i-th data point on the fitted values of all the data points. The
higher the Cook's distance, the more influential the point. We consider claims with a Cook's distance greater than 4/n to be
outliers. DFFITS also shows how influential a point is in a regression. More specifically, it is the scaled difference between the
fitted (predicted) values for the i-th observation calculated with and without that observation. We identify claims with an absolute value of DFFITS greater than
2*sqrt(k/n) as outliers (where k is the number of predictors and n is the number of observations).
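The four diagnostics can be written out for the one-predictor model using their standard textbook formulas. The Python sketch below is an illustration, not the SAS regression run in this study. The data are hypothetical, with an anomaly injected into the last claim, and both the internally studentized residual (student) and the externally studentized residual (r) are computed because the text does not say which one was used.

```python
import math


def regression_diagnostics(x, y):
    """Fit y = b1 + b2*x by least squares and return per-observation diagnostics."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)

    rows = []
    for xi, e in zip(x, resid):
        lev = 1 / n + (xi - mean_x) ** 2 / sxx              # leverage h_i
        student = e / math.sqrt(mse * (1 - lev))            # internally studentized
        mse_del = (sse - e * e / (1 - lev)) / (n - p - 1)   # error variance without obs i
        rstudent = e / math.sqrt(mse_del * (1 - lev))       # externally studentized
        cd = student ** 2 * lev / (p * (1 - lev))           # Cook's distance
        dffit = rstudent * math.sqrt(lev / (1 - lev))       # DFFITS
        rows.append({"student": student, "r": rstudent,
                     "lev": lev, "cd": cd, "dffit": dffit})
    return rows


# Hypothetical AQ and MAP values; the last claim carries an injected anomaly.
aq = [10.0 * i for i in range(1, 11)]
noise = [0.4, -0.3, 0.2, -0.5, 0.1, 0.3, -0.2, 0.5, -0.4, 0.0]
map_paid = [2.5 + 0.85 * a + e for a, e in zip(aq, noise)]
map_paid[-1] += 40.0

diag = regression_diagnostics(aq, map_paid)
n, k = len(aq), 1  # k = number of predictors, as in the text

# Cutoffs from the text: Cook's distance > 4/n and |DFFITS| > 2*sqrt(k/n).
cook_flags = [i for i, d in enumerate(diag) if d["cd"] > 4 / n]
dffits_flags = [i for i, d in enumerate(diag) if abs(d["dffit"]) > 2 * math.sqrt(k / n)]

print(cook_flags, dffits_flags)
```

The dictionary keys r, lev, cd and dffit mirror the variable names used for the regression output later in the paper. In this small example both cutoffs flag the injected claim; a single large anomaly can also pull the fitted line enough that neighbouring normal claims cross a cutoff.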
Clustering is a statistical method of unsupervised learning. It partitions a set of observations into subsets (called clusters) so that
observations with similar patterns across the variables fall in the same cluster. Since there are three distinct drugs in the table,
we set the number of clusters k to be no less than three. SAS Enterprise Miner uses the cubic clustering criterion (CCC)
cutoff value as its main criterion in selecting the number of clusters. In the average linkage method, the distance between
two clusters is defined as the average of the distances between all pairs of objects, where each pair is made up of one object
from each cluster. The segment identifier is assigned a role of segment. The Cluster node selects initial seeds that are very well-separated using a full replacement algorithm. The clustering methods in the Cluster node perform disjoint cluster analysis on
the basis of Euclidean distances. SAS Enterprise Miner uses the Convergence Criterion Value property to specify the value of
the convergence criterion in the computation of cluster seeds. The default convergence value is 0.0001.
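The average linkage distance defined above can be sketched directly. This Python fragment only illustrates the definition with hypothetical points; it is not how the Enterprise Miner Cluster node is implemented.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def average_linkage(cluster_a, cluster_b):
    """Average distance over all pairs with one point taken from each cluster."""
    total = sum(euclidean(a, b) for a, b in product(cluster_a, cluster_b))
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical one-dimensional clusters: pair distances are 5 and 3.
cluster_1 = [(0.0,), (2.0,)]
cluster_2 = [(5.0,)]

print(average_linkage(cluster_1, cluster_2))
```

Because every cross-cluster pair contributes equally, a small cluster sitting close to a large one yields a small average linkage distance to it.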
RESULTS
The simulated pharmacy claim dataset consists of information about Medicaid pharmacy services. The response variable is
the overpayment, calculated based on state policy. Possible explanatory variables include various measures of Medicaid
pharmacy service. We added some aberrant AWP values to the simulated dataset to evaluate the effects of AWP data
anomalies on the identified overpayments. The data structure used to calculate overpayments by the rule-based methodology is shown below:
Table 1: Data Structure for Simulated Pharmacy Claim Table with Calculated Overpayment
Type of Claims                             Normal Claims   Abnormal Claims        Total
Total Observations                         2,998           300                    3,298
Number of Observations for Overpayments    63              9 (False Positives)    72
Identified Claim Count Rate (%)            2.10%           3.00%                  2.18%
The five highest and lowest overpayments for each drug are below:
Figure 1: The Five Highest and Lowest Overpayments for Each Drug
We ran the regression predicting MAP from AQ and output several statistics needed for the
next few analyses to a dataset called “rx_res”. These statistics include the studentized residual (called r), leverage (called
lev), Cook's distance (called cd) and DFFITS (called dffit).
First, we used studentized residuals to identify outliers. The studentized residuals were retrieved from the previous regression
analysis output. Ninety-two claims with studentized residuals either less than -1.42 or greater than 3 were identified as outliers
(data anomalies).
Figure 2: Studentized Residuals Distribution
Second, we assess the leverages to identify observations that have a potentially large influence on regression coefficient
estimates.
Figure 3: Leverage Distribution
A close examination of the observations in the simulated dataset, plotted below, shows that the claims whose “claim_pk” (the claim ID number)
is in (3258, 3236, 3036, 3270, 3300, 3228, 3260, 3136, 3130, 3111) display high leverage. As a result, 200 claims with
leverage > 0.001 were identified as outliers.
Figure 4: “claim_pk” Plot for Leverage and R-squared
[Plot: leverage (vertical axis, tick marks from 0.0003 to 0.0014) against the squared studentized residual, labeled “r squared” (horizontal axis, tick marks from 0 to 15), with each point labeled by its claim_pk.]
SAS code:
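/* Distribution of leverage, with plots */
proc univariate data=rx_res plots;
  var lev;
run;

/* Add the squared studentized residual to the diagnostics dataset */
proc sql;
  create table rx_res2 as
  select *, r**2 as rsquared
  from rx_res;
quit;

/* Plot leverage against the squared residual, labeling each point with claim_pk */
goptions reset=all;
axis1 label=(r=0 a=90);
symbol1 pointlabel=("#claim_pk") font=simplex value=none;
proc gplot data=rx_res2;
  plot lev*rsquared / vaxis=axis1;
run;
quit;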
The Cook's distance results showed 118 claims with Cook's distance > 4/3298, and the DFFITS results showed 195 claims with an
absolute value of DFFITS > 2*sqrt(1/3298); these claims were considered outliers.
We used SAS(R) Enterprise Miner(TM) to perform the cluster analysis. Each observation represents a claim for overpayment
detection. The following is the flow diagram of the clustering model design.
Figure 5: Flow Diagram of the Clustering Model
In the “RX_SIMU” node, we did not use any target information created by the rule-based algorithm because it is not necessary
for an unsupervised learning model. In the “Replacement” node, we replaced missing values of character variables with
“Unknown” and ignored missing values of interval variables. The variable “AQ” has a skewness of 17.76, so to reduce its
variance we created, in the “Transform Variables” node, a new variable “log_aq” using the formula
log_aq = log(AWP*quantity_of_service + 1). Below
are the statistics after the log transformation.
Figure 6: Transformation Statistics of “log_aq”
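The effect of the log transformation on skewness can be illustrated with a short Python sketch. The claims below are hypothetical, and the skewness function is the simple population moment ratio, which is an assumption and may differ slightly from the sample-adjusted statistic reported by SAS.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical claims as (AWP, quantity_of_service) with a long right tail in AQ.
claims = [(0.5, 2 ** k) for k in range(1, 13)]

aq = [awp * qos for awp, qos in claims]
log_aq = [math.log(a + 1) for a in aq]  # same formula as the Transform Variables node

print(round(skewness(aq), 2), round(skewness(log_aq), 2))
```

Adding 1 inside the logarithm keeps the transformation defined when AWP * quantity_of_service is zero.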
The Cluster node selects initial seeds that are very well-separated using a full replacement algorithm. The following pie chart shows
that 6 segments were selected for this clustering.
Figure 7: Segment Size Plot
There are 3 segments with sizes of around 1000 observations each, and 3 segments with sizes of 100 observations
each.
The distributions of each variable within the segments show that most variables are evenly distributed within each
segment and that the distributions look alike within the segment pairs (1, 6), (2, 4) and (3, 5).
Figure 8: Cluster Proximities Plot
Cluster proximity for average linkage clustering is defined as the average pairwise distance over all pairs of points from different clusters.
The plot of cluster proximities makes the pattern obvious: the two segments in each of the pairs
(1, 6), (2, 4) and (3, 5) are very close to each other. From the segment size plot, segments 4, 5 and 6 are very small
compared to their closest segments and hence can be identified as abnormal claims. After determining which variable caused
this abnormality, we used a SAS Code node to delete segments 4, 5 and 6. The 300 claims in segments 4, 5 and 6
were identified as abnormal claims.
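The selection reasoning above (a small segment lying close to a much larger one is treated as abnormal) can be sketched as a simple rule. The Python below is an illustration only: the segment sizes are the approximate values quoted in the text, and the size ratio cutoff is a hypothetical parameter that the study does not specify.

```python
# Approximate segment sizes as described in the text ("around 1000" and "100").
segment_sizes = {1: 1000, 2: 1000, 3: 1000, 4: 100, 5: 100, 6: 100}

# Closest segment pairs, as read off the cluster proximities plot.
closest_pairs = [(1, 6), (2, 4), (3, 5)]

SIZE_RATIO = 0.2  # hypothetical cutoff: "very small" relative to the closest segment


def small_segments(sizes, pairs, ratio):
    """Return the smaller segment of each pair when it falls below ratio * larger size."""
    flagged = []
    for a, b in pairs:
        small, large = sorted((a, b), key=lambda s: sizes[s])
        if sizes[small] < ratio * sizes[large]:
            flagged.append(small)
    return sorted(flagged)


abnormal_segments = small_segments(segment_sizes, closest_pairs, SIZE_RATIO)
print(abnormal_segments)
```

In the study this judgment was made by inspecting the segment size and proximity plots; encoding it as a ratio is just one way to make the rule repeatable.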
SAS code:
/* Output library for the cleaned claims */
libname cls "C:\Documents and Settings\Administrator\Desktop\paper reference";

/* Keep only segments 1, 2 and 3, then drop the clustering, imputation and transformation variables */
data cls.rx_clus;
  set &em_import_data.;
  if _segment_ in (1,2,3);
  drop _segment_ distance im_awp im_log_aq im_max_units
       im_medicaid_amount_paid
       im_period im_quantity_of_service im_national_drug_code _impute_ log_aq;
run;
The following is the summary of experiment results for Student Residual, Leverage, Cook's distance, DFFITS and Clustering.
Table 2: Summary of Experiment Results
Statistical        Number of Abnormal   Abnormal Claims   Number of False     False Positives   Number of Normal   Normal Claims
Techniques         Claims Removed       Capture Rate      Positives Removed   Capture Rate      Claims Removed     Misclassification Rate
Student Residual   87                   29%               0                   0%                5                  0.17%
Leverage           2                    1%                0                   0%                198                6.60%
Cook's Distance    195                  65%               0                   0%                3                  0.10%
DFFITS             182                  61%               6                   67%               35                 1.17%
Clustering         300                  100%              9                   100%              0                  0.00%
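The rates in Table 2 follow from the counts in Table 1 (300 abnormal claims, 9 false positives and 2,998 normal claims). A short Python check of that arithmetic is given below; the function and variable names are illustrative and not part of the study's code.

```python
# Totals from Table 1.
TOTAL_ABNORMAL = 300
TOTAL_FALSE_POSITIVES = 9
TOTAL_NORMAL = 2998

# Counts from Table 2: (abnormal removed, false positives removed, normal removed).
results = {
    "Student Residual": (87, 0, 5),
    "Leverage": (2, 0, 198),
    "Cook's Distance": (195, 0, 3),
    "DFFITS": (182, 6, 35),
    "Clustering": (300, 9, 0),
}


def rates(abnormal_removed, fp_removed, normal_removed):
    """Return (abnormal capture rate, false positive capture rate, normal misclassification rate)."""
    return (
        abnormal_removed / TOTAL_ABNORMAL,
        fp_removed / TOTAL_FALSE_POSITIVES,
        normal_removed / TOTAL_NORMAL,
    )


for name, counts in results.items():
    capture, fp_capture, misclass = rates(*counts)
    print(f"{name:17s} {capture:6.0%} {fp_capture:6.0%} {misclass:7.2%}")
```

A good technique scores high on the first two rates and low on the third, which is why clustering, followed by DFFITS, ranks highest in this simulation.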
CONCLUSION
In working with Medicaid data, AdvanceMed has learned that there are different types of data anomalies in Medicaid
pharmacy claim data. A simulation of the pharmacy claim file shows that these anomalies cause false positives in a
rule-based algorithm. To avoid false positives, we introduced five different statistical approaches to detect and eliminate the
abnormal claims. The results of this study indicate that the clustering technique is the best approach, followed by DFFITS.
REFERENCES
[1] AdvanceMed Corporation. http://www.csc.com/advancemed/ds/11858-about_us
ACKNOWLEDGMENTS
Special thanks to Tom Mathis, program director of AdvanceMed Corporation, for his patience and support. Huge
thanks to Rick Wells, project director of AdvanceMed Corporation, for his incredible understanding and sincere
encouragement. Finally, to all of the colleagues who perfectly demonstrate creative excellence: thank you.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
SHENJUN ZHU
Chief Statistician,
AdvanceMed Corporation,
2636 Elm Hill Pike, Suite 110, Nashville, TN 37214
p: 615.425.2481 | f: 615.872.0272
[email protected] | www.admedcorp.com
QILING SHI
Data Analyst, Mathematics PhD
AdvanceMed Corporation,
2636 Elm Hill Pike, Suite 110, Nashville, TN 37214
p: 615.425.2451 | f: 615.872.0272
[email protected], [email protected]| www.admedcorp.com
ARAN CANES
Data Analyst, Economics MA
AdvanceMed Corporation,
2636 Elm Hill Pike, Suite 110, Nashville, TN 37214
p: 615.872.0272 | f: 615.872.0272
[email protected] | www.admedcorp.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in
the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.