Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Neuroinformatics wikipedia , lookup
Corecursion wikipedia , lookup
Data analysis wikipedia , lookup
Generalized linear model wikipedia , lookup
Predictive analytics wikipedia , lookup
Numerical weather prediction wikipedia , lookup
History of numerical weather prediction wikipedia , lookup
Operational transformation wikipedia , lookup
General circulation model wikipedia , lookup
Computer simulation wikipedia , lookup
Tropical cyclone forecast model wikipedia , lookup
Data Quality Models for High Volume Transaction Streams: A Case Study Joseph Bugajski Visa International Robert Grossman, Chris Curry, David Locke & Steve Vejcik Open Data Group The Problem: Detect Significant Changes in Visa’s Payments Network Account Issuing Bank Merchant Acquiring Bank Visa Payment Network Over 1.59 billion Visa cards in circulation 6800 transactions per second (peak) Over 20,000 member banks Millions of merchants The Challenge: Payments Data is Highly Heterogeneous Variation from cardholder to cardholder Variation from merchant to merchant Variation from bank to bank Observe: If Data Were Homogeneous, Could Use Change Detection Model Baseline Model Observed Model Sequence of events x[1], x[2], x[3], … Question: is the observed distribution different than the baseline distribution? Use simple CUSUM & Generalized Likelihood Ratio (GLR) tests 4 10 + Key Idea: Build Models, One for Each Cell in Data Cube Build separate model for each 20,000+ separate bank (1000+) baselines Geospatial Build separate model for each region geographical region (6 regions) Build separate model for each different type of merchant (c. Type of 800 types of merchants) Transaction For each distinct cube, Bank establish separate baselines for each metric of interest Modeling using Cubes (declines, etc.) of Models (MCM) Detect changes from baselines Greedy Meaningful/Manageable Balancing (GMMB) Algorithm • More alerts Breakpoint • Fewer alerts • Alerts more meaningful • Alerts more manageable • To increase alerts, add breakpoint to split cubes, order by number of new alerts, & select one or more new breakpoints •To decrease alerts, remove breakpoint, order by number of decreased alerts, & select one or more breakpoints to remove One model for each cell in data cube Augustus Open source Augustus data mining platform was used to: • Estimate baselines for over 15,000 separate segmented models • Score high volume operational data and issue alerts for follow up investigations Augustus is PMML compliant Augustus scales with • Volume of data (Terabytes) • Real time transaction streams (15,000/sec+) • Number of segmented models (10,000+) Some Results to Date System has been operational for 2.5 years ROI – 5.1x Year 1 (over 6 months) – 7.3x Year 2 (12 months) – 10.0x Year 3 (12 months) Currently estimating over 15,000 individual baseline models The system has issued alerts for: • Merchants using incorrect Merchant Category Code (MCC) - account testing • Sales channel variations • Incorrect use of merchant city name field • Incorrect coding or recurring payments Summary Used new methodology (Modeling using Cubes of Models) for modeling large, highly heterogeneous data sets This project contributed in part to the development of a Baseline Model in the open source Augustus system Integrated system to generate alerts using Baseline Models with manual investigation process Project is generating over 10x ROI Poster #20 in the Tuesday night Poster Session.