Production Model Lifecycle Management
Prepared for Wed, June 22, 2016
[email protected]
Linkedin.com/in/GregMakowski
© 2016 LigaData, Inc. All Rights Reserved.

Contents
• Develop a Robust Solution (or get fired)
• Selecting the Best Model w/ Model Notebook
• Describing the Model
• Putting a Model in Production
• Model Drift over Time (Non-Stationary)
• Retrain, Refresh or Update DBC Preprocessing
• Kamanja Open Source PMML Scoring Platform

Develop a Robust Solution (or get fired)
• Epsilon (owned by American Express at the time): ACG's first neural network (1992), with ~40 quants in the Analytic Consulting Group
• Score 250MM households every month, pick the best 5MM households
• A neural net by a previous consultant did great "in the lab"!
  – It did "reasonably" in month 1, "worse" in month 2, and "bad" in month 3 (no lift over random)
  – The prior consultant was fired
• I was hired, and told why I was replacing him
• My model captured the same response with 4MM households mailed, was stable for 24+ months, and saved $1MM / month

Model Notebook – Bad vs. Good
• Q) What is the best outcome metric? ROC, R², Lift, MAD, …
• A) A deployment simulation of cost-value-strategy
  – Is the business deployment over the whole score range [0…1], or just over the top 1% or 5% of the score? (If only the top, then not ROC, R², or correlation)
  – Are some records 5× or 20× more valuable?
  – Use cost-profit weighting, or a more complex system

Calculate $ of "Business Pain" (Data Mining++ in Retail)
• Need to deeply understand business metrics: equal-size mistakes on either side of zero error → unequal PAIN in $
  – Under Stock: ~15% business pain $
  – Slightly Over Stock: ~1% business pain $
  – Heavily Over Stock (a 4-week supply of a SKU → a 30%-off sale): ~30% business pain $
• "No way – that could get you fired!" New progress in getting feedback

Model Notebook – Outcome Details
• My heuristic design objectives (yours may be different):
  – Accuracy in deployment
  – Reliability and consistent behavior: a general solution
    • Use one or more hold-out data sets to check consistency
    • Penalize more as the forecast becomes less consistent
  – No penalty for model complexity (if it validates consistently)
    • Let me drive a car to work, instead of limiting me to a bike; a message for the check writer
  – Don't consider only Occam's Razor: value consistently good results
  – Develop a "smooth, continuous metric" to sort and find the models that perform "best" in future deployment

Model Notebook – Outcome Details
• Training = results on the training set
• Validation = results on the validation hold-out
• Gap = abs(Training - Validation)
  – A bigger gap (volatility) is a bigger concern for deployment; it is a symptom
  – Minimize Senior VP heart attacks! (one penalty for volatility)
  – Set expectations and meet expectations; regularization helps significantly
• Conservative Result = worst(Training, Validation) + Gap penalty
  – Corr / Lift / Profit → higher is better: Cons Result = min(Trn, Val) - Gap
  – MAD / RMSE / Risk → lower is better: Cons Result = max(Trn, Val) + Gap
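The Conservative Result rule above fits in a few lines of Python (a minimal sketch; the function and argument names are mine, not from the deck):

```python
def conservative_result(train, val, higher_is_better=True):
    """Worst of training/validation, penalized by the train-validation gap."""
    gap = abs(train - val)            # volatility: a symptom of deployment risk
    if higher_is_better:              # Corr / Lift / Profit
        return min(train, val) - gap
    else:                             # MAD / RMSE / Risk
        return max(train, val) + gap
```

For example, a model with training lift 0.80 but validation lift 0.70 scores 0.70 - 0.10 = 0.60, so it ranks below a steadier model scoring 0.72 on both sets, which is exactly the "value consistent good results" intent.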
Model Notebook – Bad vs. Good

Model Notebook – Process Tracking Detail → Training the Data Miner
(Figure: the "data mining battle field", a grid of inputs/tests such as Regression, AutoNeural, Neural, and more, vs. outcomes such as Top 5%, Top 10%, Top 20%)
• Heuristic strategy:
  1) Try a few models of many algorithm types (seed the search)
  2) Opportunistically spend more effort on what is working (invest in top stocks)
  3) Still try a few trials on medium successes (diversify, limited by the project time-box)
  4) Try ensemble methods, combining model forecasts and top source variables with the model

When Rejecting Credit – Law Requires 4 Record-Level Reasons
• The law does not care how complex the model or ensemble was
  – i.e. NOT sex, age, marital status, race, …
  – i.e. "over 180 days late on 2+ bills"
• There are solutions to this constraint, for an arbitrary black box
• The solutions have broad use in many areas of the model lifecycle

Should a data miner cut algorithm choices, so they can come up with reasons?
• 97% of the time, NO! Focus on the most GENERAL and ACCURATE system first
• "I understand how a bike works, but I drive a car to work"
• "I can explain the model, to the level of detail needed to drive your business"
• A VP does not need to know how to program a B+ tree in order to make a SQL vendor purchase decision (be a trusted advisor)
Solution – Sensitivity Analysis (OAT: One At a Time)
• Probe an arbitrarily complex data mining system (S): perturb the source fields and record the delta in the forecast of the target field
• Present record N, S times (once per source field), each time with that one input 5% bigger (a fixed input delta)
• Record the delta change in the output, S times per record
• Aggregate: average(abs(delta)) = the target change per input field delta
• For source fields with binned ranges, sensitivity tells you the importance of each range, i.e. "low", …, "high"
• Sensitivity values can be put in pivot tables or clustered
• Record-level "reason codes" can be extracted from the most important bins that apply to the given record
• https://en.wikipedia.org/wiki/Sensitivity_analysis

Solution – Sensitivity Analysis: Applying Reasons per Record
• Reason codes are specific to the model and the record
• Ranked predictive fields:

                            Mr. Smith    Mr. Jones
                            (record 1)   (record 2)
  max_late_payment_120d         0            1
  max_late_payment_90d          0            0
  bankrupt_in_last_5_yrs        1            1
  max_late_payment_60d          1            0

• Mr. Smith's reason codes come from the most important of these fields that apply to his record
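The OAT procedure above can be sketched as follows, assuming a vectorized black-box scorer; all function and variable names here are illustrative, not from the deck:

```python
import numpy as np

def oat_sensitivity(score, X, delta=0.05):
    """One-At-a-Time sensitivity: bump each input 5%, record the output delta.

    score : black-box model, maps an (n, s) array to n forecasts
    X     : (n, s) array of records to probe
    """
    base = score(X)                      # forecast with unperturbed inputs
    n, s = X.shape
    deltas = np.empty((n, s))
    for j in range(s):                   # present each record S times...
        Xp = X.copy()
        Xp[:, j] *= (1.0 + delta)        # ...each time one input 5% bigger
        deltas[:, j] = score(Xp) - base  # record-level delta in the forecast
    # Aggregate importance per input field: average(abs(delta))
    importance = np.abs(deltas).mean(axis=0)
    return deltas, importance

def reason_codes(deltas, record_idx, field_names, k=4):
    """Record-level reason codes: the k most influential fields for one record."""
    order = np.argsort(-np.abs(deltas[record_idx]))[:k]
    return [field_names[j] for j in order]
```

The per-record `deltas` matrix is what feeds pivot tables or clustering; the aggregated `importance` vector ranks fields globally.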
Putting a Model in Production
• Cut out extra preprocessed variables not used in the final model
• Minimize passes over the data
• In many situations, I have had to RECODE the preprocessing and/or model to meet production system requirements
  – BAD: recode to Oracle, move SAS to the mainframe and create JCL; could take 2 months for conversion and full QA
  – GOOD: generate PMML code for the model; build up a PMML preprocessing library, like Netflix
• See www.DMG.org/PMML/products for PMML-supporting products

Tracking Model Drift
• A trained model is only as general as:
  – the variety of behavior in the training data
  – the artifacts abstracted out by preprocessing
• Over time, there is "drift" between the behavior represented in the scoring data and the original training data
• (Easy to see with 2 input dimensions vs. score: plot the current scoring data against the training data)

Tracking Model Drift – Model Drift Detector
• Change in the distribution of the target (alert when over a threshold)
  – During training, find thresholds for 10 or 20 equal-frequency bins of the score
  – During scoring, look at key thresholds around business decisions (act vs. not)
  – Has the % over the fixed threshold changed much? Use chi-square or KL divergence (contingency table metrics)
• Change in the distribution of the most important input fields, to diagnose CAUSES
  – Out of the top 25% of the most important input fields, which had the largest change in the contingency table metric?
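The drift detector above can be sketched with equal-frequency score bins fixed at training time and KL divergence as the contingency-table metric (a minimal sketch; names, bin count, and any alert threshold are illustrative assumptions):

```python
import numpy as np

def train_bin_edges(train_scores, n_bins=10):
    """Equal-frequency bin thresholds, found once on the training scores."""
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.quantile(train_scores, qs)

def kl_drift(train_scores, scoring_scores, edges):
    """KL divergence between the binned training and current score distributions."""
    bins = np.r_[-np.inf, edges, np.inf]
    p = np.histogram(train_scores, bins=bins)[0].astype(float)
    q = np.histogram(scoring_scores, bins=bins)[0].astype(float)
    p /= p.sum()
    q /= q.sum()
    eps = 1e-9                            # guard against empty bins
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

By construction the training scores fill each bin ~equally, so a large divergence means the share of current scores around the fixed business-decision thresholds has moved, which is the alert condition; the same function applied per input field helps diagnose causes.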
Retrain, Refresh or Update DBC
• Model Retrain
  – Brute force: the most effort, the most expense, the most reliable
  – Repeat the full data mining model training project
  – Re-evaluate all algorithms, preprocessing, and ensembles
• Model Refresh
  – "Minimal retraining"
  – Just rerun the final 1-3 model trainings on "fresher" data
  – Do not repeat exploring all algorithms and ensembles
  – Assume the "structure" is still a reasonable solution

DBC – Dependent By Category (Powerful Preprocessing, Bayesian Priors)
• Find the top ~10-20 most predictive variables to date
• Explore interactions in a hierarchy, like mini-OLAP cubes, with the target average in each cell:
    A*B*C*D
    A*B*C, A*B*D, A*C*D, B*C*D
    A*B, …, C*D
    A, B, C, D
• Use the "best fit": the most granular cell that is significant (WITH A MINIMUM RECORD COUNT or significance test)
• If two cells have the same number of dimensions, use the most extreme target value
• Frequently produces 4-6 of the top 10 most predictive variables

DBC Example
• Average past Lift per category:
  – Percent-off bin (i.e. 0%, 5%, 10%, 15%, … 80%)
  – Price savings bin (i.e. $2, $4, $6, …)
  – Store hierarchy
  – Product hierarchy (50k to 100k SKUs, 4-6 levels): Department, Sub-department, Category, Sub-category
  – Seasonality: time, month, week
  – Reason codes (the event is a circular, a clearance)
  – Location on the page in the flyer (top right, top left, …)
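The DBC scheme above can be sketched as a family of lookup tables, most granular first, with a minimum-record-count rule; this is a minimal sketch with hypothetical names, and it omits the "most extreme target value" tie-break between cells of the same size for brevity:

```python
from itertools import combinations
from collections import defaultdict

def build_dbc(records, dims, target, min_count=30):
    """DBC lookup tables: average past target per category combination.

    records  : list of dicts holding category fields and a numeric target
    dims     : category field names, e.g. ["store", "dept", "pct_off_bin"]
    min_count: a cell must hold at least this many records to be significant
    """
    tables = {}
    # One mini-OLAP cube per subset of dimensions, most granular first
    for k in range(len(dims), 0, -1):
        for combo in combinations(dims, k):
            cells = defaultdict(list)
            for r in records:
                cells[tuple(r[d] for d in combo)].append(r[target])
            tables[combo] = {key: sum(v) / len(v)
                             for key, v in cells.items() if len(v) >= min_count}
    return tables

def dbc_lookup(tables, record, default):
    """Best fit: the most granular significant cell that matches the record."""
    for combo, table in tables.items():   # dict insertion order: granular first
        key = tuple(record[d] for d in combo)
        if key in table:
            return table[key]
    return default
```

Updating the tables later (weekly or monthly) is just rerunning `build_dbc` on fresher records, which is why the downstream model weights can stay fixed.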
• Multivariate combinations – powerful and scalable

DBC – Interactions
1) Pre-calculate a lookup table with the past average target for each set of prior conditions
2) Apply by looking up the conditions for a given store-item, returning the target estimate

DBC – Dependent By Category (Update Tables to Help the Model Live Longer)
• Recalculate the cell values weekly or monthly: low computational cost, low effort
• Capture the "latest fraud trends"; the model weights on the field can remain the same
• Adapt to 1,000s of small, incremental changes without having to Retrain or Refresh the model
• Can choose to keep pockets of past bad behavior, to recognize it in the future
• "Balance Stability vs. Plasticity"

Solution Architecture for Threat and Compliance
Lambda Architecture with Continuous Decisioning
1) Decisioning is applied to all data immediately upon availability
2) Decisioning leverages all available data, including data stored in other layers
3) Enhancements to the decisioning process are enabled through continuous feedback of data and model updates
4) Actions may include triggering the start of other processes or sending alerts to a case management system
5) Standard case management reports and workflow are augmented with advanced data visualization, drill-through capabilities, and search
6) Models may be built and tested using all available data and a variety of tools, then quickly and easily deployed into production

Solution Stack for Threat and Compliance
Leveraging Primarily Open Source Big Data Technologies
Continuous Decisioning Use Case: Cyber Threat Detection & Response
Use Kamanja to detect potential cyber security breaches
• Problem:
  – Diverse inputs: structured and unstructured data, with varying latencies
  – Data enrichment: a long and laborious process, manual and ad hoc
  – Quality of threat intelligence: lots of false positives waste analyst resources
  – Poor integrations with response teams: a manual and time-consuming process
• Solution:
  – Ingest IP addresses, malware signatures, hash values, email addresses, etc. in real time
  – Automatically enrich with third-party data
  – Check historical logs against new threats continuously
  – Predictive analytics based on machine learning flag suspicious activity before it becomes a problem
  – Direct integration with dashboards to generate alerts and speed up investigation

Continuous Decisioning Use Case: Application Monitoring
Use Kamanja to detect insider attacks on sensitive data
• Problem:
  – The legacy system is batch oriented; months are required to create and implement new alerts
  – Slow speed-to-market developing new source system extracts; months required to assimilate new data
  – Risks to PII and NPI, with compliance implications
• Solution:
  – Use an open source big data stack to migrate to real-time data streaming, rapid model deployment, and alerts with no manual intervention
  – Calculate the number of times PII/NPI is accessed over an eight-hour period, and calculate risk to generate alerts
  – Use machine learning to identify the normal pattern of out-of-office-hours access; trigger automatic alerts when anomalies occur
  – Rapidly implement new models to deal with emerging threats
Continuous Decisioning Use Case: Unauthorized Trading Detection
Use Kamanja to reduce the risk of rogue behavior at an investment bank
• Problem:
  – Need timely alerting of potentially unauthorized trading activity
  – Must tie together voluminous data, reports, and risk measures
  – Meet increasingly stringent time requirements
• Solution:
  – Create a Trader Surveillance Dashboard
  – Provide a holistic view of a trader, based on all relevant information about the trader, the marketplace, and peers
  – Build supervised and unsupervised machine learning models based on operational, transactional, and financial data
  – Real-time analysis and monitoring of trader activity automatically highlights unusual activity and triggers alerts on trades to investigate

Continuous Decisioning Use Case: Credit Card Fraud Detection
Use Kamanja to incrementally reduce fraud losses by applying multiple predictive models for transaction authorization
• Problem:
  – $16.3 billion in credit card fraud losses annually
  – Fraud is growing more quickly than transaction value
  – New types of fraud are one step ahead of existing solutions
  – Dependence on third-party proprietary systems means slow reaction times and expensive changes
• Solution:
  – Apply Kamanja to IVR, web, and transactional data to trigger alerts
  – Initial models detect suspicious web traffic, common purchase points, and application rarity
  – Leverage existing infrastructure as well as existing third-party systems (Falcon and TSYS)
  – Reduce costs by 80% with open source software

Thank You
Wed, June 22, 2016
[email protected]
www.Linkedin.com/in/GregMakowski
www.Kamanja.org (Apache open source licensed)