Data Mining and Knowledge Discovery for Strategic Business Optimization
Peter van der Putten
ALP Group, LIACS & KiQ Ltd
November 2004

Why is a business in business?
• Successful businesses create a lot of added value for their customers and capture it
  – Maximize long term profit
• Optimize: Maximize sales, minimize costs, minimize risk

Challenges
• Businesses are bigger
• Fragmentation of products, customer interaction channels, market segments
• Fierce competition, chaotic economic climate and dynamic customer behavior
• Data glut & information overflow
• Solution: data mining & knowledge discovery for strategic business optimization

Credit scoring case: minimizing loan risk while maximizing loan acceptance
[Figure: accepted volume out of all applications]
                               Accepted   Infection
Expert knowledge               29.8%      12.7%
Prediction model plus rules    34.5%      9.1%

Marketing case: maximizing direct mail response while minimizing cost
A model was created that predicts the probability of responding to a mailing. By using the model to select customers to mail, we could reach 50% of the responders by mailing only 20% of all customers.
[Figure: cumulative gains chart for a logistic regression model; x-axis: Cases (%), y-axis: Cum. positive]

[Screenshots: Siebel customer contact screens with OMEGA recommendations. The callout text overlapped in extraction and is only partly recoverable: a slight preference for general insurance or for car insurance is predicted and the appropriate text script is offered through a one-click cross-sell button; where the exit risk is overriding, a combination of predictive models and business rules leads OMEGA to suggest an immediate attempt to retain the customer, with one-click access to a retention script.]

Overview
• Why Data Mining?
• The Data Mining Process
• Data Mining Tasks
• Data Mining Techniques
• Future Outlook
• Data Mining Opportunities by Sector and Function
• Q&A

Some working definitions….
• ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably
• Data mining =
  – the discovery of interesting, meaningful and actionable patterns hidden in large amounts of data
• Multidisciplinary field originating from artificial intelligence, pattern recognition, statistics, machine learning, econometrics, ….

Data mining is a process…
• Model Development
  – Objective
  – Data collection & preparation
  – Model construction
  – Model evaluation
  – Combining models with business knowledge into decision logic
• Model / decision logic deployment
• Model / decision logic monitoring

Data mining tasks
• Undirected, explorative, descriptive, ‘unsupervised’ data mining
  – Matching & search
  – Profile & rule extraction
  – Clustering & segmentation
• Directed, predictive, ‘supervised’ data mining
  – Predictive modeling

Data mining task example: Clustering & segmentation
[Screenshots: Looking Glass at start; Looking Glass intermediate result; Looking Glass result]

Data mining task example: predictive modeling
[Figure: past experience (data plus known good or bad behaviour) is used to build a score model; new cases receive a score on a scale from 10 (better business) down to 1 (worse business), for example Case A: 7 and Case B: 4]

Collected data
Income   Age   Children
60K      38    2
30K      23    1
30K      29    0
...      ...   ...
120K     55    2

Known customer behaviour
Income   Age   Children   Status
60K      38    2          Good
30K      23    1          Good
30K      29    0          Bad
...      ...   ...        ...
120K     55    2          Bad

Data mining task example: predictive modeling
Income   Age   Children   Status   Value   Score
60K      38    2          Good     100     12
30K      23    1          Good     45      2
30K      29    0          Bad      -80     -24
...      ...   ...        ...      ...     ...
120K     55    2          Bad      -40     -5
score = (0 x Income) + (-1 x Age) + (25 x Children)

Data mining task example: predictive modeling
• Recruitment
  – Who will respond to a mailing campaign?
  – To whom can we cross-sell which products?
  – What will be the customer value one year from now?
• Retention
  – Who is going to cancel his/her mobile phone subscription? Should I attempt to keep this customer?
  – Which customers have accounts that will go dormant?
• Risk
  – Should I sell a loan to this person?
  – How much money will someone claim on a policy?
  – Is this caller going to pay his bills?

Data mining techniques for predictive modeling
• Linear and logistic regression
• Decision trees
• Neural Networks
• Genetic Algorithms
• ….

Linear Regression Models
score = (0 x Income) + (-1 x Age) + (25 x Children)

Regression in pattern space
Only a single line is available in pattern space to separate the classes
[Figure: pattern space with axes income and age; class ‘square’ and class ‘circle’]

Decision Trees
20000 customers, response 1%
  Income > 150000?
    yes: 1200 customers
      balance > 50000?
        yes: 400 customers, response 0.1%
        no: 800 customers, response 1.8%
    no: 18800 customers
      Purchases > 10?
        etc.

Decision Trees in Pattern Space
Line pieces perpendicular to the axes
Each line is a split in the tree, two answers to a question
[Figure: pattern space with axes income and age]

Infotrees (Genetic Programming)
• Nested regression formulas
  – sum(average(region, spend), max(age, children))
[Figure: the formula drawn as a tree, with sum at the root and the sub-trees average(region, spend) and max(age, children)]

Infotrees in Pattern Space
Infotrees can separate any class in pattern space, even if the class boundary is non-linear
Can model complex customer behavior
[Figure: pattern space with axes income and age]

Genetic Algorithms / Programming
• How to find the best Infotree?
• Genetic algorithms
  – Based on the idea of evolution
  – Start with (random) Infotrees
  – Build a new generation
    • Fittest models can reproduce to create offspring, worst models die
    • Small amount of mutation occurs to keep exploring
  – Repeat process

Notes about Infotree models: Cross-over
• New models can be created by cross-over:
  – part of one model is swapped with part of another
  – parts may be chosen randomly or intelligently
[Figure: two old models and the resulting new models, with the cross-over points marked; node labels include s1, amean, quadv, convex, concave and invert, over the predictors region, age, spend, salary and children]

Notes about Infotree models: Mutation
• New models can be created by mutation:
  – part of a model (a sub-tree, operator or predictor) is changed
  – part and type of change may be chosen randomly or intelligently
[Figure: three example mutations, labelled Sub-tree, Operator and Predictor; node labels include convex, concave, amean and s2, over the predictors region, spend, age, children, house and TV]

Short Demo (if time allows…)
Model to predict caravan policy ownership
Combining this model with other models and business rules

Data Mining: the Future
• Business (marketing)
  – More fine-grained segmentation down to the cluster or individual level
  – More personalised actions, inbound and outbound, in all customer contact channels
  – Optimization of both value for the business and the customer
  – Privacy
• Technical
  – From Data Mining to Decisioning, combining multiple models with business rules
  – Monitoring business and model performance
  – Data Mining Process Automation

Let’s discuss: Data Mining Opportunities by Function
• Marketing, Sales, CRM
• Product Development, R&D
• Manufacturing, Production, Logistics
• Customer service
• Finance
• Procurement
• Human Resources
• IT
• ….
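
Returning to the Genetic Algorithms / Programming slides: the evolutionary search for a good Infotree can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the tool used in the talk. The nested-tuple representation, the operator set (sum, average and max, taken from the example formula on the Infotrees slide), the fitness function (negative absolute error), the toy records and targets, and all parameter values are hypothetical.

```python
import random

# An Infotree is either a predictor name (str) or a tuple (operator, left, right).
OPERATORS = {
    "sum": lambda a, b: a + b,
    "average": lambda a, b: (a + b) / 2,
    "max": max,
}
PREDICTORS = ["region", "spend", "age", "children"]

# Made-up records and targets, for illustration only.
RECORDS = [
    {"region": 1, "spend": 10, "age": 38, "children": 2},
    {"region": 2, "spend": 40, "age": 23, "children": 1},
    {"region": 3, "spend": 5, "age": 29, "children": 0},
    {"region": 1, "spend": 80, "age": 55, "children": 2},
]
TARGETS = [48, 63, 34, 135]


def evaluate(tree, record):
    """Compute the value of an Infotree for one record."""
    if isinstance(tree, str):
        return record[tree]
    op, left, right = tree
    return OPERATORS[op](evaluate(left, record), evaluate(right, record))


def depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[1]), depth(tree[2]))


def random_tree(rng, max_depth=2):
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(PREDICTORS)
    op = rng.choice(list(OPERATORS))
    return (op, random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if not isinstance(tree, str):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(a, b, rng):
    """Swap a randomly chosen part of model a for a random part of model b."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace_at(a, path, donor)


def mutate(tree, rng):
    """Change a randomly chosen sub-tree, operator or predictor."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace_at(tree, path, random_tree(rng, max_depth=1))


def fitness(tree, records, targets):
    """Higher is better: negative total absolute error."""
    return -sum(abs(evaluate(tree, r) - t) for r, t in zip(records, targets))


def evolve(records, targets, rng, pop_size=30, generations=20,
           mutation_rate=0.1, max_depth=4):
    population = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, records, targets), reverse=True)
        survivors = population[: pop_size // 2]  # the worst models die
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            child = crossover(mother, father, rng)
            if rng.random() < mutation_rate:  # a small amount of mutation
                child = mutate(child, rng)
            if depth(child) > max_depth:  # keep trees small
                child = mother
            children.append(child)
        population = survivors + children
    return max(population, key=lambda t: fitness(t, records, targets))
```

Calling `evolve(RECORDS, TARGETS, random.Random(0))` returns the fittest tree found. Because the better half of each generation survives unchanged, the best fitness never decreases from one generation to the next.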
Let’s discuss: Data Mining Opportunities by Sector
• Retail
• Telco
• Pharma
• Government
• Automotive
• Oil
• Charity
• Consumers / Citizens
• ….

The Paper: Requirements
• 2500 words ± 10%, APA style references
• No plagiarism / copying! Rephrase in your own words, reference, cite & quote
• Two parts of 1250 words each
  – Your grasp of the research topic: what is data mining? Own interpretation, clear, put into context
  – Memo to CEO/CIO of a specific company / industry: what are the benefits/changes/opportunities and next steps (best practice, proof of concept)? Impact, convincing, plan to action.

The Paper: Suggestions
• Suggestions for ‘companies’
  – KPN Mobile, Marketing: how to reduce loss of customers to competitors
  – Dutch Police, Strategic Innovation: opportunities for law enforcement, privacy implications
  – Pfizer, Drug Discovery: using data mining to find new drugs
  – Google, Product Management / R&D: opportunities for new data mining features to enhance customer experience
  – Your Idea!

The Paper: Resources
• Webpage for this talk:
  – http://www.liacs.nl/~putten/ictvision.html
• General Writing Resources:
  – http://www.liacs.nl/~putten/writingpapers.html
• Homepage:
  – www.liacs.nl/~putten , mail [email protected]

Dilbert’s Perspective on Data Mining
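
As a closing worked example for the paper's first part, the linear scoring formula from the predictive modeling slides, score = (0 x Income) + (-1 x Age) + (25 x Children), can be checked against the four customers in the example table. The sketch below is a minimal illustration. The dictionary layout and the choice to express income in thousands are assumptions, and the income unit does not affect the result because its weight is 0.

```python
# Weights from the slide: score = (0 x Income) + (-1 x Age) + (25 x Children)
WEIGHTS = {"income": 0, "age": -1, "children": 25}


def score(customer):
    """Weighted sum of the customer's predictor values."""
    return sum(weight * customer[name] for name, weight in WEIGHTS.items())


# The four example customers from the slides (income in thousands).
customers = [
    {"income": 60, "age": 38, "children": 2},   # slide score: 12
    {"income": 30, "age": 23, "children": 1},   # slide score: 2
    {"income": 30, "age": 29, "children": 0},   # slide lists -24; the formula gives -29
    {"income": 120, "age": 55, "children": 2},  # slide score: -5
]

for customer in customers:
    print(customer, score(customer))
```

Three of the four rows reproduce the scores shown in the table. For the third customer the formula yields -29 rather than the -24 listed on the slide, so one of the two values is presumably a typo in the original deck.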