October 2-3, 2015, İSTANBUL, Boğaziçi University
Project Management in Data Mining
Prof. Dr. M. Erdal Balaban
Istanbul University, Faculty of Business Administration
Avcılar, Istanbul - TURKEY

PRESENTATION OUTLINE
What is Data Mining?
Data Mining Environment
Decision Making Process
CRISP-DM Methodology
Phases of Data Mining Process
Flowchart of Data Mining Process (Proposal)
Conclusions
October 2, 2015 2/17

What is Data Mining?
“Data mining is the process of discovering useful patterns and trends in large data sets.” (Larose, 2014).
Data mining makes a difference in many areas: health care, banking, finance, insurance, telecommunications, manufacturing, retail, market research, and the public sector.

Data Mining Environment
[Diagram] Data Mining at the center, drawing on: Database Technology, Statistics, Machine Learning, Visualization, Information Science, Other Disciplines.

Decision Making Process
DATA -> INFORMATION -> KNOWLEDGE -> DECISIONS -> ACTION

CRISP-DM Methodology
CRoss-Industry Standard Process for Data Mining (Shearer, 2000)
CRISP-DM focuses data mining on rapid model development and deployment to optimize decisions.

CRISP-DM
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant data-mining process framework. It is an open standard; anyone may use it. The following list describes the various phases of the process.
Business Understanding
  Determine Business Objectives: Background; Business Objectives; Business Success Criteria
  Assess Situation: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
  Determine Data Mining Goals: Data Mining Goals; Data Mining Success Criteria
  Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques

Data Understanding
  Collect Initial Data: Initial Data Collection Report
  Describe Data: Data Description Report
  Explore Data: Data Exploration Report
  Verify Data Quality: Data Quality Report

Data Preparation
  Phase outputs: Data Set; Data Set Description
  Select Data: Rationale for Inclusion/Exclusion
  Clean Data: Data Cleaning Report
  Construct Data: Derived Attributes; Generated Records
  Integrate Data: Merged Data
  Format Data: Reformatted Data

Modeling
  Select Modeling Technique: Modeling Technique; Modeling Assumptions
  Generate Test Design: Test Design
  Build Model: Parameter Settings; Models; Model Description
  Assess Model: Model Assessment; Revised Parameter Settings

Evaluation
  Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
  Review Process: Review of Process
  Determine Next Steps: List of Possible Actions; Decision

Deployment
  Plan Deployment: Deployment Plan
  Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
  Produce Final Report: Final Report; Final Presentation
  Review Project: Experience Documentation

Tasks (before each colon) and outputs (after it) of the CRISP-DM reference model

Data Mining Phases (Proposal Flowchart)
[Flowchart] Nodes in reading order:
  Define Project
  Data Preparation (Crucial Phase!): Data Sources; Data Gathering; Data Understanding & Data Selection
  Dataset
  Supervised Learning?
    No: Clustering Methods or Association Rules
    Otherwise: Classification Methods
  Crucial Phase! (second label, on the evaluation of model performance)
[Flowchart, continued] Nodes in reading order:
  Yes (supervised learning): Training Dataset and Test Dataset
  Data Preprocessing
  Selecting Algorithm & Model Building
  Evaluation of Model Performance: Measuring Model Performance; Evaluate Model (Low / High)
  High: Model Implementation
  Knowledge Representation & Decision

Planning for a data mining project
Produce the project plan: list the stages in the project, together with their duration, required resources, and relations.
  Define the project
  Prepare data for data mining modeling
  Separate data into training and testing parts for performance evaluation
  Apply alternative algorithms to build models and evaluate their performance
  Implement the model to generate knowledge and make a decision before action

Define project
Understand the project objectives and requirements in the first phase of data mining
List the assumptions made by the project and the constraints on the project
Construct a cost-benefit analysis for the project

Prepare data for data mining
Collect the data (or datasets), select data, explore data, clean the data, reformat data, transform data.

Separate the dataset for performance evaluation
Select the evaluation method:
  Hold-out
  Cross-validation (k-fold CV)
  Bootstrapping

Apply alternative algorithms and select the best model
There are several techniques for the same data mining problem type.
Some techniques have specific requirements on the form of data.
Classification algorithms:
  k-Nearest Neighbour (kNN)
  Naive Bayes
  Logistic Regression
  Decision Trees
  Support Vector Machines
  Artificial Neural Networks (ANNs)
Clustering algorithms
Association algorithms
The generated models that meet the selected criteria become approved models.

Implement the model to make a decision
Creation of the model is generally not the end of the project, even if the purpose of the model is to increase knowledge of the data.
Apply the model within the organization’s decision-making process and then activate it.

CONCLUSIONS
1. Data mining techniques are important for discovering knowledge that is more meaningful and valuable for decision making.
2. A project management approach is important for successful data mining.
3. Each phase of the data mining process is important, but the most important phases are data preparation before modeling and evaluation of model performance after modeling. These crucial phases are usually disregarded or skipped in practice.
4. All phases and sub-operations should be planned and scheduled using project management methods for successful data mining.

Thank you very much for your attention.
Are there any questions or suggestions?
[email protected]
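
Supplementary sketch: the slides on separating the dataset (k-fold cross-validation) and applying alternative algorithms could be illustrated roughly as follows. This is a minimal example under stated assumptions, not the presenter's method: the toy dataset, the function names, and the choice of a majority-class baseline as the second "algorithm" are all made up for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_classifier(train_X, train_y):
    """Baseline model: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label

def knn_classifier(train_X, train_y, k=3):
    """k nearest neighbours: squared Euclidean distance, majority vote."""
    def predict(x):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, row)), y)
            for row, y in zip(train_X, train_y)
        )
        votes = Counter(y for _, y in dists[:k])
        return votes.most_common(1)[0][0]
    return predict

def cross_val_accuracy(build, X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold serves as the test set once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = build([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical toy dataset: two well-separated groups of 2-D points.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)] + \
    [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(30)]
y = ["a"] * 30 + ["b"] * 30

# Evaluate the alternative models on the same folds and keep the best one.
results = {
    "majority": cross_val_accuracy(majority_classifier, X, y),
    "kNN": cross_val_accuracy(knn_classifier, X, y),
}
best = max(results, key=results.get)
print(results, "-> best:", best)
```

Both candidates are scored on identical folds (same seed), so the comparison reflects the models rather than the split. Hold-out evaluation corresponds to using a single fold as the test set, and bootstrapping would replace the fold split with sampling with replacement.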