The CRISP-DM Model
By John Mehall

Phases of CRISP-DM
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

Phase One: Business Understanding
Convert project objectives to data mining problems.
Need to understand the business to pick your data.
Business understanding involves several steps:
1. Determine business objectives
   - Need to figure out exactly what a client wants
   - Ask questions
2. Assess situation
   - The analyst should outline all of the different resources that are available.
3. Determine data mining goals
   - Project objectives, first stated in business terms, are restated as technical data mining goals.
4. Produce project plan
   - The intended plan and timeline for achieving the data mining goals.

Phase Two: Data Understanding
This phase involves four different steps to explore and understand the data:
1. Collect the initial data - The analyst describes the “surface” data: what can be seen from a 10,000-foot view.
2. Describe the data - The analyst examines the surface data and describes the results, reporting on issues such as the format.
3. Explore the data - Here the analyst drills into the data to tackle questions that can be addressed using querying, visualization, and reporting. The analyst should then create a data exploration report that outlines the findings and their potential impact on the remainder of the project.
4. Verify data quality - Here the analyst checks the quality of the data: is the data complete? This means checking for missing attributes and blank fields, and checking whether all possible values are represented.

Phase Three: Data Preparation
Covers all activities to construct the final data set.
Select data - Involves looking at several criteria and explaining why certain data were included or excluded.
Clean data - Selecting clean subsets of the data or estimating missing values through modeling.
Construct data - Transforming the data into new forms, using formulas that create better attributes than those in the current data.
Integrate data - Involves bringing together data from multiple tables to create new records. With table-based data, an analyst can join tables.
Format data - In some cases, the analyst changes the format to make the data suitable for a specific modeling tool, or to pose necessary questions.

Phase Four: Modeling
Various modeling techniques can be selected for optimal results.
The modeling phase breaks down into these steps:
1. Select the modeling technique - Refers to choosing one or more modeling techniques (any assumptions should be recorded).
2. Generate test design - The analyst develops an analytical model based on one set of existing data and tests its validity on a different set.
3. Build the model - Once the test design is in place, the data analyst runs the modeling tool on the prepared data set to create one or more models.
4. Assess the model - The analyst interprets the results according to their own domain knowledge and works with a business analyst to interpret them in the business context.

Phase Five: Evaluation
Evaluate results - Assesses the degree to which the model meets the business objectives. Another option is testing the model on real-world applications.
Review process - A more thorough review of the data mining process to determine if any factors were missed. Also covers quality assurance issues.
Determine next steps - The project leader decides whether to finish this project and move to deployment or to initiate further iterations.

Phase Six: Deployment
It is often the customer who will carry out the deployment phase.
Plan deployment - Takes the evaluation results and develops a strategy for deployment.
Plan monitoring and maintenance - A carefully prepared maintenance strategy avoids incorrect usage of data mining results.
Produce final report - Could be a summary or a comprehensive presentation of results. Includes previous deliverables and summarizes results.
Review project - Assess failures and successes, as well as areas to improve in the future. Should also include a summary of important experiences.

Conclusion
• CRISP-DM was not built in a theoretical, academic manner.
• It is based on practical, real-world data mining experience.
• It is not meant to be a magical book that lets novices succeed.
• With data mining training and assistance from experienced analysts, CRISP-DM can be a valuable tool.
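
Two of the steps described above, verifying data quality (Phase Two) and generating a test design (Phase Four), can be illustrated with a minimal Python sketch. The record layout, the field names, and the helper functions `missing_report` and `holdout_split` are hypothetical and chosen only for this example; they are not part of CRISP-DM itself, and a real project would likely rely on a data analysis library instead.

```python
import random

def missing_report(records, fields):
    """Count, per field, how many records have a missing or blank value."""
    report = {}
    for field in fields:
        report[field] = sum(
            1 for r in records
            if r.get(field) is None or str(r.get(field)).strip() == ""
        )
    return report

def holdout_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical customer records (made-up values, for illustration only).
records = [
    {"id": 1, "age": 34, "region": "north"},
    {"id": 2, "age": None, "region": "south"},
    {"id": 3, "age": 51, "region": ""},
    {"id": 4, "age": 28, "region": "east"},
]

# Verify data quality: which attributes have missing or blank fields?
print(missing_report(records, ["id", "age", "region"]))

# Generate test design: build on one set, check validity on another.
train, test = holdout_split(records)
print(len(train), len(test))
```

In this sketch a model would be built on `train` and its validity checked on `test`, the records it never saw during building.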