The CRISP-DM Model
By John Mehall
Phases of CRISP-DM
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
Phase One: Business Understanding

Convert project objectives to data mining problems

Need to understand the business to pick your data

Business understanding involves several steps
1. Determine business objectives – Figure out exactly what the client wants by asking questions.
2. Assess situation – The analyst outlines all of the resources that are available.
3. Determine data mining goals – Project objectives, stated in business terms, are translated into data mining goals stated in technical terms.
4. Produce project plan – The intended plan and timeline for achieving the data mining goals.
Phase Two: Data Understanding

This phase involves four different steps to explore and understand the data
1. Collect the initial data - The analyst acquires the data and gets a first “surface” view of it – what can be seen from 10,000 feet.
2. Describe the data - The analyst examines the surface data and describes the results, reporting on issues such as the format.
3. Explore the data - The analyst drills into the data to tackle questions that can be addressed using querying, visualization, and reporting. The analyst should then create a data exploration report that outlines the findings and their potential impact on the remainder of the project.
4. Verify data quality - The analyst checks the quality of the data: Is the data complete? Are there missing attributes or blank fields? Are all possible values represented?
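The describe, explore, and verify steps above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the customer records, the field names, and the helper functions `describe`, `explore`, and `verify_quality` are hypothetical and are not part of CRISP-DM itself.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value.
records = [
    {"customer_id": 1, "region": "north", "age": 34, "spend": 120.0},
    {"customer_id": 2, "region": "south", "age": None, "spend": 80.5},
    {"customer_id": 3, "region": "north", "age": 45, "spend": None},
    {"customer_id": 4, "region": "east", "age": 29, "spend": 200.0},
]
EXPECTED_REGIONS = {"north", "south", "east", "west"}  # assumed set of valid values


def describe(rows):
    """Describe the data: row count plus the value types seen in each field."""
    fields = sorted({key for row in rows for key in row})
    types = {
        f: sorted({type(row.get(f)).__name__ for row in rows if row.get(f) is not None})
        for f in fields
    }
    return {"rows": len(rows), "fields": fields, "types": types}


def explore(rows, field):
    """Explore the data: a simple frequency query on one field."""
    return Counter(row[field] for row in rows if row.get(field) is not None)


def verify_quality(rows, field, expected_values=None):
    """Verify data quality: count blanks and list expected values never observed."""
    report = {"missing": sum(1 for row in rows if row.get(field) is None)}
    if expected_values is not None:
        seen = {row[field] for row in rows if row.get(field) is not None}
        report["unrepresented"] = sorted(expected_values - seen)
    return report


summary = describe(records)
region_counts = explore(records, "region")
age_quality = verify_quality(records, "age")
region_quality = verify_quality(records, "region", EXPECTED_REGIONS)
```

In a real project these results would feed the data description, data exploration, and data quality reports rather than sit in variables.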
Phase Three: Data Preparation

Covers all activities to construct the final data set.

Select data - Involves looking at several criteria and explaining why certain data were included or excluded.

Clean data - Selecting clean subsets of the data or estimating missing values through modeling.

Construct data - Transforming the data into new forms, using formulas that will create better attributes than
the current data.

Integrate data - Involves bringing together data from multiple tables to create new records. With table based
data, an analyst can join tables.

Format data - In some cases, the analyst changes the format to make the data suitable for a specific modeling
tool, or to pose necessary questions.
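As a rough illustration, the five preparation tasks above might look like this on two small tables. The tables, column names, and the median fill are assumptions made for the example. The median is a deliberately simple stand-in for estimating missing values, not a recommended technique.

```python
from statistics import median

# Hypothetical source tables; all names and values are illustrative.
customers = [
    {"customer_id": 1, "region": "north", "age": 34, "fax": None},
    {"customer_id": 2, "region": "south", "age": None, "fax": None},
    {"customer_id": 3, "region": "north", "age": 45, "fax": None},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 3, "amount": 75.0},
]

# Select data: keep the attributes judged relevant (the empty "fax" field is excluded).
selected = [{k: row[k] for k in ("customer_id", "region", "age")} for row in customers]

# Clean data: fill a missing age with the median of the known ages.
known_ages = [row["age"] for row in selected if row["age"] is not None]
fill_age = median(known_ages)
cleaned = [
    {**row, "age": row["age"] if row["age"] is not None else fill_age}
    for row in selected
]

# Construct data: derive a new attribute from an existing one.
constructed = [
    {**row, "age_band": "under_40" if row["age"] < 40 else "40_plus"}
    for row in cleaned
]

# Integrate data: join the orders table to the customers on customer_id.
totals = {}
for order in orders:
    totals[order["customer_id"]] = totals.get(order["customer_id"], 0.0) + order["amount"]
integrated = [
    {**row, "total_spend": totals.get(row["customer_id"], 0.0)} for row in constructed
]

# Format data: reorder and re-encode values the way a hypothetical tool expects.
formatted = sorted(
    ({**row, "region": row["region"].upper()} for row in integrated),
    key=lambda row: row["customer_id"],
)
```

Each intermediate list is kept separate so that the reasoning behind every step (why a field was dropped, how a value was filled) can be recorded alongside it.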
Phase Four: Modeling

Various modeling techniques can be selected for optimal results

The modeling phase breaks down into these steps:
1. Select the modeling technique - Choosing one or more modeling techniques (any assumptions should be recorded).
2. Generate test design - The analyst plans how the model will be tested: it is built on one set of existing data, and its validity is tested on a different set.
3. Build the model - With the test design in place, the data analyst runs the modeling tool on the prepared data set to create one or more models.
4. Assess the model - The analyst interprets the results according to their domain knowledge and works with a business analyst to interpret them in the business context.
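The idea of building a model on one set of data and testing it on a different set can be sketched as a simple holdout split. Everything here is an assumption made for illustration: the synthetic spend/churn data, the one-threshold rule used as the "model", and the helper names `split`, `build`, and `accuracy`.

```python
import random

# Hypothetical prepared data set: (monthly_spend, churned) pairs.
data = [(float(x), int(x < 50)) for x in range(0, 100, 5)]


def split(rows, test_fraction=0.25, seed=0):
    """Generate test design: hold out a random share of the rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(threshold, rows):
    """Share of rows where the rule 'spend below threshold means churn' is correct."""
    return sum(int(x < threshold) == y for x, y in rows) / len(rows)


def build(train_rows):
    """Build the model: pick the threshold with the best accuracy on the training rows."""
    candidates = sorted({x for x, _ in train_rows})
    return max(candidates, key=lambda t: accuracy(t, train_rows))


train, test = split(data)
threshold = build(train)

# Assess the model: compare performance on seen and unseen data.
train_accuracy = accuracy(threshold, train)
test_accuracy = accuracy(threshold, test)
```

A gap between `train_accuracy` and `test_accuracy` is the kind of technical finding the analyst would then discuss with a business analyst.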
Phase Five: Evaluation

Evaluate results - Assesses the degree to which the model
meets the business objectives. Another option is testing the
model on real-world applications.

Review process - A more thorough review of the data mining
process to determine if any factors were missed. Also covers
quality assurance issues.

Determine next steps - The project leader decides to finish this
project and move to deployment or initiate further iterations.
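One way to picture evaluating results against business objectives is to express the model's output in business terms and compare it with a success criterion. The monetary figures, the target, and the function names below are purely hypothetical. A real project would take them from the business understanding phase.

```python
# Hypothetical business figures, not taken from the CRISP-DM material.
VALUE_PER_RETAINED_CUSTOMER = 200.0  # assumed benefit of a correct churn flag
COST_PER_CONTACT = 20.0              # assumed cost of each retention offer
TARGET_NET_VALUE = 1000.0            # assumed business success criterion


def net_value(predictions, actuals):
    """Net value of contacting every customer the model flags as a churner."""
    contacted = sum(predictions)
    true_positives = sum(1 for p, a in zip(predictions, actuals) if p == 1 and a == 1)
    return true_positives * VALUE_PER_RETAINED_CUSTOMER - contacted * COST_PER_CONTACT


def next_step(value, target=TARGET_NET_VALUE):
    """Determine next steps: deploy if the criterion is met, otherwise iterate."""
    return "move to deployment" if value >= target else "initiate further iteration"


# Hypothetical model output and actual outcomes (1 = churn, 0 = no churn).
predictions = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
actuals = [1, 0, 1, 0, 1, 1, 1, 1, 0, 1]

value = net_value(predictions, actuals)
decision = next_step(value)
```

The final decision still rests with the project leader. A calculation like this only supplies one input to it.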
Phase Six: Deployment

It is often the customer that will carry out the deployment phase

Plan deployment – Takes the evaluation results and develops a strategy for deployment.

Plan monitoring and maintenance - A carefully prepared maintenance strategy avoids
incorrect usage of data mining results.

Produce final report - Could be a summary or comprehensive presentation of results.
Includes previous deliverables and summarizes results.

Review project - Assess failures and successes, as well as areas to improve in the future.
Should also include a summary of important experiences.
Conclusion
• CRISP-DM was not built in a theoretical, academic manner.
• It is based on practical, real-world data mining experience.
• It is not meant to be a magical book that lets novices succeed.
• With data mining training and assistance from experienced analysts, CRISP-DM can be a valuable tool.