Classification as data mining tool Done by William Hellela Rauf Gadar Alex Prewett Definition Classification is the process of dividing a dataset into mutually exclusive groups so that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to specific variable (s) we are trying to predict. Business Examples A typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with values "Good" and "Bad“ Predicting the response to direct marketing campaign [will/ will not respond] Classification vs. Regression What distinguishes classification from regression is the type of output that is predicted. Classification, as the name implies, predicts class membership. For example, a classification model predicts that a potential customer will respond to an offer. Classification vs. Regression (cont) However, regression model predicts a specific value. For example, a model predicts that a certain customer profitability will be $854 Predictive vs. Descriptive Models Predictive: forecast explicit values, based on patterns determined from known results. Descriptive: describe the patterns in existing data, and are generally used to create meaningful subgroups. Data Mining Models Classification VS Other tools Classification is used to answer questions form a finite set of classes Estimation is used to answer questions from an unknown, continuous set of answers Clustering also does not require a finite set of predefined classes Prediction is a task of learning a pattern from examples using a developed model. Questions or comments Any Questions ?