International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR)
ISSN(P): 2249-6831; ISSN(E): 2249-7943
Vol. 6, Issue 5, Oct 2016, 75-84
© TJPRC Pvt. Ltd.
Original Article

ANALYSIS AND ENHANCEMENT OF PROCESS MODEL USING SCORING FOR CUSTOMER RELATIONSHIP MANAGEMENT

REKHA ARUN1 & JEBAMALAR TAMILSELVI2
1 Research Scholar, Sathyabama University, Tamil Nadu, India
2 Assistant Professor, Jaya Engineering College, Tamil Nadu, India

ABSTRACT
A potential and valuable customer is identified only through a complete 360-degree analysis. The identification process uses various business models in CRM. A number of researchers have made efforts to use such process models to guide the mining of large amounts of data. This paper focuses on a comparative analysis of the most popular data mining process models, viz. the Knowledge Discovery in Databases (KDD) process model, CRISP-DM and SEMMA, as well as on the enhancement of CRISP-DM in its modeling technique. The comparative study shows that KDD and SEMMA are almost similar and that CRISP-DM is best suited to the business analysis related to identifying the potential customer in CRM, which is the major objective of this paper. The investigation also revealed that including a scoring model in the modeling phase of CRISP-DM provides the optimum result in identifying the potential customer through the process models.

KEYWORDS: Process Model, KDD, SEMMA, CRISP-DM, Scoring Model, CRM

Received: Sep 13, 2016; Accepted: Oct 07, 2016; Published: Oct 13, 2016; Paper Id.: IJCSEITROCT20169

1. INTRODUCTION
Data mining is an innovative process that needs various skills and knowledge. Different data projects are carried out with the available standard models, and the success of a project is understood to depend on the process model used.
These models are used to translate business challenges into data mining tasks, to recommend appropriate data transformations and data mining techniques, to provide a method for assessing the effectiveness of the results, and to document what has been learned. Acceptance of a common process model in the market provides further benefits: the model serves as a general reference point for discussion and thus increases the understanding of the vital data mining challenges in pointing out the potential customer. The familiar models are KDD, SEMMA and CRISP-DM.

Data mining is one of the phases of the KDD process (Fayyad et al., 1996; Brachman & Anand, 1996). The phrase knowledge discovery in databases, or KDD, was coined in 1989 to refer to the broad process of identifying information in data and to emphasize the high-level application of specific data mining methods (Fayyad et al., 1996).

SEMMA was developed by the SAS Institute. The acronym SEMMA stands for Sample, Explore, Modify, Model, Assess, and refers to the process of conducting a data mining project. SEMMA is simple to understand and allows a structured and adequate development and maintenance of data mining projects. It thus provides a structure for their conception, creation and evolution, and helps to present solutions to business problems as well as to identify the CRM goals (Santos & Azevedo, 2005).

CRISP-DM was developed by a consortium composed of DaimlerChrysler, SPSS and NCR. CRISP-DM stands for CRoss-Industry Standard Process for Data Mining (Chapman et al., 2000).

Analysing the documentation of these process models reveals their similarities and dissimilarities. This paper also deals with the enhancement of CRISP-DM by including a scoring model in the modeling phase.
A scoring model is a predictive system used for assessing credit worthiness, for optimizing direct marketing and, in CRM, for predicting the future behaviour of customers. The scoring model is delivered as a score table containing the scores of customers with respect to various parameters.

The remainder of this paper is organized as follows: Section 2, comparative study of existing process models; Section 3, enhancement of CRISP-DM with scoring model; Section 4, results and discussion; Section 5, conclusion and future work.

2. COMPARATIVE STUDY OF EXISTING MODELS

2.1 The KDD Process
Fayyad et al. (1996) present KDD as a process that uses data mining methods to extract knowledge from a database according to specified measures and thresholds, together with any required preprocessing, sub-sampling and transformation of the database. It has five stages:
1. Selection: This stage creates the target data set, or focuses on a subset of variables or a data sample on which the knowledge discovery is to be performed.
2. Preprocessing: This stage cleans and preprocesses the target data to obtain consistent data.
3. Transformation: This stage transforms the data using dimensionality reduction or transformation methods.
4. Data Mining: This stage searches for patterns of interest in a particular representational form, depending on the objective.
5. Interpretation/Evaluation: This stage interprets and evaluates the mined patterns.

2.2 The SEMMA Process
The SEMMA process was developed by the SAS Institute. The acronym SEMMA stands for Sample, Explore, Modify, Model, Assess, and refers to the process of conducting a data mining project. The SAS Institute considers a cycle with five stages for the process:
1. Sample: This stage samples the data by extracting a portion of a large data set that is big enough to hold the important information but small enough to be manipulated quickly. This stage is considered optional.
2. Explore: This stage explores the data by searching for unanticipated trends and anomalies in order to gain understanding and ideas.
3. Modify: This stage modifies the data by creating, selecting and transforming the variables to focus the model selection process.
4. Model: This stage models the data by letting the software search for a combination of data that optimally predicts a desired result.
5. Assess: This stage assesses the data by evaluating the usefulness and reliability of the findings of the data mining process and estimating how well it performs.

Even though the SEMMA process does not depend on the selected tool, it is associated with the SAS Enterprise Miner software and is intended to guide users in implementing DM applications. SEMMA offers a simple-to-understand process that allows a structured and adequate development and maintenance of data mining projects.

2.3 The CRISP-DM Process
The CRISP-DM process was designed by a group that included DaimlerChrysler, SPSS and NCR. CRISP-DM stands for CRoss-Industry Standard Process for Data Mining. It consists of a cycle that comprises six stages:
1. Business Understanding: This first stage focuses on understanding the objectives and requirements of the project from the business point of view, then converting this knowledge into a data mining problem definition and developing an initial plan to achieve the objectives.
2. Data Understanding: This phase starts with an initial data set and proceeds with activities that make the data familiar, identify data quality problems, discover first insights into the data, or find interesting subsets from which to form hypotheses about hidden information.
3. Data Preparation: This stage covers all activities needed to build the final data set from the initial raw data.
4. Modeling: In this stage, different modeling techniques are chosen and applied, and their parameters are calibrated to the best values.
5. Evaluation: This stage evaluates the model as well as the steps carried out to construct it, to confirm that it achieves the business objectives.
6. Deployment: Creating a model is not the end of the project. The purpose is to increase the knowledge gained and to present it in a user-friendly manner (Chapman et al., 2000).

2.4 Comparison
Comparing the KDD and SEMMA stages confirms the equivalence between them: Sample is similar to Selection; Explore is similar to Preprocessing; Modify is similar to Transformation; Model is similar to Data Mining (DM); Assess is similar to Interpretation/Evaluation. On closer investigation, the five stages of the SEMMA process can be seen as a practical implementation of the five stages of the KDD process.

Compared with the KDD stages, the correspondence of the CRISP-DM stages is not as straightforward as for SEMMA. However, the CRISP-DM methodology also includes the steps that precede and succeed the KDD process. The Business Understanding phase deals with developing an understanding of the application domain, the relevant prior knowledge and the goals of the end user. The Deployment phase incorporates the resulting knowledge into the working system. For the remaining stages: the Data Understanding phase is a blend of Selection and Preprocessing; the Data Preparation phase corresponds to Transformation; the Modeling phase corresponds to DM; and the Evaluation phase corresponds to Interpretation/Evaluation.
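The correspondences described above can also be written down as a small lookup structure. The following is a minimal Python sketch and not part of the original study: the dictionary and function names are illustrative assumptions, while the stage names and pairings are those given in Sections 2.1 to 2.4.

```python
# Stage names as listed in Sections 2.1 to 2.3; the pairings follow Section 2.4.
KDD_STAGES = [
    "Selection",
    "Preprocessing",
    "Transformation",
    "Data Mining",
    "Interpretation/Evaluation",
]

# Each SEMMA stage matches exactly one KDD stage.
SEMMA_TO_KDD = {
    "Sample": "Selection",
    "Explore": "Preprocessing",
    "Modify": "Transformation",
    "Model": "Data Mining",
    "Assess": "Interpretation/Evaluation",
}

# An empty list marks a CRISP-DM phase that lies outside the five KDD stages.
CRISP_DM_TO_KDD = {
    "Business Understanding": [],  # precedes the KDD process
    "Data Understanding": ["Selection", "Preprocessing"],
    "Data Preparation": ["Transformation"],
    "Modeling": ["Data Mining"],
    "Evaluation": ["Interpretation/Evaluation"],
    "Deployment": [],  # succeeds the KDD process
}


def semma_for_crisp_dm(phase):
    """Return the SEMMA stages that share a KDD stage with a CRISP-DM phase."""
    kdd_stages = CRISP_DM_TO_KDD[phase]
    return [semma for semma, kdd in SEMMA_TO_KDD.items() if kdd in kdd_stages]


def uncovered_kdd_stages():
    """Return KDD stages that no CRISP-DM phase corresponds to."""
    covered = {kdd for stages in CRISP_DM_TO_KDD.values() for kdd in stages}
    return [stage for stage in KDD_STAGES if stage not in covered]
```

For example, `semma_for_crisp_dm("Data Understanding")` yields the Sample and Explore stages, and `uncovered_kdd_stages()` returns an empty list, reflecting that every KDD stage has a CRISP-DM counterpart while Business Understanding and Deployment fall outside KDD.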
Table 1 presents a summary of the correspondences:

Table 1: Summary of the Correspondences between KDD, SEMMA and CRISP-DM

Previous research observes that data mining experts follow the KDD process model because of its completeness and accuracy. In contrast, CRISP-DM and SEMMA are highly company oriented. Specifically, SEMMA is used by SAS Enterprise Miner and is integrated with that software. However, studies show that CRISP-DM is more complete than SEMMA. These process models help users and experts to understand the application of data mining in a practical environment.

CRISP-DM was developed as an industry-oriented and tool-neutral process. Starting from the embryonic knowledge discovery processes used in early data mining projects, which responded directly to user requirements, the model can be applied to various industry sectors. It makes large data mining projects faster, cheaper, more consistent and more manageable. Even small-scale data mining investigations benefit from using CRISP-DM.

3. ENHANCEMENT OF CRISP-DM WITH SCORING MODEL
The steps in CRISP-DM are described in terms of a hierarchical process model that contains a collection of tasks explained at several levels of abstraction.

Figure 1: Phases of CRISP-DM

Among the steps of CRISP-DM, most companies apply various statistical models in the modeling phase to optimize their activities. Among the existing modeling techniques, the scoring model is used for assessing credit worthiness, for bad debt collection activities and for optimizing direct marketing in CRM. It is a special kind of predictive model because it allows predicting the future behavior of clients and, in turn, identifying the potential customer.
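As an illustration of how such a scoring model can be applied, the following minimal Python sketch sums the points of a score table for each customer and keeps the customers at or above a cutoff. The attributes, point values, cutoff and function names are hypothetical and chosen only for illustration; they are not taken from the study or its data.

```python
# Hypothetical score table: points awarded per attribute value.
# All attributes and point values are illustrative assumptions.
SCORE_TABLE = {
    "checking_account": {"none": 0, "low": 10, "high": 25},
    "employment_years": {"<1": 0, "1-4": 10, ">4": 20},
    "owns_home": {"no": 0, "yes": 15},
}


def score_customer(customer):
    """Return the total score: the sum of points over all scored attributes."""
    return sum(points[customer[attr]] for attr, points in SCORE_TABLE.items())


def potential_customers(customers, cutoff):
    """Return (name, score) pairs at or above the cutoff, highest score first."""
    scored = [(name, score_customer(c)) for name, c in customers.items()]
    selected = [(name, s) for name, s in scored if s >= cutoff]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Three made-up customers described by the attributes of the score table.
customers = {
    "A": {"checking_account": "high", "employment_years": ">4", "owns_home": "yes"},
    "B": {"checking_account": "none", "employment_years": "1-4", "owns_home": "no"},
    "C": {"checking_account": "low", "employment_years": ">4", "owns_home": "yes"},
}

print(potential_customers(customers, cutoff=40))
```

With these made-up values, customers A and C reach the cutoff and B does not. In practice the points would be estimated from historical data rather than set by hand, and the cutoff would be chosen from the business costs of each kind of error.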
A predictive model predicts the chance of an arbitrary event occurring, or the fact of its occurrence, for example: default on a loan payment, an accident, client agitation or attrition, or being a good. Decisions supported by a scoring model, compared with decisions based on general rules, show an increase in profit of 10-30%. Examples of scoring models are as follows:

Credit Risk
- Forecasting the credit risk of a customer before granting a loan (application scoring)
- Forecasting the risk of a loan already granted to a customer (behavioral scoring)
- Detecting fraud / unusual transactions (fraud detection)
- Forecasting the response to a mailing campaign (response scoring)
- Selecting optimal bad debt collection actions
- Whether a client is using all the products bought (activation scoring)
- Extension of the usage of a product bought (usage scoring)
- Whether the customer buys a product along with some other product (cross-selling)
- Whether a customer buys a product requested earlier, e.g. will decide to have a higher credit limit (up-selling)
- Using a product less (attrition scoring)
- Stopping using a product while starting to use another product, a problem that often occurs in telecoms (churn)

Careful choice of method is vital in situations where only a small data set is available. For instance, this happens when constructing a model to assess the credit worthiness of customers who apply for a mortgage loan, where the sample is smaller than for cash or retail loans. With less data, the choice of method for building the model becomes more significant. Where the data are very large, the optimal choice of method and knowledge in analyzing the data play a key role and are a major factor in success. The best-suited method allows the uncertainty to be evaluated, which reduces risk. Implementing the best model directly increases profit and competitiveness.
This is especially important during an economic recession.

Pseudo code for the credit scoring calculation done to find the potential customers

Data Set Used: The German Credit data set contains observations on 30 variables for 1000 past applicants for credit. Each applicant was rated as "good credit" (700 cases) or "bad credit" (300 cases). New applicants for credit can also be evaluated on these 30 "predictor" variables.

4. RESULTS AND DISCUSSIONS
With this data set, a credit scoring rule is generated to determine whether a new applicant is a good credit risk or a bad credit risk, depending on the values of one or more of the predictor variables. All the variables are explained in Table 1.1.

Table 1.2 below shows the values of these variables for the first several records.

Table 1.2: The Data (First Several Rows)

The consequences of misclassification have been assessed as follows: the cost of a false positive (incorrectly saying an applicant is a good credit risk) outweighs the cost of a false negative (incorrectly saying an applicant is a bad credit risk) by a factor of five. This can be summarized in the following table.

Table 1.3: Opportunity Cost Table (in Deutsche Marks)

The Opportunity Cost table was derived from the average net profit per loan as shown below:

Table 1.4: Average Net Profit

Useful graphs for assessing the performance of the scoring model include the lift chart and the Kolmogorov-Smirnov chart, among others. For example, the following graph shows the Kolmogorov-Smirnov (KS) graph for a credit scoring model.
Figure 2

In this graph, the X axis shows the credit score values (sums), and the Y axis denotes the cumulative proportions of observations in each outcome class (Good Credit vs. Bad Credit) in the hold-out sample. The further apart the two lines are, the greater the degree of differentiation between the Good Credit and Bad Credit cases in the hold-out sample, and thus the better (more accurate) the model.

5. CONCLUSIONS AND FUTURE WORK
With the objective of finding the potential customer, this paper focused on process models for enhancing data mining. Three process models, viz. KDD, SEMMA and CRISP-DM, are compared. It is concluded that CRISP-DM is best suited for business analysis. CRISP-DM is then enhanced in its modeling phase with a predictive model called the scoring model. The scoring model enables CRM to identify the potential customers through credit risk, by distinguishing good credit from bad credit. The objective of the scoring model is not only to determine the credit worthiness of the customer but also to support customer relationship management (CRM) in retaining the customer and maintaining the overall profit portfolio. CRISP-DM with the scoring model shows an optimization of the result obtained. Further, the evaluated results can be classified with an enhanced meta-heuristic algorithm to help CRM decide on the potential customer.

REFERENCES
1. Fayyad, U. M. et al., 1996. From data mining to knowledge discovery: an overview. In Fayyad, U. M. et al. (Eds.), Advances in knowledge discovery and data mining. AAAI Press / The MIT Press.
2. Benoît, G., 2002. Data Mining. Annual Review of Information Science and Technology, Vol. 36, No. 1, pp 265-310.
3. Brachman, R. J. & Anand, T., 1996. The process of knowledge discovery in databases. In Fayyad, U. M. et al. (Eds.), Advances in knowledge discovery and data mining. AAAI Press / The MIT Press.
4. Chen, M. et al., 1996. Data Mining: An Overview from a Database Perspective. IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp 866-883.
5. Simoudis, E., 1996. Reality check for data mining. IEEE Expert, Vol. 11, No. 5, pp 26-33.
6. Fayyad, U. M., 1996. Data mining and knowledge discovery: making sense out of data. IEEE Expert, Vol. 11, No. 5, pp 20-25.
7. Dzeroski, S., 2006. Towards a General Framework for Data Mining. In Dzeroski, S. and Struyf, J. (Eds.), Knowledge Discovery in Inductive Databases. LNCS 4747. Springer-Verlag.
8. Meo, R. et al., 1998. An Extension to SQL for Mining Association Rules. Data Mining and Knowledge Discovery, Vol. 2, pp 195-224. Kluwer Academic Publishers.
9. Imielinski, T. & Virmani, A., 1999. MSQL: A Query Language for Database Mining. Data Mining and Knowledge Discovery, Vol. 3, pp 373-408. Kluwer Academic Publishers.
10. Sarawagi, S. et al., 2000. Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications. Data Mining and Knowledge Discovery, Vol. 4, pp 89-125.
11. Botta, M. et al., 2004. Query Languages Supporting Descriptive Rule Mining: A Comparative Study. Database Support for Data Mining Applications. LNAI 2682, pp 24-51.
12. SAS Enterprise Miner – SEMMA. SAS Institute. Accessed from http://www.sas.com/technologies/analytics/datamining/miner/semma.html on May 2008.
13. Santos, M. & Azevedo, C., 2005. Data Mining – Descoberta de Conhecimento em Bases de Dados. FCA Publisher.
14. Chapman, P. et al., 2000. CRISP-DM 1.0 - Step-by-step data mining guide. Accessed from http://www.crisp-dm.org/CRISPWP-0800.pdf on May 2008.

www.tjprc.org
Impact Factor (JCC): 7.1293 NAAS Rating: 3.63