International Journal of Service Computing and Computational Intelligence (IJSCCI), ISSN: 2162 – 514X, Volume-1, Jan-2014

Identification of a Potential Customer of Business Interest Using Data Mining and Data Warehousing Techniques

R. Anusha1 and N. Krishnan2

Abstract— Data mining and data processing play a vital role in day-to-day business activities. Both are now used across many fields, including banking, medicine, education, finance, and commodity marketing. This paper proposes a methodology that lets commodity marketers identify potential customers of business interest: which customers like which products, and which products sell best. This information is stored automatically and is easy to retrieve. Data mining and preprocessing techniques are used to increase accuracy.

Index Terms— Business Intelligence, Data Mining, Valued customer management.

I. INTRODUCTION

The definition of goals benefits from close cooperation between experts in the application domain and data mining analysts. For example, the problem and the goals of the investigation may be defined as the analysis of past data and the identification of a model that expresses the propensity of customers to leave the service (churn) based on their characteristics, in order to understand the reasons for such disloyalty, predict the probability of churn, and turn the customer into a valued one [3, 4, 5]. Data mining analyses are used in many areas, where they provide support for decision makers, for instance by analyzing the sales amount in each area and calculating the profit for each available product.
As a consequence, intuition and competence are required of domain experts to formulate plausible and well-defined investigation objectives. If the problem at hand is not adequately identified and circumscribed, one runs the risk of thwarting any future effort made during data mining activities. Data mining offers the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries.

Different types of analytical software are available:
1) Machine learning
2) Neural networks
3) Statistical

The types of relationships sought include:
1) Classes
2) Clusters
3) Associations

A. Data gathering and integration

Once the goal of the investigation has been defined, the collection of data begins. Data coming from different sources and different places are integrated. Data sources may be internal, external, or a combination of the two. The integration of distinct data sources may be suggested by the need to enrich the data with new descriptive dimensions, such as geomarketing variables, or with lists of names of potential customers, termed prospects, who do not yet exist in the company's information system.

In some cases the data sources are already well organized in data warehouses and data marts for OLAP analyses and, more generally, for decision support activities. In these cases the data can easily be accessed and analyzed using specific procedures, and it is sufficient to select the attributes deemed relevant for the data mining analysis. There is a risk, however, that, because of memory limitations, the information stored in a data warehouse has been aggregated and consolidated to such an extent that any subsequent analysis is rendered useless [13, 15].

Manuscript received August 08, 2013. R.
Anusha, Research Scholar, Centre for Information Technology and Engineering, Manonmaniam Sundaranar University, Tirunelveli – 627012
N. Krishnan, Professor, Centre for Information Technology and Engineering, Manonmaniam Sundaranar University, Tirunelveli – 627012
Published By: Information Technology Foundation for Research

D. Exploratory analysis

In the third phase of the data mining process, a preliminary analysis of the data is carried out in order to get acquainted with the available information and to carry out data cleansing. The data stored in a data warehouse is processed at loading time. For example, student data such as date of birth, age, name, native place, place of residence, father's name and brother's name are checked for correct format and correct digits, so that the right information is stored and analyzed. The first step studies the distribution of the values of each attribute, using histograms for categorical attributes.

Fig. 1 Data passing and feedback cycle (stages shown: Objective definition; Data gathering and interpretation; Data Mart; Preprocessing; Exploratory Analysis; Attribute Selection; Model development; Prediction and Interpretation)

E. Attribute Selection

An attribute is a property or characteristic of an object, for example a person's eye color or a temperature. An attribute is also known as a variable, field, characteristic, or feature. A collection of attributes describes an object; an object is also known as a record, point, case, or sample. There are different types of attributes: nominal, ordinal, interval, and ratio. Attributes are important for the selection method. For example, a disease attribute may be coded as 1 = strep throat, 2 = cold, 3 = headache, 4 = fever; similar attributes have similar values.

Raw Data, Data marts
Fig.
2 Data Mart

F. Model development and validation

Once a high-quality dataset has been assembled and possibly enriched with newly defined attributes, models and rules are formed and extracted from the original dataset. The accuracy of each model generated can then be assessed using the rest of the data [6, 7]. More precisely, the available dataset is split into two subsets. The first constitutes the training set and is used to identify a specific learning model within the selected class of models. Usually the sample size of the training set is chosen to be relatively small, although significant from a statistical standpoint: say, a few thousand observations. The second subset is the test set and is used to assess the accuracy of the alternative models generated during the training phase, in order to identify the best model for actual future predictions [30-33].

B. Data Mart

Establishing a data warehouse is arduous work: the original data must be collected, and issues of data integrity and diversity must be resolved. The warehouse is a vast system built on a database management system, and it sometimes takes many years and a great deal of money to finish. As a simpler approach, a single database can be extracted from one or more databases to run and support transaction processing. As shown in Fig. 2, this new database is called a data mart [2].

C. Data pre-processing

Data pre-processing is a tedious task in data mining. It makes the data appropriate for analysis and for clustering by deleting duplicate records and supplying missing data according to past recorded data. Its main benefit is reduced memory use. Clustering is the process of separating a dataset into subgroups according to their distinguishing features.
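The two mechanical steps described above, cleaning the records (removing duplicates and filling missing values) and splitting the data into a training set and a test set, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the field names `customer_id` and `income`, the mean-based fill rule, and the sample records are all assumptions made for the example.

```python
import random
from statistics import mean


def preprocess(records, key="customer_id", numeric_field="income"):
    """Drop duplicate records and fill missing numeric values.

    Assumption: a missing value is replaced by the mean of the values
    already recorded, one simple reading of "past recorded data".
    """
    seen = set()
    unique = []
    for rec in records:
        if rec[key] in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add(rec[key])
        unique.append(dict(rec))  # copy so the input is not modified

    known = [r[numeric_field] for r in unique if r[numeric_field] is not None]
    fill = mean(known) if known else 0
    for r in unique:
        if r[numeric_field] is None:
            r[numeric_field] = fill
    return unique


def split_train_test(records, train_size, seed=0):
    """Shuffle the records and split them into a training and a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


# Hypothetical raw records: one duplicate and one missing income.
raw = [
    {"customer_id": 1, "income": 20000},
    {"customer_id": 2, "income": None},
    {"customer_id": 1, "income": 20000},
    {"customer_id": 3, "income": 40000},
]

clean = preprocess(raw)
train, test = split_train_test(clean, train_size=2)
```

Here `clean` holds three records, with the missing income filled by the mean of the two known incomes. A fixed seed keeps the split reproducible, which makes it easier to compare alternative models on the same test set.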
Clustering separates the dataset into relevant and non-relevant subsets, so data pre-processing is an essential task of data mining. Its main goal is to make the collected data suitable for analysis and for clustering. Data pre-processing removes duplicate data and fills in missing values according to past recorded data. It also reduces memory use and normalizes the values stored in the database [1].

G. Prediction and interpretation

At the conclusion of the data mining process, the model selected among those generated during the development phase should be implemented and used to achieve the goals that were originally identified. Moreover, it should be incorporated into the procedures that support decision-making processes, so that knowledge workers can use it to draw predictions and acquire a more in-depth knowledge of the phenomenon of interest [8-12].

H. Major Trends in Technologies and Methods

A number of data mining trends, in terms of technologies and methodologies, are currently being developed and researched. These trends include methods for analyzing more complex forms of data, as well as specific techniques and methods. The trends identified include distributed data mining, hypertext/hypermedia mining, ubiquitous data mining, as well as multimedia, spatial, and time series/sequential data mining. These are examined in detail in the upcoming sections.

Misuse identification: detecting false claims and misuse of bank accounts.

Risk evaluation: the purpose of risk analysis is to estimate the risk connected with the networks. How is the decision analyzed and calculated? Which factors affect the time taken to reach a decision? How are these factors communicated across different sources?

II.
PHASES IN THE DEVELOPMENT OF MATHEMATICAL MODELS OF DECISION MAKING

A. Define the problem

The workflow of the institution must be clearly understood, each activity analyzed, and the decision makers' needs clearly understood. For instance, an ineffective production plan may be the cause of stock accumulation [15-19].

Fig. 3 Applications of Data mining (Medical / Pharma; Insurance and Health Care; Banking / Finance; Retail / Marketing)

B. Identified problem

BIG BANK is a currently flourishing bank in the financial sector, with 10 million account holders. It is going to introduce a credit card. The credit card sector of the current financial market is highly competitive. If the valued customers can be found, the product will succeed and become a backbone of the bank. Because the bank is flourishing, it does not want to take risks in the financial market, so it has decided to sell the credit card only to valued and trustworthy internal customers. The bank's data warehouse holds a large number of account holders, and the valued customers must be identified from it.

Three stages to find the valued customer:
• Right customers (acquisition)
• Right relationship (development)
• Right retention (keeping valuable customers)

I. Applications in enterprises

1. Analytics – A program that builds quantitative processes for a business to arrive at optimal decisions and to perform business knowledge discovery. It frequently involves data mining, process mining, statistical analysis, predictive analytics, predictive modeling, business process modeling, complex event processing, and prescriptive analytics.
2. Reporting/enterprise reporting – A program that builds infrastructure for strategic reporting to serve the strategic management of a business, as opposed to operational reporting. It frequently involves data visualization, executive information systems and OLAP.
3.
Collaboration/collaboration platform – A program that gets different areas (both inside and outside the business) to work together through data sharing and electronic data interchange.
4. Knowledge management – A program to make the company data-driven through strategies and practices to identify, create, represent, distribute, and enable adoption of insights and experiences that are part of true business knowledge. Knowledge management leads to learning management and regulatory compliance.

J. Relational marketing

These areas have significantly contributed to the increased popularity of these methodologies [20, 21]. Such segmentation provides a powerful tool for marketers of all kinds. It can help companies to identify and better understand key customer segments and to target them more efficiently. Related tasks in relational marketing are listed below:
1. Identification of customers
2. Identification of targets
3. Prediction of rate
4. Interpretation and understanding
5. Analysis of the products

C. Right Customers

We analyze which customers will be most valuable to our business. The most valuable customers are most often those who will repeat business with our financial institution again and again over a long time. Regarding the reliability effect, how long must a customer stay in order to pay for the cost of acquisition? Companies can no longer afford to recruit customers indiscriminately without examining their long-term value. Analytical capabilities are needed to identify customers who will be loyal and profitable. Classification, segmentation and analysis of the customer base reveal hidden characteristics and trends that affect value. Some customers have low value because they do only a small amount of business. Others have high lifetime value because of their long activity with the bank, for example because they have regularly made small purchases every week for the past ten years. This kind of analysis makes it easy to identify the best customers and enables the company to go after the new customers it can most profitably serve.

If a problem consists of the formulation of a tactical production plan over the medium term, the decision variables should express production volumes for each product, for each process and for each period of the planning horizon.

Numerical parameters: It is also necessary to accurately identify and estimate all numerical parameters required by the model. In the production planning example, the available capacity should be known in advance for each process, as well as the capacity absorption coefficients for each combination of products and processes.

Mathematical relationships: The final step in the formulation of a model is the identification of mathematical relationships among the decision variables, the numerical parameters and the performance indicators defined during the previous phases. Sometimes these relationships may be exclusively deterministic, while in other instances it is necessary to introduce probabilistic relationships. In this phase, the trade-off between the accuracy of the representation achieved through the model and its solution complexity should be carefully considered. It may turn out to be more helpful at a practical level to adopt a model that sacrifices some marginal aspects of reality in the representation of the system but allows an efficient solution and greater flexibility in view of possible future developments [25-29].

D. Right Relationship

Even with well-chosen customers, managers must develop the relationship. Customers who do not receive the right touch, or who get too many conflicting offers, lose rather than gain value. For any business, the right relationship is one that maximizes that customer's lifetime value.
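Lifetime value is treated in this section as the product of purchase size, purchase frequency and relationship duration. The sketch below computes that product and ranks customers by it. The function name, the customer labels and all figures are hypothetical, and in practice the duration would be an estimate rather than a known quantity.

```python
def lifetime_value(purchase_size, purchases_per_year, years):
    """Simplified lifetime value: purchase size x frequency x duration."""
    return purchase_size * purchases_per_year * years


# Hypothetical customers: (average purchase, purchases per year, years retained)
customers = {
    "weekly_small_buyer": (10, 52, 10),
    "one_off_large_buyer": (500, 1, 1),
}

ltv = {name: lifetime_value(*args) for name, args in customers.items()}
ranked = sorted(ltv, key=ltv.get, reverse=True)

print(ltv)     # {'weekly_small_buyer': 5200, 'one_off_large_buyer': 500}
print(ranked)  # ['weekly_small_buyer', 'one_off_large_buyer']
```

With these made-up numbers, the customer who makes small purchases every week for ten years ranks above the customer who makes a single large purchase, which matches the point made under Right Customers.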
A simplified view of customer lifetime value is:

LTV = purchase size × frequency × duration

So the main goal of customer relationship management is to increase the size and frequency of purchases and to extend how long the customer continues to buy. Since marketers cannot know the duration of a relationship until it is over, they use loyalty measures to estimate how long customers will stay.

E. Right Retention

Effective retention means retaining the right customers, not every customer. Managers need to focus their retention actions on customers with the highest lifetime value. Spending precious resources to retain marginally profitable or unprofitable customers actually hurts the overall value of the customer base, especially if these retention efforts succeed. Right retention is therefore rooted in knowing which individuals are most valuable [22-24].

F. Model formulation

Once the problem to be analyzed has been properly identified, effort should be directed toward defining an appropriate mathematical model to represent the system. A number of factors affect and influence the choice of model, such as the time horizon, the decision variables, the evaluation criteria, the numerical parameters and the mathematical relationships.

Time horizon: Usually a model includes a temporal dimension. For example, to formulate a tactical production plan over the medium term it is necessary to specify the production rate for each week in a year, whereas to derive an operational schedule it is required to assign the tasks to each production line for each day of the week. As we can see, the time span considered in a model, as well as the length of the base intervals, may vary depending on the specific problem considered.

Evaluation criteria: Appropriate measurable performance indicators should be defined in order to establish a criterion for the evaluation and comparison of the alternative decisions. These indicators may assume various forms in each different application, and may include the following factors:
• Monetary costs and payoffs;
• Effectiveness and level of service;
• Quality of products and services;
• Flexibility of the operating conditions;
• Reliability in achieving the objectives.

Decision variables: Symbolic variables representing alternative decisions should then be defined.

III. ATTRIBUTES (DECISION VARIABLES) FOR THE IDENTIFIED PROBLEM

1. Age
2. Income
3. Family dependents
4. Location
5. Occupation
6. Years of experience in current job
7. Vintage in liability (years as an account holder)
8. Disease details
9. Other asset information
10. Rental / owned house
11. Vehicle details
12. Average quarterly balance maintained
13. CIBIL check to establish the customer's delinquency
14. Cheque returns recorded for account delinquency
15. Salary credits to the account over the last 6 months

1 – Age and Income
The commonly preferred age is 18 to 40, with an income of at least 2000. If age and income are both very low, the customer is judged not to be a valued customer. If age is above 18 and income is above the minimum, the customer is accepted. If both age and income are high, the customer is judged to be a valuable and preferred customer.

2 – Age and Occupation
If the customer's age is high, the occupation must be of a higher grade; otherwise the customer is not a preferred customer.

3 – Income and Disease information
If the customer's income is high but the customer has a contagious disease, a hereditary disease or heart disease, the customer is not a preferred customer. If the income is at the preferred level (15000 and above) and the disease level is moderate, the customer is a preferred customer.

4 – Age and Rental / Owned house
If the customer is in the preferred age range (18 to 40) and lives in a rented house, age is not the deciding factor; before taking the decision, we analyze the money factor, that is, how long funds have remained available in the account.

5 – CIBIL check and all other attributes
The 'Accounts' section of a credit report contains the existing and past credit facilities that a customer has availed from various loan providers. For example, if a customer has a home loan and a personal loan, the credit report will reflect both accounts, along with details such as the name of the lender, the type of credit facility, the dates of opening and closing (if applicable) of each account, current balances, the status of the accounts and the payment history. The credit report summarizes credit behavior across these accounts for the last 2 years. If the customer's age is low, his or her income is very high, the house is owned and there are no disease details, the customer's behavior is rated as valued. The CIBIL check, however, is sensitive and very important in the banking sector.

6 – Cheque returns recorded for account delinquency
Sometimes, if someone forges a signature on a cheque, the person whose signature was forged is not bound to honour the cheque, and their bank does not have to pay it. A cheque with a forged signature is simply a worthless piece of paper, a "nullity".

7 – Years of experience in current job and average quarterly balance maintained
The customer's behavior and years of experience must be taken into account, along with whether the average quarterly balance is maintained. From these we can assess how reliable and creditworthy the customer is.

IV. DEVELOPMENT OF ALGORITHMS

Once the mathematical model has been defined and its operations are clear, one will naturally wish to proceed with its solution in order to assess the decisions and select the best alternative. In other words, a solution method should be identified, and a software tool that incorporates the solution method should be developed or acquired. The developer then passes the input values to the tool one by one, with strict checks on what is permitted.

V. IMPLEMENTATION AND TEST

Once the model has been fully identified and developed, it is implemented, tested and used in the application domain. The correctness of the data and of the numerical parameters entered in the model must also be assessed beforehand. The data normally come from a data warehouse or a data mart. Once results have been obtained from the solution procedure developed, the following are checked:
• The plausibility and likelihood of the conclusions achieved;
• The consistency of the results at extreme values of the numerical parameters;
• The stability of the results when minor changes in the input parameters are introduced.

VI. CONCLUSIONS

Newly developed product management and marketing methods are used to improve effectiveness with each individual customer over the entire life of the relationship. This paper introduced a data mart and employed preprocessing techniques to improve the desired result using past data. The aim is to achieve maximum lifetime profit from the entire customer base. Customer value management enables companies to take full advantage of the economics of reliability by increasing retention, reducing risk, and amortizing acquisition costs over a longer and more profitable period of engagement. Not every individual customer is a gain, but each must be managed to maximize overall profit, even when that management consists of identifying which customers contribute little to the business and focusing development and retention efforts elsewhere.

REFERENCES

[1] Tasnuba Jesmin, Kawsar Ahme, Md. Zamilur, Md. Badrul Alam, "Brain Cancer Risk Prediction Tool Using Data Mining," International Journal of Computer Applications (0975 – 8887), Volume 61, No. 12, January 2013.
[2] Zhang Danping, Deng Jin, "The Data Mining of the Human Resources Data Warehouse in University Based on Association Rule," Journal of Computers, vol. 6, no. 1, January 2011.
[3] Agrawal R., Srikant R. (1995). Mining sequential patterns. In: P. Yu et A. Chen (eds.), ICDE '95: Proceedings of the Eleventh International Conference on Data Engineering, IEEE Computer Society.
[4] Agrawal R., Imielinski T., Swami A. (1993a). Database mining: A performance perspective. IEEE Transactions on Knowledge and Data Engineering.
[5] Agrawal R., Imielinski T., Swami A. (1993b). Mining association rules between sets of items in large databases. In: SIGMOD '93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data, ACM Press.
[6] Agrawal R., Mannila H., Srikant R., Toivonen H., Verkamo A. (1996). Fast discovery of association rules. In: Advances in knowledge discovery and data mining, American Association for Artificial Intelligence.
[7] Bakan J. (2005). The Corporation: The Pathological Pursuit of Profit and Power. Free Press.
[8] Baldi P., Brunak S. (2001). Bioinformatics: the machine learning approach. MIT Press.
[9] Battistini V., Contini A., Del Prato G., Palopoli G., Valentini D., Vercellis C. (1999). L'ottimizzazione della catena logistica integrata: il caso Barilla alimentare. Logistica e Management.
[10] Berry M., Linoff G. (1999). Mastering Data Mining. Wiley.
[11] Berry M., Linoff G. (2002).
Mining the Web: Transforming Customer Data into Customer Value. Wiley.
[12] Berry M., Linoff G. (2004). Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Wiley.
[13] Berson A., Smith S. (1997). Data Warehousing, Data Mining, and OLAP. McGraw-Hill.
[14] Berson A., Smith S., Thearling K. (1999). Building Data Mining Applications for CRM. McGraw-Hill.
[15] Bertsekas D. (2003). Convex Analysis and Optimization. Athena Scientific.
Bishop C. (1995). Neural Networks for Pattern Recognition. Oxford University Press.
[16] Bolloju N., Khalifa M., Turban E. (2002). Integrating knowledge management into enterprise environments for the next generation decision support. Decision Support Systems, 33, 163–176.
[17] Box G., Jenkins G., Reinsel G. (1994). Time Series Analysis: Forecasting & Control. Prentice Hall.
[18] Bradley P., Fayyad U., Mangasarian O. (1999). Mathematical programming for data mining: formulations and challenges. INFORMS Journal on Computing, 11, 217–238.
[19] Breiman L., Friedman J., Olshen R., Stone C. (1984). Classification and Regression Trees. Chapman & Hall.
[20] Breslow L., Aha D. (1997). Simplifying decision trees: A survey. Knowledge Engineering Review, 12, 1–40.
[21] Brockwell P., Davis R. (2002). Introduction to Time Series and Forecasting. Springer.
[22] Bruhn M. (2002). Relationship Marketing: Management of Customer Relationships. Pearson.
[23] Burges C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121–167.
[24] Cadez I., Heckerman D., Smyth P., Meek C., White S. (2003). Model-based clustering and visualization of navigation patterns on a web site. Data Mining and Knowledge Discovery, 7, 399–424.
[25] Charnes A., Cooper W., Rhodes E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2, 429–444.
[26] Chatfield C. (2003). The Analysis of Time Series: An Introduction. Chapman & Hall.
[27] Cherkassky V., Mulier F. (1998). Learning from data, concepts, theory and methods. Wiley.
[28] Chopra S., Meindl P. (2003). Supply Chain Management. Prentice Hall.
[29] Clemen R. (1997). Making Hard Decisions: An Introduction to Decision Analysis. Duxbury Press.
[30] Jones R. (1980). Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics, 20, 389–395.
[31] Kaufman L., Rousseeuw P. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.
[32] Keen P., Scott Morton M. (1978). Decision support systems: an organizational perspective. Addison-Wesley.
[33] Keys P. (1995). Understanding the process of operational research. Wiley.
Kimball R. (1996). The Data Warehouse Toolkit. Wiley.
[34] Venkata Sheshanna Kongara, D. Punyasesudu. Data Warehousing and Data Mining Applications for Atmospheric Studies (IACEECE-2013).
[35] Tipawan Silwattananusarn and Assoc. Prof. Dr. Kulthida Tuamsuk, "Data Mining and Its Applications for Knowledge Management: A Literature Review from 2007 to 2012," International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 2, No. 5, September 2012.