International Journal of Service Computing and Computational Intelligence (IJSCCI)
ISSN: 2162 – 514X, Volume-1, Jan-2014
Identification of a Potential Customer of
Business Interest Using Data Mining and
Data Warehousing Techniques
R.Anusha1 and N.Krishnan2

Abstract— Data mining and data processing play a vital role in day-to-day activities and are a boon to business people. Nowadays they are used in almost every field, including banking, medicine, education, finance, and commodity marketing. A new methodology is proposed that lets commodity marketers identify potential customers of business interest: which customers like which products, and which products sell best. This information is stored automatically and is easy to retrieve. Data mining and preprocessing techniques are used to increase accuracy.
Index Terms— Business Intelligence, Data Mining, Valued customer management.
I. INTRODUCTION
Defining the goals benefits from close cooperation between experts in the field of application and data mining analysts. Where possible, the problem and the goals of the investigation are defined as the analysis of past data and the identification of a model that expresses the propensity of customers to leave the service (churn) based on their characteristics. Such a model helps to understand the reasons for this disloyalty, to predict the probability of churn, and to turn the customer into a valued customer [3, 4, 5].
Data mining analyses are used in all areas and are needed to support decision makers, for example in analyzing the sales amount in each area and in calculating the profit for each specific product. As a consequence, intuition and competence are required of domain experts in order to formulate plausible and well-defined investigation objectives. If the problem at hand is not adequately identified and circumscribed, one runs the risk of thwarting any future effort made during data mining activities. Data mining offers the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries.
A. Data gathering and integration.
Once the goal of the investigation has been identified, the collection of data begins. Data coming from different sources and different places are integrated. Data sources may be internal, external or a combination of the two. The integration of distinct data sources may be suggested by the need to enrich the data with new descriptive dimensions, such as geomarketing variables, or with lists of names of potential customers, termed prospects, not yet existing in the company information system.
In some cases data sources are already well defined in data warehouses and data marts for OLAP analyses and, more generally, for decision support activities. In these cases specific procedures make it easy to access and analyze the data, and it is sufficient to select the attributes deemed relevant for data mining analysis. There is a risk, however, that, in order to limit memory use, the information stored in a data warehouse has been aggregated and consolidated to such an extent as to render any subsequent analysis useless [13, 15].
Different types of analytical software are available:
1) Machine learning
2) Neural networks
3) Statistical
Four types of relationships are sought, among them:
1) Classes
2) Clusters
3) Associations
Manuscript received August 08, 2013
R. Anusha, Research Scholar, Centre for Information Technology and
Engineering, Manonmaniam Sundaranar University, Tirunelveli – 627012
N. Krishnan, Professor, Centre for Information Technology and
Engineering, Manonmaniam Sundaranar University, Tirunelveli – 627012
Published By:
Information Technology Foundation for Research
D. Exploratory analysis
In the third phase of the data mining process, a preliminary analysis of the data is carried out in order to get acquainted with the available information and to carry out data cleansing. The data stored in a data warehouse is processed at loading time. For example, for student data such as date of birth, age, name, native place, place of residence, father's name and brother's name, every value is checked for the correct format and the correct digits, so that the right information is stored and analyzed. The first step studies the distribution of the values of each attribute, using histograms for categorical attributes.
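The distribution study described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field name `native_place` and the helper names `categorical_histogram` and `render_histogram` are assumptions made for the example, not part of the proposed method.

```python
from collections import Counter


def categorical_histogram(records, attribute):
    """Count how often each non-missing value of a categorical attribute occurs."""
    return Counter(
        r[attribute] for r in records if r.get(attribute) not in (None, "")
    )


def render_histogram(counts, width=20):
    """Draw one text bar per value, scaled so the most frequent value fills `width`."""
    if not counts:
        return ""
    largest = max(counts.values())
    lines = []
    for value, n in counts.most_common():
        bar = "#" * max(1, round(width * n / largest))
        lines.append(f"{value:<12} {bar} {n}")
    return "\n".join(lines)


# Hypothetical student records; the field names are illustrative only.
students = [
    {"name": "S1", "native_place": "Tirunelveli"},
    {"name": "S2", "native_place": "Madurai"},
    {"name": "S3", "native_place": "Tirunelveli"},
    {"name": "S4", "native_place": ""},  # missing value, skipped in the count
]

counts = categorical_histogram(students, "native_place")
print(render_histogram(counts))
```

Records with a missing value are left out of the count here; in a real cleansing step they would more likely be flagged for correction.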
[Fig. 1 stage labels: Data gathering and interpretation, Data Mart, Preprocessing, Exploratory Analysis, Attribute Selection, Model development, Prediction and Interpretation]
E. Attribute Selection
An attribute is a property or characteristic of an object, for example the eye color of a person or a temperature. An attribute is also known as a variable, field, characteristic, or feature. A collection of attributes describes an object; an object is also known as a record, point, case, or sample. There are four types of attributes: nominal, ordinal, interval, and ratio. Attributes are central to the selection method. For example, a disease attribute may be coded as 1 = strep throat, 2 = cold, 3 = headache, 4 = fever, so that records with the same condition share the same value.
Fig. 1 Data passing and feedback cycle (first stage: Objective definition)
Fig. 2 Data Mart (raw data flowing into data marts)
B. Data Mart
Establishing a data warehouse is arduous work: it starts from the original data collection and must resolve data integrity and diversity issues. The warehouse is a vast system built on a database management system, and it sometimes takes many years and a great deal of money to finish. For a simple application, a single database can instead be extracted from one or more databases that run and support transaction processing. As shown in Fig. 2, the new database is called a data mart [2].
F. Model development and validation
Once a high quality dataset has been assembled and possibly enriched with newly defined attributes, models and rules are formed and extracted from the original dataset. Then, the accuracy of each model generated can be assessed using the rest of the data [6, 7].
More precisely, the available dataset is split into two subsets. The first constitutes the training set and is used to identify a specific learning model within the selected class of models. Usually the sample size of the training set is chosen to be relatively small, although significant from a statistical standpoint, say a few thousand observations. The second subset is the test set and is used to assess the accuracy of the alternative models generated during the training phase, in order to identify the best model for actual future predictions [30-33].
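The training/test procedure described under model development and validation can be illustrated with a minimal sketch. Everything specific here is an assumption made for the example: the synthetic `(income, is_valued)` data, the 30% training fraction, the single-threshold model and the helper names. The paper does not prescribe a particular model class.

```python
import random


def split_dataset(records, train_fraction=0.3, seed=0):
    """Shuffle the records and split them into a training set and a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(training_set):
    """Learn an income threshold that separates the two classes in the training set."""
    valued = [x for x, label in training_set if label]
    others = [x for x, label in training_set if not label]
    if not valued or not others:
        return 0.0  # degenerate training set: fall back to accepting everyone
    return (max(others) + min(valued)) / 2


def accuracy(model, test_set):
    """Fraction of test records whose predicted label matches the true label."""
    hits = sum(1 for x, label in test_set if model(x) == label)
    return hits / len(test_set)


# Hypothetical data: (income, is_valued) pairs with an arbitrary labeling rule.
data = [(income, income >= 15000) for income in range(1000, 30001, 1000)]
training_set, test_set = split_dataset(data)

threshold = fit_threshold(training_set)
candidates = {
    "learned threshold": lambda x: x >= threshold,
    "accept everyone": lambda x: True,
}
# Alternative models are compared on the held-out test set only.
scores = {name: accuracy(model, test_set) for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

The model is fitted on the smaller training set, and the alternatives are ranked on records the model has not seen, which is the role the text assigns to the test set.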
C. Data pre-processing
Data pre-processing is a tedious but essential task in data mining. Its main goal is to make the collected data appropriate for analysis and suitable for clustering, by deleting duplicate records and supplying missing values according to previously recorded data. It also reduces memory use and normalizes the values stored in the database [1]. Clustering is the process of separating a dataset into subgroups according to their distinguishing features; here it separates the dataset into relevant and non-relevant data.
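The three pre-processing operations named above (removing duplicates, supplying missing values from recorded data, and normalizing) can be sketched as follows. This is a minimal illustration: filling with the mean and min-max scaling are assumed choices, since the text does not say which fill rule or normalization is used, and the `income` records are hypothetical.

```python
def remove_duplicates(records):
    """Keep the first occurrence of each record, comparing all fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(record))
    return unique


def fill_missing(records, attribute):
    """Replace missing values with the mean of the values already recorded.

    Assumes at least one record has a value for the attribute.
    """
    known = [r[attribute] for r in records if r[attribute] is not None]
    mean = sum(known) / len(known)
    return [
        {**r, attribute: mean if r[attribute] is None else r[attribute]}
        for r in records
    ]


def min_max_normalize(records, attribute):
    """Rescale an attribute linearly to the range [0, 1]."""
    values = [r[attribute] for r in records]
    low, high = min(values), max(values)
    span = high - low
    return [
        {**r, attribute: (r[attribute] - low) / span if span else 0.0}
        for r in records
    ]


# Hypothetical account records: one duplicate and one missing income.
raw = [
    {"id": 1, "income": 10000},
    {"id": 1, "income": 10000},
    {"id": 2, "income": None},
    {"id": 3, "income": 20000},
]

clean = min_max_normalize(fill_missing(remove_duplicates(raw), "income"), "income")
print([r["income"] for r in clean])  # [0.0, 0.5, 1.0]
```

Each helper returns new records instead of modifying its input, so the raw data stays available for comparison after cleaning.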
G. Prediction and interpretation
To conclude the data mining process, the model selected among those generated during the development phase should be implemented and used to achieve the goals that were originally identified. Moreover, it should be incorporated into the procedures that support the decision-making process, so that knowledge workers may use it to draw predictions and acquire a more in-depth knowledge of the phenomenon of interest [8-12].
H. Major Trends in Technologies and Methods
A number of data mining trends, in terms of technologies and methodologies, are currently being developed and researched. These trends include methods for analyzing more complex forms of data, as well as specific techniques and methods. The trends identified include distributed data mining, hypertext/hypermedia mining, ubiquitous data mining, as well as multimedia, spatial, and time series/sequential data mining. These are examined in detail in the upcoming sections.
Misuse identification: false claims and misuse of bank accounts.
Risk evaluation: The purpose of risk analysis is to estimate the risk connected with the networks. How is the decision analyzed and calculated? Which factors affect the time taken to reach a decision? How are the factors communicated across different sources?
II. PHASES IN THE DEVELOPMENT OF MATHEMATICAL MODELS OF DECISION MAKING
A. Define the problem
There is a need to clearly understand the workflow of the institution, to analyze each task, and to understand the decision makers. For instance, an ineffective production plan may be the cause of stock accumulation [15-19].
Fig. 3 Applications of Data mining (panels: Medical / Pharma, Insurance and Health Care, Banking / Finance, Retail / Marketing)
B. Identified problem
BIG BANK is a currently flourishing bank in the financial sector with 10 million account holders. It is going to introduce a credit card in the market. In the current financial market there is a lot of competition in the credit card sector. Finding the valued customers will make the launch successful and provide a backbone for the bank. Since the bank is flourishing, it does not want to take risks in the financial market, so it has decided to sell the credit card only to valued and trustworthy internal customers. The bank's data warehouse holds a large number of account holders, and we need to identify the valued customers among them.
Three stages to find the valued customer:
• Right customers (acquisition)
• Right relationship (development)
• Right retention (keeping valuable customers)
I. Applications in enterprises
1. Analytics – A program that builds quantitative processes for a business to arrive at optimal decisions and to perform business knowledge discovery. It frequently involves data mining, process mining, statistical analysis, predictive analytics, predictive modeling, business process modeling, complex event processing, and prescriptive analytics.
2. Reporting/enterprise reporting – A program that builds infrastructure for strategic reporting to serve the strategic management of a business, as opposed to operational reporting. It frequently involves data visualization, executive information systems and OLAP.
3. Collaboration/collaboration platform – A program that gets different areas (both inside and outside the business) to work together through data sharing and electronic data interchange.
4. Knowledge management – A program that makes the company data-driven through strategies and practices to identify, create, represent, distribute, and enable adoption of insights and experiences that are part of true business knowledge. Knowledge management leads to learning management and regulatory compliance.
J. Relational marketing
These areas have contributed significantly to the popularity of these methodologies [20, 21]. Such segmentation provides a powerful tool for marketers of all kinds. It can help companies to identify and better understand key customer segments and to target them more efficiently. Related work in relational marketing covers the following tasks:
1. Identification of customers
2. Identification of targets
3. Prediction of rate
4. Interpretation and understanding
5. Analysis of the products
C. Right Customers
We analyze which customers will be the most valuable to our business. Most often these are the customers who will do business with our financial institution again and again over a long time. Under the reliability effect, the question is how long a customer must stay in order to pay for the cost of acquisition. Companies can no longer afford to recruit customers indiscriminately without examining their long-term value. The analytical task is to identify customers who will be loyal and profitable. Classification, segmentation and analysis of the customer base reveal hidden characteristics and trends that affect value. Some customers have low value because they do only a small amount of business. Others have high lifetime value because of their long history with the bank, for example because they have regularly made small purchases every week for the past ten years. These analyses make it easy to identify the best customers and enable the company to go after the new customers that it can most profitably serve.
D. Right Relationship
Even with well-chosen customers, managers must develop the relationship. Customers who do not receive the right touch, or who get too many conflicting offers, lose rather than gain value. For any business, the right relationship is one that maximizes the customer's lifetime value. A simplified view of customer lifetime value (LTV) is:
LTV = purchase size × purchase frequency × duration
So the main goal of customer relationship management is to increase the size and frequency of purchases and to extend how long the customer continues to buy. Since marketers cannot know the duration of a relationship until it is over, they use loyalty measures to estimate how long customers will stay.
E. Right Retention
Effective retention means retaining the right customers, not every customer. Managers need to focus their retention actions on customers with the highest lifetime value. Spending precious resources to retain marginally profitable or unprofitable customers actually hurts the overall value of the customer base, especially if these retention efforts succeed. Right retention is therefore rooted in knowing which individuals are most valuable [22-24].
F. Model formulation
Once the problem to be analyzed has been properly identified, effort should be directed toward defining an appropriate mathematical model to represent the system. A number of factors affect and influence the choice of model, such as the
• Time horizon
• Decision variables
• Evaluation criteria
• Numerical parameters
• Mathematical relationships.
Time horizon: Usually a model includes a temporal dimension. For example, to formulate a tactical production plan over the medium term it is necessary to specify the production rate for each week in a year, whereas to derive an operational schedule it is required to assign the tasks to each production line for each day of the week. As we can see, the time span considered in a model, as well as the length of the base intervals, may vary depending on the specific problem considered.
Evaluation criteria: Appropriate measurable performance indicators should be defined in order to establish a criterion for the evaluation and comparison of the alternative decisions. These indicators may assume various forms in each different application, and may include the following factors:
• Monetary costs and payoffs;
• Effectiveness and level of service;
• Quality of products and services;
• Flexibility of the operating conditions;
• Reliability in achieving the objectives.
Decision variables: Symbolic variables representing alternative decisions should then be defined. For example, if a problem consists of the formulation of a tactical production plan over the medium term, decision variables should express production volumes for each product, for each process and for each period of the planning horizon.
Numerical parameters: It is also necessary to accurately identify and estimate all numerical parameters required by the model. In the production planning example, the available capacity should be known in advance for each process, as well as the capacity absorption coefficients for each combination of products and processes.
Mathematical relationships: The final step in the formulation of a model is the identification of mathematical relationships among the decision variables, the numerical parameters and the performance indicators defined during the previous phases. Sometimes these relationships may be exclusively deterministic, while in other instances it is necessary to introduce probabilistic relationships. In this phase, the trade-off between the accuracy of the representation achieved through the model and its solution complexity should be carefully considered. It may turn out more helpful at a practical level to adopt a model that sacrifices some marginal aspects of reality in the representation of the system but allows an efficient solution and greater flexibility in view of possible future developments [25-29].
III. ATTRIBUTES (DECISION VARIABLES) FOR THE IDENTIFIED PROBLEM
1. Age
2. Income
3. Family dependents
4. Location
5. Occupation
6. Years of Experience in Current Job
7. Vintage in Liability (Years in Account holders)
8. Disease details
9. Other Asset information
10. Rental / Owned house
11. Vehicle details
12. Average Quarterly balance should be maintained
13. CIBIL checking to know the delinquency of the
customer
14. Cheque returns to be maintained for the Account
delinquency
15. Checking the last 6 months Salary credited account
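Before the attributes are combined into rules, the simplified lifetime value expression given under Right Relationship (purchase size multiplied by purchase frequency and by duration) can be made concrete with a small sketch. The monetary amounts, the yearly frequency unit and the customer labels below are hypothetical, chosen only to mirror the example of small weekly purchases kept up for ten years.

```python
def lifetime_value(purchase_size, purchases_per_year, duration_years):
    """Simplified LTV: purchase size x purchase frequency x duration."""
    return purchase_size * purchases_per_year * duration_years


# Hypothetical customers: (purchase size, purchases per year, years as a customer).
customers = {
    "weekly small buyer": (20.0, 52, 10),
    "occasional large buyer": (500.0, 2, 3),
}

ltv = {name: lifetime_value(*profile) for name, profile in customers.items()}
ranked = sorted(ltv, key=ltv.get, reverse=True)

for name in ranked:
    print(f"{name}: {ltv[name]:.0f}")
```

Under these assumed figures the customer making small weekly purchases for ten years has the higher lifetime value (10400 against 3000), which is why purchase size alone is a poor indicator of customer value.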
1 - Age and Income
The commonly preferred age is 18 to 40, and the income should be no less than the minimum of 2000. If both age and income are very low, the customer is deemed not to be a valued customer. If the age is above 18 and the income is above the minimum, the customer is accepted. If both age and income are high, the customer is deemed a valuable and preferred customer.
2 - Age and Occupation
If the customer's age is high, the occupation must be in the higher grade. Otherwise the customer is not a preferred customer.
3 - Income and Disease information
If the income of the customer is high but the customer has a contagious disease, a hereditary disease or heart disease, then the customer is not a preferred customer. If the income is in the preferable range (15000 and above) and the extent of the disease is moderate, the customer is a preferred customer.
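The first three attribute combinations can be written down as explicit rules. The sketch below is one possible reading of them, not the authors' implementation. The thresholds 18, 2000 and 15000 come from the text; the cut-off `HIGH_AGE`, the encoding of the `disease` field, the use of 15000 as the meaning of "high income" in rule 1, and all function names are assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
MIN_AGE = 18
MIN_INCOME = 2000
PREFERRED_INCOME = 15000
# Assumption (not in the text): ages of 30 and above count as "high".
HIGH_AGE = 30
SERIOUS_DISEASES = {"contagious", "hereditary", "heart"}


@dataclass
class Customer:
    age: int
    income: float
    higher_grade_occupation: bool
    disease: str  # assumed encoding: "none", "moderate", or a member of SERIOUS_DISEASES


def rule_age_income(c: Customer) -> str:
    """Rule 1: combine age and income into a coarse category."""
    if c.age < MIN_AGE or c.income < MIN_INCOME:
        return "not valued"
    if c.age >= HIGH_AGE and c.income >= PREFERRED_INCOME:
        return "preferred"
    return "accepted"


def rule_age_occupation(c: Customer) -> bool:
    """Rule 2: an older customer must hold a higher-grade occupation."""
    return c.age < HIGH_AGE or c.higher_grade_occupation


def rule_income_disease(c: Customer) -> bool:
    """Rule 3: a serious disease disqualifies the customer regardless of income."""
    if c.disease in SERIOUS_DISEASES:
        return False
    return c.income >= PREFERRED_INCOME


def is_preferred(c: Customer) -> bool:
    """A customer is preferred only if all three rules agree."""
    return (
        rule_age_income(c) == "preferred"
        and rule_age_occupation(c)
        and rule_income_disease(c)
    )


print(is_preferred(Customer(35, 20000, True, "moderate")))  # True
print(is_preferred(Customer(35, 20000, True, "heart")))     # False
```

Keeping each rule in its own function makes it straightforward to add the remaining checks (house ownership, CIBIL record, cheque returns, balance history) as further predicates.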
V. IMPLEMENTATION AND TEST
Once the model has been fully identified and developed, it is implemented, tested and used in the application domain. The correctness of the data and of the numerical parameters entered in the model must also be assessed beforehand. The data normally come from a data warehouse or a data mart. Once results have been obtained using the solution procedure developed, they should be checked for:
• the plausibility and likelihood of the conclusions achieved;
• the consistency of the results at extreme values of the numerical parameters;
• the stability of the results when minor changes in the input parameters are introduced.
4 - Age and Rental / Owned house
If the customer is in the preferred age range (18 to 40) and lives in a rented house, age is not the deciding factor. Before taking the decision we analyze the money factor, that is, how long funds have remained available in the account.
5 - CIBIL check and all other attributes
The 'Accounts' section of your credit report contains existing and past credit facilities that you have availed from various loan providers. For example, if you have a home loan and a personal loan, your credit report will reflect both accounts along with details such as the name of the lender, type of credit facility, dates of opening and closing (if applicable) of each account, current balances, status of the accounts and your payment history. Your credit report summarizes your credit behavior across these accounts for the last 2 years. If the customer's age is low, his/her income is very high, the house is owned and there are no disease details, the behavior of the customer indicates a valued customer. The CIBIL check nevertheless remains sensitive and very important in the banking sector.
VI. CONCLUSIONS
Newly developed product management and marketing methods are used to measure the effectiveness of each individual customer over the entire life of the relationship. This paper introduced a data mart and employed preprocessing techniques to improve the desired results using past data. The aim is to achieve maximum lifetime profit from the entire customer base. Customer value management enables companies to take full advantage of the economics of reliability by increasing retention, reducing risk and amortizing acquisition costs over a longer and more profitable period of engagement. Not every individual customer yields a gain, but each must be managed to maximize overall profit, even when that management consists of identifying which customers contribute little to the business and focusing development and retention efforts elsewhere.
6 - Cheque returns to be maintained for the Account delinquency
Sometimes a signature on a cheque is forged. In that case the person whose signature was forged is not bound to honour the cheque, and their bank does not have to pay it. A cheque with a forged signature is simply a worthless piece of paper, a "nullity".
7 - Years of Experience in Current Job and Average Quarterly Balance maintained
The customer's behavior and years of experience have to be taken into account, and the average quarterly balance must be clearly maintained. From these we can assess how reliable and creditworthy the customer is.
IV. DEVELOPMENT OF ALGORITHMS
Once the mathematical model has been defined and its operations are clear, one will naturally wish to proceed with its solution in order to assess the decisions and to select the best alternative. In other words, once the model has been identified and analyzed, a software tool that incorporates the solution method should be developed. The developer passes the values to the tool one by one, and invalid values are strictly prohibited.
REFERENCES
[1] Tasnuba Jesmin, Kawsar Ahme, Md. Zamilur, Md. Badrul Alam, "Brain Cancer Risk Prediction Tool Using Data Mining," International Journal of Computer Applications (0975 – 8887), Volume 61, No. 12, January 2013.
[2] Zhang Danping, Deng Jin, "The Data Mining of the Human Resources Data Warehouse in University Based on Association Rule," Journal of Computers, vol. 6, no. 1, January 2011.
[3] Agrawal R., Srikant R. (1995). Mining sequential patterns. In: P. Yu and A. Chen (eds.), ICDE '95: Proceedings of the Eleventh International Conference on Data Engineering, IEEE Computer Society.
[4] Agrawal R., Imielinski T., Swami A. (1993a). Database mining: A performance perspective. IEEE Transactions on Knowledge and Data Engineering,
[5] Agrawal R., Imielinski T., Swami A. (1993b). Mining association rules between sets of items in large databases. In: SIGMOD '93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data, ACM Press.
[6] Agrawal R., Mannila H., Srikant R., Toivonen H., Verkamo A. (1996). Fast discovery of association rules. In: Advances in knowledge discovery and data mining, American Association for Artificial Intelligence.
[7] Bakan J. (2005). The Corporation: The Pathological
Pursuit of Profit and Power. Free Press.
[8] Baldi P., Brunak S. (2001). Bioinformatics: the
machine learning approach. MIT Press.
[9] Battistini V., Contini A., Del Prato G., Palopoli G.,
Valentini D., Vercellis C. (1999). L’ottimizzazione
della catena logistica integrata: il caso Barilla
alimentare. Logistica e Management,
[10] Berry M., Linoff G. (1999). Mastering Data Mining.
Wiley.
[11] Berry M., Linoff G. (2002). Mining the Web:
Transforming Customer Data into Customer Value.
Wiley.
[12] Berry M., Linoff G. (2004). Data Mining Techniques:
For Marketing, Sales, and Customer Relationship
Management. Wiley.
[13] Berson A., Smith S. (1997). Data Warehousing, Data
Mining, and OLAP. Mcgraw-Hill.
[14] Berson A., Smith S., Thearling K. (1999). Building
Data Mining Applications for CRM. McGraw-Hill.
[15] Bertsekas D. (2003). Convex Analysis and
Optimization. Athena Scientific. Bishop C. (1995).
Neural Networks for Pattern Recognition. Oxford
University Press.
[16] Bolloju N., Khalifa M., Turban E. (2002). Integrating
knowledge management into enterprise environments
for the next generation decision support. Decision
Support Systems, 33, 163–176.
[17] Box G., Jenkins G., Reinsel G. (1994). Time Series
Analysis: Forecasting & Control . Prentice Hall.
[18] Bradley P., Fayyad U., Mangasarian O. (1999).
Mathematical programming for data mining:
formulations and challenges. INFORMS Journal on
Computing, 11, 217–238.
[19] Breiman L., Friedman J., Olshen R., Stone C. (1984).
Classification and Regression Trees. Chapman & Hall.
[20] Breslow L., Aha D. (1997). Simplifying decision trees:
A survey. Knowledge Engineering Review, 12, 1–40.
[21] Brockwell P., Davis R. (2002). Introduction to Time
Series and Forecasting. Springer
[22] Bruhn M. (2002). Relationship Marketing:
Management of Customer Relationships. Pearson.
[23] Burges C. (1998). A tutorial on support vector machines
for pattern recognition. Data Mining and Knowledge
Discovery, 2, 121–167.
[24] Cadez I., Heckerman D., Smyth P., Meek C., White S.
(2003). Model-based clustering and visualization of
navigation patterns on a web site. Data Mining and
Knowledge Discovery, 7, 399–424.
[25] Charnes A., Cooper W., Rhodes E. (1978). Measuring
the efficiency of decision making units. European
Journal of Operational Research, 2, 429–444.
[26] Chatfield C. (2003). The Analysis of Time Series: An
Introduction. Chapman & Hall.
[27] Cherkassky V., Mulier F. (1998). Learning from data,
concepts, theory and methods. Wiley.
[28] Chopra S., Meindl P. (2003). Supply Chain
Management. Prentice Hall.
[29] Clemen R. (1997). Making Hard Decisions: An
Introduction to Decision Analysis. Duxbury Press.
[30] Jones R. (1980). Maximum likelihood fitting of arma
models to time series with missing observations.
Technometrics, 20, 389–395.
[31] Kaufman L., Rousseeuw P. (1990). Finding Groups in
Data: An Introduction to Cluster Analysis. Wiley.
[32] Keen P., Scott Morton M. (1978). Decision support
systems: an organizational perspective. Addison-Wesley.
[33] Keys P. (1995). Understanding the process of
operational research. Wiley. Kimball R. (1996). The
Data Warehouse Toolkit . Wiley.
[34] Venkata Sheshanna Kongara, D. Punyasesudu . Data
Warehousing And Data Mining Applications For
Atmospheric Studies , (IACEECE-2013).
[35] Tipawan Silwattananusarn and Kulthida Tuamsuk, "Data Mining and Its Applications for Knowledge Management: A Literature Review from 2007 to 2012," International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 2, No. 5, September 2012.