CHAPTER II
LITERATURE REVIEW
The study of data mining generally revolves around how to extract the information and features contained in data and then analyze them to reveal new findings. Although there are many studies of data mining in Customer Relationship Management, much remains to be explored, especially in the specific area of sales. In this case, the study is based on sales of cable broadcast subscriptions.
2.1. Data Mining
According to Berry & Linoff (2004), data mining, previously known as the Knowledge Discovery in Databases (KDD) process, is the process of extracting hidden patterns or relationships from data, usually from large volumes of it, either statistically or from trained data, for the purpose of discovering general knowledge from groups of data.
Data mining also gives users the chance to proactively notice trends that may appear in new information, as stated in the NASCIO Research Brief (2004) on common government data mining activities. It serves the following purposes:
• Performance improvement.
• Identifying anomalies.
• Researching scientific information contained within the data.
• Improving management, including human resources.
• Foreseeing criminal activities that might happen.
• Detecting strange behavior from potential terrorist activity.
According to Berry & Linoff (2004), the main tasks of data mining are:
• Classification, which involves examining the features of objects, usually in the form of records from a database, and allocating each of them to one of a set of predefined classes, usually expressed as new class identifiers.
• Estimation; while classification outputs discrete values, estimation deals with unknown values that are continuous, which can also support classification tasks. Regression and neural networks are examples of estimation techniques.
• Prediction; while similar in meaning to estimation, prediction deals with future behavior, whereas estimation deals with present values that were not obvious without the data mining process. This task therefore requires patience, since it needs to constantly evaluate the temporary relationship between the sources and the predicted target outputs. Any classification or estimation technique can be utilized, involving samples, predicted outcomes, and the historical data that connects the two.
• Association, or affinity grouping, decides which things go together. In sales, this can be illustrated by the items a customer adds to their shopping cart. It creates cross-selling opportunities that help in designing new, more appealing packages.
• Clustering, which is simply the segmentation of a population into subgroups or clusters.
• Profiling, an important task that provides a form of identification explaining or describing where to look, based on association and clustering. Decision trees are the most common technique for this kind of data mining.
2.1.1. Clustering Methods
According to Han & Kamber (2001), clustering, also known as descriptive modeling, is a data mining technique that divides data into groups. The group identities are not known in advance; only by analyzing the data patterns can the groups, or clusters, be recognized by their behavior. At the very least, it distinguishes clusters that are more crowded from those that are sparse.
Many clustering techniques have emerged over the history of the field and are still evolving today, mostly through improvements to the algorithm of a particular technique. Another way to implement clustering is to combine several techniques to produce a greater variety of results. This is also necessary because one clustering technique can filter out data parameters that are not useful for discovering the cluster itself, even after manual preprocessing of the data.
Han & Kamber (2001) also refer to cluster analysis as unsupervised learning because it does not depend on predefined classes and known labels; it learns by observation. In conceptual clustering, a class containing a group of objects can only be described by a concept. This differs from conventional clustering, which measures similarity according to the distance between two components as it discovers the appropriate classes and then forms descriptions for each class, similar to classification (Han & Kamber, 2001).
Clustering in data mining has typical requirements according to Han & Kamber (2001):
1. Scalability, from small samples to large databases, to avoid biased results.
2. The ability to handle different kinds of attributes, which can be mixtures of binary, nominal, and ordinal data.
3. Discovery of clusters with arbitrary shape, since not all common clusters are spherical.
4. Minimal requirements for domain knowledge to help determine parameters, which may be high-dimensional.
5. The ability to deal with outliers or noisy data, to avoid clusters of poor quality.
6. Insensitivity to the order of the input records, for dynamic processes.
7. Reduction of high dimensionality, since humans can comfortably study at most about three dimensions.
8. Constraint-based clustering that produces groups of data with good clustering behavior.
9. Semantic interpretability and usability, which relate to the goals that may influence the selection of the clustering method.
2.1.1.1. Self-Organizing Maps
Self-Organizing Maps (SOM), also known as Kohonen Networks, are a variant of neural networks used for undirected data mining tasks such as cluster detection, where they can recognize unknown patterns in the data (Berry & Linoff, 2004).
Berry & Linoff (2004) also state that the simplest analogy for how SOM works is a ball-throwing booth at a carnival. Usually a player tries to throw balls into holes in a wall. Imagine this time that the wall has no holes yet. When a player throws a ball for the first time, it creates a dent. As the player continues to throw balls, dents appear in various places. As the dents in one vicinity become larger and deeper, there eventually comes a moment when a ball thrown there finally creates a hole. The next time the player succeeds in throwing a ball into that hole, he wins the prize. That hole, in SOM, is called an identifiable cluster. A similar example is practicing archery: the first few shots are usually weak and land at random spots on the target, but later those marks become abundant and the area around the bulls-eye fills with shot marks.
In SOM, contributing more parameters, especially specific ones, can change the identified clusters, which in turn can affect the strategies based on the parameters identified from the proposed cluster. These important parameters have significant distribution values that may affect the quality of the cluster itself (Berry & Linoff, 2004). Figure 2.1 represents the whole idea of how SOM clusters distributed data collections.
[Figure 2.1 shows a SOM network: the input layer is connected to the inputs; the output layer is laid out like a grid, with each output unit connected to all the input units but not to each other; the output units compete with each other for the output of the network, and the winning output unit and its path are highlighted.]
Figure 2.1. Example of How SOM Works in Detecting Clusters by Defining (Upper) and Locating the Intended Cluster (Below). (Berry & Linoff, 2004)
According to Vesanto (2002), training is conducted sequentially until it achieves satisfying results based on the learning rate, using this formula:

b_i = argmin_j { || x_i − m_j(t) || }.

The best-matching unit (BMU), denoted b_i, is the map unit whose prototype vector m_j has the closest distance to the sample data vector x_i at training step t. The prototype vectors are then updated by moving them toward x_i using this rule:

m_j(t + 1) = m_j(t) + α(t) h_{b_i j}(t) [x_i − m_j(t)],

where t is the training step index, α(t) is the learning rate, and h_{b_i j}(t) is a neighborhood kernel centered on the winner unit. The visualization can be seen in Figure 2.2, where the BMU and its neighbors are updated toward the input sample marked by x. The black and gray dots represent positions before and after updating, while the solid and dashed lines represent neighborhood relations.
Figure 2.2. SOM training process (Vesanto, 2002)
The algorithm used in SOM can learn incrementally, based on regression performed recursively at each presentation of sample data and on the distance factors between each model of the data set itself. All of this can also be performed in batch, resulting in faster algorithms (Kohonen, 2005).
Bação, Lobo, & Painho (2005) summarize the basic SOM training as follows:

X = the set of n training patterns x1, x2, ..., xn
W = a p × q grid of units wij, where i and j are their coordinates on that grid
α = the learning rate, taking values in [0, 1], initialized to a given initial learning rate
r = the radius of the neighborhood function h(wij, wmn, r), initialized to a given initial radius

1 Repeat
2   For k = 1 to n
3     For all wij ∈ W, calculate dij = || xk − wij ||
4     Select the unit that minimizes dij as the winner wwinner
5     Update each unit wij ∈ W: wij = wij + α h(wwinner, wij, r) (xk − wij)
6   Decrease the value of α and r
7 Until α reaches 0
Feature extraction can also be done with SOM, as Liu, Weisberg, and Mooers (2006) demonstrated by using artificial data representative of known patterns related to their previous research on ocean currents off West Florida. SOM extracts the patterns of a linear progressive sine wave. Further research combined the effects of the SOM's tunable parameters by adding random noise to the progressive wave data. With this improvisation, SOM successfully extracted the separate patterns associated with transitional patterns and the more typical weather-related ones.
2.1.1.2. K-Means
According to Maitra, Peterson, & Ghosh (2010), “The K-means clustering algorithm iteratively partitions a dataset into K groups in the vicinity of its initialization such that an objective function defined in terms of the total within-group sum-of-squares is minimized” (p. 1). Thus, this technique helps locate the centers of clusters, though the success of the method depends on the starting kernel values of each cluster. To illustrate the concept, the following diagram shows the difference:
Figure 2.3. Clustering results using 10 clusters (shown by projection into 2D space) using the sample training image. K-means clustering without applying kernel transformation (left) is compared with kernel k-means clustering (right). (Honarkhah & Caers, 2010)
The results in Figure 2.3 were taken from sample training images that were converted into grid templates and then measured using the Euclidean distance, well known for its simplicity, as the dissimilarity distance function (Honarkhah & Caers, citing Suzuki & Caers, 2006, and Arpat & Caers, 2007):

d = || pat_T^m(u) − pat_T^k(u) ||,

where pat_T^m(u) and pat_T^k(u) are the two patterns from the pattern database that will be measured.
After that, the calculations are placed into matrices and evaluated using the kernel function of K-Means itself. This involves a set of k centers, each located at the center of the data closest to it. Under this membership function, every data point belonging to its nearest center forms a partition of the data, separated from more distant data, and this should minimize the within-cluster variance (Hamerly & Elkan, 2003).
Figure 2.4. K-Means distant and close nodes (Hamerly & Elkan, 2003)
However, the kernel most commonly used here is the Gaussian radial basis function (Honarkhah & Caers, 2010):

k(x, x') = exp( −|| x − x' ||² / (2σ²) ).
According to Maitra, Peterson, & Ghosh (2010), the performance of K-Means depends on the initialization process: with well-separated groups it mostly performs well under many kinds of cluster scenarios, while it is not practical to run K-Means on very large datasets. Maitra, Peterson, & Ghosh (2010) tested at least eleven common initialization methods covering a wide variety of initialization strategies, but none provided a consistently better strategy and further analysis was needed, which shows that there is still a long way to go before a more strategic K-Means initialization is discovered.
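For illustration, a minimal sketch of the plain (non-kernel) K-Means loop in Python/NumPy is given below; the random initialization and the convergence tolerance are arbitrary choices made here, not any of the specific initialization strategies evaluated by Maitra, Peterson, & Ghosh (2010).

import numpy as np

def kmeans(X, k=3, max_iter=100, tol=1e-6, seed=0):
    """Plain Lloyd-style K-Means (sketch): assign points to the nearest
    center, then move each center to the mean of its members, repeating
    until the centers stop moving."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Membership step: each point belongs to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center from the points assigned to it.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return labels, centers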
2.1.1.3. Expectation-Maximization Mixture Model
According to Gupta & Chen (2011), Expectation-Maximization (EM) based on Gaussian mixture models (GMM) is intended for “learning an optimal mixture of fixed models, for estimating the parameters of a compound Dirichlet distribution, and for dis-entangling superimposed signals”. EM also helps in estimating both GMMs and hidden Markov models (HMMs).
Gupta & Chen (2011) state that there are several variations of EM:
• Generalized EM (GEM) finds a θ that improves, but does not necessarily maximize, F(θ, q) = Q(θ, θ(t)) in the M-step. This is useful when the exact M-step is difficult to carry out. Since this is still coordinate ascent, GEM can find a local optimum.
• Stochastic EM: the E-step is computed with Monte Carlo sampling. This introduces randomness into the optimization, but asymptotically it will converge to a local optimum.
• Variational EM: q(Z) is restricted to some easy-to-compute subset of distributions, for example the fully factorized distributions q(Z) = Πi q(zi).
[Figure 2.5 shows six panels: the true GMM density, 1000 i.i.d. samples, the initial guess (m = 0, ℓ(0) = −3.9756), and the 1st, 2nd, and 3rd EM estimates (m = 1, ℓ(1) = −3.6492; m = 2, ℓ(2) = −3.6446; m = 3, ℓ(3) = −3.6438).]
Figure 2.5. GMM fitting examples from EM estimates (Gupta & Chen, 2011)
However, Gupta & Chen (2011) state that analysis using EM depends on these facts:
1. Convergence. The monotonicity of the EM algorithm's iterations can be guaranteed, as can the improvement of the guesses in terms of their likelihood, but convergence of the sequence itself cannot be guaranteed, because it depends on the characteristics of the problem and the starting points. The proof of the monotonicity theorem is closely related to the Markov relationship. The EM algorithm for the maximum a posteriori (MAP) estimate also has monotonicity properties.
2. Maximization-Maximization. EM can be seen as a joint maximization procedure that iteratively maximizes a better lower bound to the log-likelihood function. This interpretation establishes EM as belonging to the class of methods called alternating optimization or alternating minimization methods, which also includes projection onto convex sets (POCS) and the Blahut-Arimoto algorithms (Stark & Yang, 1998).
The algorithm itself is as follows (Zhu, 2007):
1. Estimate θ(t=0) from the labeled examples only.
2. Repeat until convergence:
   a. E-step: for i = 1…n, k = 1…K, compute γik = p(yi = k | xi, θ(t)).
   b. M-step: compute θ(t+1) from (7). Let t = t + 1.
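The E-step/M-step loop can be illustrated with a short Python/NumPy sketch of EM for a Gaussian mixture. It follows the generic responsibility/update structure described above rather than the semi-supervised variant of Zhu (2007); the diagonal covariances, the initialization, and the fixed number of iterations are simplifications assumed here.

import numpy as np

def em_gmm(X, K=2, iters=50, seed=0):
    """EM for a Gaussian mixture with diagonal covariances (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                      # mixing weights
    mu = X[rng.choice(n, K, replace=False)]       # initial means
    var = np.full((K, d), X.var(axis=0))          # initial variances

    for _ in range(iters):
        # E-step: responsibilities gamma_ik = p(y_i = k | x_i, theta(t)).
        log_p = np.stack([
            -0.5 * (((X - mu[k]) ** 2 / var[k]).sum(axis=1)
                    + np.log(2 * np.pi * var[k]).sum())
            for k in range(K)
        ], axis=1) + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)       # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances from gamma.
        Nk = gamma.sum(axis=0)
        pi = Nk / n
        mu = (gamma.T @ X) / Nk[:, None]
        var = np.stack([
            (gamma[:, k:k+1] * (X - mu[k]) ** 2).sum(axis=0) / Nk[k]
            for k in range(K)
        ]) + 1e-6                                       # avoid zero variance
    return pi, mu, var, gamma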
2.2. Principal Component Analysis
Principal Component Analysis (PCA) is one of the studies of multivariate statistics; it is “a data analysis technique that relies on a simple transformation of recorded observation, stored in a vector z ∈ R^N, to produce statistically independent score variables, stored in t ∈ R^n, n ≤ N:

t = P^T z.

P is a transformation matrix, constructed from orthonormal column vectors.” (Kruger, Zhang, & Xie, 2008). PCA itself has been used as a basis for clustering, especially in data mining, based on the variables or physical features of the data.
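A minimal Python/NumPy sketch of this transformation is shown below; centering the observations and taking the columns of P as the leading eigenvectors of the sample covariance matrix are the usual conventions assumed here.

import numpy as np

def pca_scores(Z, n_components=2):
    """Compute PCA score variables t = P^T z for each observation (sketch).

    Z : (m, N) array with one recorded observation per row.
    Returns the (m, n) score matrix and the (N, n) transformation matrix P.
    """
    Zc = Z - Z.mean(axis=0)                       # center the observations
    cov = np.cov(Zc, rowvar=False)                # N x N sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]             # sort by explained variance
    P = eigvecs[:, order[:n_components]]          # transformation matrix P
    T = Zc @ P                                    # score variables t = P^T z
    return T, P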
Physical features have many definitions depending on the field of research, but it suffices to say that physical features are the physical attributes extracted from a set of data (www.faqs.org, 2010). For documents, it is necessary to extract physical features, since a document is abstract at first and does not reveal its specific attributes.
Another analogy from Berry & Linoff (2004) describes a physical feature as a liquid that, as it cools, forms crystallized patterns. The crystal itself is compressed energy, present in every crystal. The annealing process can then be studied further for the physical properties it contains.
For example, a physical feature in Geographic Information Systems contains a semantic set of information such as place, related physical object information, persons, research activity, and more. Even keywords or thesaurus terms from documentation related to the field can be included as physical features (Hiebel, Hanke, & Hayek, 2009).
Returning to PCA itself, Kruger et al. (2008) state that one of the main goals is to achieve generalization and to remove redundancies in the data by utilizing nonmetric scaling, which involves a nonlinear optimization problem, and then reconstructing the original variables. This is known as the mapping and de-mapping stage:

ẑ = P t = P (P^T z),

where P^T z is the mapping stage (producing the score variables) and the outer multiplication by P is the de-mapping stage (reconstructing the original variables). Kramer (as cited in Kruger, Zhang, & Xie, 2008) also mentions that the whole mapping and de-mapping process can be defined through an autoassociative neural network structure, in which the mapping and de-mapping stages are separated by neural network layers.
However, Tan and Mavrovouniotis (as cited in Kruger, Zhang, & Xie, 2008) found that it can be difficult to train the five-layer topology of an autoassociative neural network, in part because of the challenge of determining the network weights as the number of layers grows.
To help reduce that complexity, Tan and Mavrovouniotis (as cited in Kruger, Zhang, & Xie, 2008) proposed an input training (IT) network topology. This omits the mapping layers, leaving only about a three-layer network, and the IT network obtains the reduced set of nonlinear principal components as part of the training procedure.
There have been many more approaches to reducing the complexity of the network layers so that the process becomes easier to handle. A more recent proposal is Kernel PCA (KPCA) by Schölkopf (as cited in Kruger, Zhang, & Xie, 2008), which works by mapping the original variable sets into a higher-dimensional feature space and then performing conventional linear PCA there. This technique helps because of its simplicity and efficiency, and it has recently been used in a much wider range of tools such as face recognition, image de-noising, and other fault detection prototypes.
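For reference, this map-then-linear-PCA idea can be tried directly with scikit-learn's KernelPCA; the data set, the RBF kernel, and the parameter values below are purely illustrative.

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA, PCA

# Two concentric circles: a structure that linear PCA cannot separate.
X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Conventional linear PCA keeps the circular structure entangled...
X_pca = PCA(n_components=2).fit_transform(X)

# ...while kernel PCA implicitly maps into a higher-dimensional feature space
# (here via an RBF kernel) and then applies linear PCA there.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)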
2.3. Normalization
Normalization is an important and fundamental building block of data mining, as part of Extraction, Transformation, and Loading (ETL). The goal is to modify the data source so that it becomes easier for the data mining application to use, as well as to enhance the effectiveness and performance of the mining algorithms (Venki, 2009).
There are many kinds of normalization algorithms; one of the techniques is called min-max normalization:

B = (A − min_A) / (max_A − min_A) × (D − C) + C.

This algorithm transforms a value A into a new value B defined within the target range [C, D], where min_A and max_A are the minimum and maximum of the original attribute.
For example, if a salary value is 50000 and needs to be transformed into the range 0 to 1, knowing that the salary ranges between a minimum of 25000 and a maximum of 50000, the new normalized value will be:

B = (50000 − 25000) / (50000 − 25000) × (1 − 0) + 0 = 1.

However, this technique has a weakness: if the minimum and/or maximum value cannot be defined, it is difficult to apply this kind of algorithm.
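A short sketch of the transformation, using the reconstructed formula above and the salary example:

def min_max_normalize(a, a_min, a_max, c=0.0, d=1.0):
    """Map value a from the range [a_min, a_max] onto the target range [c, d]."""
    if a_max == a_min:
        raise ValueError("min and max must differ for min-max normalization")
    return (a - a_min) / (a_max - a_min) * (d - c) + c

# The salary example: 50000 scaled from [25000, 50000] into [0, 1] gives 1.0.
print(min_max_normalize(50000, 25000, 50000))   # -> 1.0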
2.4. Chi-Square Test
The Chi-Square test assesses the significance of a data population and is a useful tool for determining whether or not interpreting a contingency table is worth the research effort (Stockburger, 2011). The end result indicates whether the cells of the contingency table should be interpreted: if the result is significant, they should be; if not, no effect was discovered and the table is not useful. The statistic is given by:

χ² = Σ (O − E)² / E,

where O is the observed values and E is the expected values.
The distribution itself is given by:

f(x; v) = x^(v/2 − 1) e^(−x/2) / (2^(v/2) Γ(v/2)), for x > 0,

where v is the shape parameter (degrees of freedom) and Γ is the gamma function, which is defined by:

Γ(v) = ∫₀^∞ t^(v−1) e^(−t) dt.
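As a usage sketch, the statistic can be computed directly from observed and expected counts, or the full contingency-table test can be delegated to SciPy; the table values below are invented for illustration.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table (e.g. subscription outcome vs. sales channel).
observed = np.array([[30, 10],
                     [20, 40]])

# Manual statistic: sum of (O - E)^2 / E over all cells.
row, col = observed.sum(axis=1, keepdims=True), observed.sum(axis=0, keepdims=True)
expected = row @ col / observed.sum()
chi_sq = ((observed - expected) ** 2 / expected).sum()

# SciPy applies the same test, using the chi-square distribution for the p-value.
stat, p_value, dof, _ = chi2_contingency(observed, correction=False)
print(chi_sq, stat, p_value, dof)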
2.5. Service Level in Sales
To understand the service level, it is important to learn the business model of each of these sales divisions and what they represent. Today's market situation, with rapid changes and strict competition in almost every aspect, has forced business models toward more strategic (and more complex) decisions to help companies survive.
Osterwalder (2004), referring to the online version of the Cambridge Learner's Dictionary (Cambridge, 2003), notes that the meaning of "business model" is not defined as a combined term. Quoting the separate terms:
• “business: the activity of buying and selling goods and services, or a particular company that does this, or work you do to earn money” (p. 17).
• “model: a representation of something, either as a physical object which is usually smaller than the real object, or as a simple description of the object which might be used in calculations” (p. 17).
Therefore, Osterwalder (2004) combined them into the following definition of a business model: “Business model is a representation of how a company buys and sells goods and services and earns money” (p. 17).
The competitive nature of business today has placed quite a burden on the business model, especially through changes that influence it either directly or indirectly. According to Osterwalder (2004), the changes are:
• Technological change, whose influence is reinforced by the rapid adoption of the Internet, which both delivers business needs and acts as a tool for decision making.
• Competitive forces; even the cable TV industry has several players competing to attract customers with their product line-ups, each with its own specialty.
• Customer demand, which means not only delivering good service but also following customer demands based on current trends. One example is securing exclusive football broadcast rights to attract devoted viewers.
• Social environment; sometimes paying attention to the social mood can create new strategies to improve the business model.
• Legal environment, another important aspect for ensuring fair play between competitors. One example was the case of Direct Vision, who tried to monopolize broadcast rights, as explained in the prosecutor's letter published by KPPU (Komisi Pengawas Persaingan Usaha, 2008).
With this basic understanding, this study introduces the business model of each sales division selected as representative of influential sales performance.
2.5.1. Direct Sales
According to the World Federation of Direct Selling Associations (2000), “Direct selling is a dynamic, vibrant, rapidly expanding channel of distribution for the marketing of products and services directly to consumers” (http://www.wfdsa.org). Basically, it is a business model based on presenting the product line-up directly to customers, including home delivery and guaranteed customer satisfaction. This can be done anywhere and, depending on the circumstances, can save the costs needed for the current model's implementation. The main benefit of this business model is that subscription growth can increase dramatically. An example can be found in the October 2000 edition of Twice News, which states that Hughes Electronics Corp received a 3.7% increase in its DirecTV sales in the third quarter of 1999.
2.5.2. Market Place
This type of sales division operates within the wholesale market. The Food and Agriculture Organization (1991) defines it as “the social institution or mechanism that forms the linkage between the producer (farmer) and the retailer is the assembly and wholesale trading system, which enables farmers to sell in small quantities and purchasing by traders and wholesalers to be made in bulk” (http://www.fao.org). Being part of a wholesale market creates opportunities to present the products to the daily customers who come for their domestic needs.
2.5.3. Multi-Level Marketing
The Subscriber Outreach Program sales division is based on the concept of multi-level marketing. The structure of this model is shaped like a pyramid, which is why it is also known as a “pyramid scheme”. According to the Federal Trade Commission website (2007), the model keeps evolving into many forms and is rather difficult to identify, but it does have one common characteristic: “recruiting others to join their program, not based on profits from any real investment or real sale of goods to the public” (http://www.ftc.gov).
This approach basically delegates customers to act as sales representatives themselves, presenting the product line-up and attracting others to invest. Those customers gain incentives based on the total number of new customers they bring in.
2.5.4. Strategies
Several strategies have been developed to measure sales performance within a company. Even though the source references come from companies with different backgrounds, the strategies they employed help in deciding on a better analysis.
A. Sales Force Automation (SFA)
According to the study of Srinivasan et al. (as cited in Boujena, Johnston, and Merunka's study, 2009), “SFA technologies enhance performance by increasing the efficiency and productivity of salespeople and improving both the quality and quantity of communications among salespersons, the buying organization, and the selling firm” (p. 2). This can be measured according to five main levels that affect the sales function itself:
o Salesperson productivity, to help achieve daily targets. Verity (as cited in Boujena, Johnston, and Merunka's study, 2009) mentions that the reduction of errors in manual processes and of support costs, as well as the improvement in closing rates and average selling prices, benefits the processing of sales performance and, in the end, helps enhance sales productivity itself.
o Information processing, which ranges from gathering information from customers to gathering it from competitors, all of which needs to be studied and analyzed.
o Communication effectiveness, always an important ability to maintain, which SFA can help enhance for salespersons.
o Perceived competence, by continuing to learn and adapt to the areas related to the company's scope of marketing.
o Customer relationship quality; although this is considered an outside category, it proves to be an intangible value between buyers and sellers. According to Hawes, Mast, and Swan (as cited in Boujena, Johnston, and Merunka's study, 2009), there are five factors that help build trust between buyers and sellers: “customer orientation, competency, honesty, dependability, and likability” (p. 3).
Based on research related to SFA technology, Boujena, Johnston, and Merunka (2009) conclude, as summarized in Table 2.1, that at least three meta-categories affect sales quality: professionalism (judged from the image, argumentation value, and organizational value of the salesperson), customer interaction frequency (essentially the critical knowledge of the current market), and responsiveness (the quality of interaction between salesperson and customer).
Table 2.1. Generated Meta-Thematic Categories (Boujena et al., 2009)
B. Customer Relationship Management (CRM)
While the previous paragraphs focus on sales quality, this part concentrates on the customer relationship, which has become a major factor in sales performance itself. Day et al. (as cited in Dickson, Lassar, Hunter, and Chakravorti, 2009) point out an important fact: studies of the success and failure of CRM are heavily based on “business process management that includes skilled selection, deployment, configuration, and implementation of CRM best practice processes” (p. 1). Therefore, the process thinking and implementation skills of each employee become of paramount importance, especially at the senior management level, and this becomes a difficult aspect that needs to be audited repeatedly to reach a satisfactory level of CRM. Quoting Dickson, Lassar, Hunter, and Chakravorti (2009), they make several propositions based on key aspects of CRM itself:
o Added-value process competitive advantage; this includes superior processes and the integration that helps manage and address all of the customer's important values together.
o Search and operational routines, including research, development, and market learning routines.
o Evolutionary hot spots; this uses the concept of evolution to learn how to survive the competition by learning important facts that others lack.
o Manager and employee process thinking skills; the need for an endless pursuit of improved thinking skills by assessing the best qualities of each individual.
o The measurement of process thinking skills, which even includes the use of information technology to help adopt best practices.
In the end, the company's perception of which practices to use will affect the quality of CRM itself. The study of each practice even varies by aspect, as shown in the two figures below:
Figure 2.6. Process Thinking and Learning Hierarchy (Dickson et al., 2009)
Figure 2.7. The Breadth and Depth of CRM Process Organization (Dickson et al., 2009)
These figures show that current learning processes are mostly out of synchronization, while the CRM process helps integrate them seamlessly.
2.6. Study Motivation
As the examples above show, and given how the results affect the strategies that need to be delivered, it is important to study how far data mining techniques can produce clusters that identify which areas can and cannot be explored further. With these results, strategies and decisions can be made more reliably from mountains of data that mean nothing at first.
Another example from Berry & Linoff (2004) is a clustering implementation at a large bank that wanted to increase sales of home equity loans. The demographics gathered for this process covered around 5,000 customers each, both those who had the loans and those who did not. At first, the parameters analyzed included tenant appraisal, credit available and granted, age, marital status including number of spouses, and household income. The results of the analysis were drawn into a recognizable chart as follows.
Figure 2.8. The Centers of Five Clusters Compared on the Same Graph. This Simple Visualization Technique (called Parallel Coordinates) Helps Identify Interesting Clusters. (Berry & Linoff, 2004)
Unfortunately, the marketing campaign created from these results did not deliver satisfying outcomes. That does not mean that the clustering failed; the problem may lie in other, much more specific parameters, since the preset parameters used earlier were too general. By adding specific parameters such as the deposit system and/or credit card information, the generated clusters may vary and deliver much more accurate results.
While data related to sales are usually well defined, further physical feature analysis can reveal more information behind them. For example, in Customer Relationship Management, sales leads can be found even in unlikely sources such as delivery personnel, the installation process, or public relations services (www.insidecrm.com, 2010).
Based on the research of Amiri and Fathian (2007), Artificial Neural Networks (ANN), which are the basis of the SOM clustering method, can be used for market segmentation, for example in retail sales forecasting, direct marketing, and even target marketing. The results usually represent networks of three different dimensions of data, such as demographic information (sex, age, marital status), economic information (salary, income), and geographic information (state, city, level of development). Kuo et al. (2002) proposed a modified two-stage method to help with market segmentation by combining SOM and K-Means:
1. Use SOM to determine the number of clusters and the starting points.
Figure 2.9. SOM learning process (Amiri & Fathian, 2007)
2. Perform further cluster analysis with the second clustering method, K-Means.
Figure 2.10. Performance evaluation of previous cluster research methods (Amiri & Fathian, 2007)
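A rough sketch of this two-stage idea, reusing the train_som and kmeans routines sketched earlier in this chapter (the grid size, the number of clusters, and the placeholder data are illustrative assumptions):

import numpy as np

# Stage 1: train a SOM and treat its prototype vectors as a condensed data set.
# (train_som and kmeans are the sketch functions from sections 2.1.1.1 and 2.1.1.2.)
X = np.random.default_rng(0).random((500, 5))        # placeholder customer data
W = train_som(X, p=8, q=8, epochs=20)                # 8 x 8 grid of prototypes
prototypes = W.reshape(-1, X.shape[1])

# Stage 2: run K-Means on the prototypes; each data point then inherits the
# cluster of its best-matching SOM unit.
proto_labels, centers = kmeans(prototypes, k=4)
bmu = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2).argmin(axis=1)
cluster_of_point = proto_labels[bmu]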