CHAPTER II
LITERATURE REVIEW

The study of data mining in general revolves around how to extract the information and features contained in data, which are then analyzed to reveal new findings. Although there are many studies of data mining in Customer Relationship Management, much of the field remains to be explored, especially the specific area of sales. In this case, the study is based on sales of cable broadcast subscriptions.

2.1. Data Mining

According to Berry & Linoff (2004), data mining, previously known as the Knowledge Discovery in Databases (KDD) process, is the process of extracting hidden patterns or relationships from the data itself, usually taken from large volumes of data, either statistically or from trained data, for the purpose of discovering general knowledge from groups of data. Data mining also gives users the chance to proactively notice trends that might appear in new information, as stated in the NASCIO Research Brief (2004) on common government data mining activity, which serves the following purposes:
• Performance improvement.
• Identifying anomalies.
• Research into the scientific information contained within the data.
• Improving management, including human resources.
• Foreseeing criminal activities that might happen.
• Detecting strange behavior from potential terrorist activity.

These are the main tasks of data mining according to Berry & Linoff (2004):
• Classification involves examining the features of objects, usually in the form of records from a database, and allocating them to one of a set of predefined classes, which usually form new class identifiers.
• Estimation: while classification outputs discrete values, estimation deals with unknown values that are continuously generated, which can also support classification tasks. Regression and neural networks are examples of estimation techniques.
• Prediction is similar in meaning to estimation, but prediction deals with future behavior, whereas estimation deals with present values that were not obvious without the data mining process. This process therefore requires patience, since it must constantly evaluate the temporary relationship between the sources and the predicted target outputs. Any classification and estimation technique can be utilized here, involving samples, predicted outcomes, and the historical data connecting the two.
• Association, or affinity grouping, determines which things belong together. In sales, an example would be the items a customer adds to their shopping cart. This creates cross-selling opportunities that help in designing new, more appealing packages.
• Clustering is simply the segmentation of a population into subgroups or clusters.
• Profiling is an important task that provides a form of identification explaining or describing where to look, based on association and clustering. The decision tree is the most common technique for this kind of data mining.

2.1.1. Clustering Methods

According to Han & Kamber (2001), clustering, also known as descriptive modeling, is a data mining technique that divides data into groups. The group identities cannot be known at first; only through analysis of the data patterns can each group, or cluster, be recognized by its behavior. At the least, it will distinguish clusters that are more crowded from those that are sparse. Many clustering techniques have been developed over the history of the research, and they still evolve today, mostly through improvements to the algorithm of a particular technique. There is also another way to implement clustering, which combines several techniques to produce a greater variety of results.
This is also necessary because one clustering technique can filter out data parameters that are not useful for discovering the cluster itself, even after manual preprocessing of the data. Han & Kamber (2001) also refer to cluster analysis as unsupervised learning, because it does not depend on predefined classes and known labels; it learns by observation. In conceptual clustering, a class containing a group of objects can only be described by a concept. This differs from conventional clustering, which measures similarity according to the distance between two components as it discovers the appropriate classes and then forms descriptions for each class, similar to classification (Han & Kamber, 2001).

Clustering in data mining has typical requirements according to Han & Kamber (2001):
1. Scalability from small samples up to large databases, to avoid biased results.
2. The ability to handle different kinds of attributes, which can be mixtures of binary, nominal, and ordinal data.
3. Discovery of clusters with arbitrary shape, since not all common clusters are spherical.
4. Minimal requirements for domain knowledge to determine input parameters, especially for high-dimensional data.
5. The ability to deal with outliers or noisy data, to avoid clusters of poor quality.
6. Insensitivity to the order of input records, for dynamic processing.
7. Reduction of high dimensionality, since humans can normally study at most about three dimensions.
8. Constraint-based clustering that produces groups of data with good cluster behavior.
9. Semantic interpretability and usability, which relates to goals that may influence the selection of the clustering method.

2.1.1.1. Self-Organizing Maps

Self-Organizing Maps (SOM), also known as Kohonen Networks, are a variant of neural networks used for undirected data mining tasks such as cluster detection, which can recognize unknown patterns in the data (Berry & Linoff, 2004).
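Before turning to the formal description, the competitive-learning idea behind SOM can be sketched in a few lines of plain Python. This is a minimal illustration only, not Berry & Linoff's implementation: the grid size, learning-rate and radius schedules, and the two-blob toy data are all assumptions made for the example.

```python
# Minimal SOM sketch: a 4x4 grid of prototype vectors is trained on 2-D toy
# data drawn from two dense regions. Each input pulls its best-matching unit
# (and, more weakly, that unit's grid neighbours) toward itself.
import math
import random

random.seed(0)

GRID = 4        # output layer laid out as a 4 x 4 grid
DIM = 2         # input dimensionality
STEPS = 500     # number of training steps

# one prototype (weight) vector per grid unit, initialised at random
proto = {(i, j): [random.random() for _ in range(DIM)]
         for i in range(GRID) for j in range(GRID)}

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# toy data: two dense clusters, around (0.2, 0.2) and (0.8, 0.8)
data = ([[random.gauss(0.2, 0.05), random.gauss(0.2, 0.05)] for _ in range(50)] +
        [[random.gauss(0.8, 0.05), random.gauss(0.8, 0.05)] for _ in range(50)])

for t in range(STEPS):
    x = random.choice(data)
    # best-matching unit: the grid position whose prototype is closest to x
    bmu = min(proto, key=lambda u: dist(x, proto[u]))
    alpha = 0.5 * (1 - t / STEPS)           # decaying learning rate
    radius = 2.0 * (1 - t / STEPS) + 0.5    # decaying neighbourhood radius
    for u, w in proto.items():
        # Gaussian neighbourhood kernel over grid distance to the winner
        g = math.exp(-((u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2)
                     / (2 * radius ** 2))
        for d in range(DIM):
            w[d] += alpha * g * (x[d] - w[d])

# after training, some prototypes should sit near each dense region,
# which is what makes the two clusters identifiable on the map
```

After training, inspecting which grid units win most often reveals the dents of the carnival analogy that follows: regions of the grid that have specialized to the dense areas of the data.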
Berry & Linoff (2004) also state that the simplest analogy for how SOM works is a ball-throwing booth at a carnival. Usually a player tries to throw balls into holes in a wall. Imagine this time that the wall has no holes yet. When a player throws a ball for the first time, it creates a dent. As the player continues to throw balls, dents appear in various places. As the dents in one vicinity become larger and deeper, there eventually comes a moment when a thrown ball finally creates a hole, and the next time the player succeeds in throwing a ball into that hole, he wins the prize. That hole, in SOM, is called an identifiable cluster. A similar example is archery practice: the first few shots are usually weak and land at random spots on the target, but later the marks become abundant and the area around the bulls-eye fills with shot marks. In SOM, contributing more parameters, especially specific ones, can change the outcome of the identified clusters, which in turn can affect the strategies built on the parameters identified from the proposed cluster. These important parameters have significant distribution values that may affect the quality of the cluster itself (Berry & Linoff, 2004).

Figure 2.1 represents the whole idea of how SOM works in clustering distributed data collections. The output units compete with each other for the output of the network. The output layer is laid out like a grid; each output unit is connected to all the input units, but not to the other output units. The input layer is connected to the inputs.

[Figure: the winning output unit and its path through the grid of weights]

Figure 2.1. Example of How SOM Works in Detecting Clusters by Defining (Upper) and Locating the Intended Cluster (Below).
(Berry & Linoff, 2004)

According to Vesanto (2002), training is conducted sequentially until it achieves satisfying results based on the learning rate, using the formula:

b_i = argmin_j { || x_i − m_j(t) || }

The best-matching unit (BMU), represented by b_i, is the map unit whose prototype vector m_j has the closest distance measure to the sample data vector x_i at training step t. The prototype vectors are then updated by moving them toward x_i using the rule:

m_j(t + 1) = m_j(t) + α(t) h_bij(t) [x_i − m_j(t)],

where t is the training step index, α(t) is the learning rate, and h_bij(t) is a neighborhood kernel centered on the winner unit. The process is visualized in Figure 2.2, where the BMU and its neighbors are updated toward the input sample marked by x. The black and gray dots represent positions before and after updating, while the solid and dashed lines represent neighborhood relations.

Figure 2.2. SOM training process (Vesanto, 2002)

The SOM algorithm has the ability to learn incrementally, based on a regression performed recursively at each presentation of the sample data and on the distance factors between each model of the data set. All of this can also be performed in batch, resulting in faster algorithms (Kohonen, 2005). Bação, Lobo, & Painho (2005) summarize basic SOM training as follows:

X = the set of n training patterns x1, x2, ..., xn
W = a p × q grid of units w_ij, where i and j are their coordinates on that grid
α = the learning rate, taking values in [0, 1], initialized to a given initial learning rate
r = the radius of the neighborhood function h(w_ij, w_mn, r), initialized to a given initial radius
1 Repeat
2   For k = 1 to n
3     For all w_ij in W, calculate d_ij = || x_k − w_ij ||
4     Select the unit that minimizes d_ij as the winner w_winner
5     Update each unit w_ij in W: w_ij = w_ij + α h(w_winner, w_ij, r) [x_k − w_ij]
6   Decrease the values of α and r
7 Until α reaches 0

Feature extraction can also be done with SOM, as Liu, Weisberg, and Mooers (2006) demonstrated using artificial data representative of known patterns related to their previous research on the ocean currents off West Florida. It extracts the patterns of a linear progressive sine wave. Further research combined the effects of the SOM tunable parameters by adding random noise to the progressive wave data. With this improvisation, the SOM technique could successfully extract separate patterns associated with transitional, more typically weather-related patterns.

2.1.1.2. K-Means

Taken from Maitra, Peterson, & Ghosh (2010), "The K-means clustering algorithm iteratively partitions a dataset into K groups in the vicinity of its initialization such that an objective function defined in terms of the total within-group sum-of-squares is minimized" (p. 1). Thus, this technique helps locate the centers of clusters, though the success of the method depends on the starting kernel values of each cluster. To illustrate the concept, the following diagram shows the difference:

Figure 2.3.
Clustering results using 10 clusters (shown by projection into 2-D space) on the sample training image. K-means clustering without a kernel transformation (left) is compared with kernel k-means clustering (right). (Honarkhah & Caers, 2010)

The results in Figure 2.3 were taken from samples of training images, converted into grid templates and then measured using the Euclidean distance, well known for its simplicity, as the dissimilarity distance function (Honarkhah & Caers, citing Suzuki & Caers, 2006, and Arpat & Caers, 2007):

d(pat(u_i), pat(u_j)) = || pat(u_i) − pat(u_j) ||

where pat(u_i) and pat(u_j) are the two patterns from the pattern database to be measured. The distances are then placed into matrices and processed with the kernel function of K-Means itself. This involves a set of k centers, each located at the center of the data closest to it. Under this membership function, every data point belonging to its nearest center forms a partition of the data, separated from the more distant data, and this should minimize the within-cluster variance (Hamerly & Elkan, 2003).

Figure 2.4. K-Means distant and close nodes (Hamerly & Elkan, 2003)

The kernel most commonly used here is the Gaussian radial basis function (Honarkhah & Caers, 2010):

k(x, y) = exp( − || x − y ||² / (2σ²) )

According to Maitra, Peterson, & Ghosh (2010), the performance of K-Means depends on the initialization process: on well-separated groups it mostly performs well under any kind of cluster scenario, while it is not practical to run on very large datasets. Maitra, Peterson, & Ghosh (2010) tested at least eleven common initialization methods covering a wide variety of initialization strategies; none proved clearly better, and further analysis was needed, which shows that the discovery of a more strategic K-Means initialization is still a long way off.

2.1.1.3.
Expectation–Maximization Mixture Model

According to Gupta & Chen (2011), Expectation-Maximization (EM) based on Gaussian mixture models (GMM) is intended for "learning an optimal mixture of fixed models, for estimating the parameters of a compound Dirichlet distribution, and for dis-entangling superimposed signals". EM also helps estimate both GMMs and hidden Markov models (HMMs). Gupta & Chen (2011) state that there are several variations of EM:
• Generalized EM (GEM): the M-step finds a θ that improves, but does not necessarily maximize, F(θ, q) = Q(θ, θ(t)). This is useful when the exact M-step is difficult to carry out. Since this is still coordinate ascent, GEM can find a local optimum.
• Stochastic EM: the E-step is computed with Monte Carlo sampling. This introduces randomness into the optimization, but asymptotically it will converge to a local optimum.
• Variational EM: q(Z) is restricted to some easy-to-compute subset of distributions, for example the fully factorized distributions q(Z) = Π_i q(z_i).

Figure 2.5. GMM fitting examples from EM estimates: the true GMM density, 1000 i.i.d. samples, an initial guess (m = 0), and the first three EM estimates (Gupta & Chen, 2011)

However, Gupta & Chen (2011) state that analysis using EM depends on these facts:
1. Convergence.
The monotonicity of the EM algorithm's iterations can be guaranteed, as can the likelihood of successive guesses, but convergence of the sequence itself cannot be guaranteed, because it depends on the characteristics of the problem and the starting points. The proof of the monotonicity theorem is closely related to a Markov relationship. The EM algorithm for the maximum a posteriori (MAP) estimate also has monotonicity properties.
2. Maximization-Maximization: a joint maximization procedure that iteratively maximizes a better lower bound to the log-likelihood function. This interpretation establishes EM as belonging to the class of methods called alternating optimization or alternating minimization methods, which also includes projection onto convex sets (POCS) and the Blahut-Arimoto algorithms (Stark & Yang, 1998).

The algorithm itself is as follows (Zhu, 2007):
1. Estimate θ(t = 0) from labeled examples only.
2. Repeat until convergence:
   a. E-step: for i = 1…n, k = 1…K, compute γ_ik = p(y_i = k | x_i, θ(t)).
   b. M-step: compute θ(t + 1) from (7). Let t = t + 1.

2.2. Principal Component Analysis

Principal Component Analysis (PCA) is one of the studies of multivariate statistics, "a data analysis technique that relies on a simple transformation of recorded observation, stored in a vector z ∈ R^N, to produce statistically independent score variables, stored in t ∈ R^n, n ≤ N: t = P^T z. P is a transformation matrix, constructed from orthonormal column vectors." (Kruger, Zhang, & Xie, 2008). PCA has been used as a basis for clustering methods, especially in data mining, based on the variables or physical features the data has. The term physical feature has many definitions depending on the situation and the field of research, but it suffices to say that physical features are the physical attributes extracted from a set of data (www.faqs.org, 2010).
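The score transformation t = Pᵀz quoted above can be made concrete with a short sketch: build P from the orthonormal eigenvectors of the sample covariance matrix and project mean-centred observations onto it. The toy data, dimensions, and variable names below are assumptions made for illustration, not part of Kruger et al.'s work.

```python
# Illustrative PCA score computation in the t = P^T z form: P's columns are
# orthonormal eigenvectors of the covariance matrix, sorted by variance.
import numpy as np

rng = np.random.default_rng(0)

# 200 observations of N = 3 variables; the first two are strongly correlated,
# so most of the variance lives in fewer than N directions
z1 = rng.normal(size=200)
Z = np.column_stack([z1,
                     2.0 * z1 + rng.normal(scale=0.1, size=200),
                     rng.normal(size=200)])

Zc = Z - Z.mean(axis=0)              # mean-centre the observations
C = np.cov(Zc, rowvar=False)         # N x N covariance matrix
eigval, eigvec = np.linalg.eigh(C)   # orthonormal eigenvectors of C
order = np.argsort(eigval)[::-1]     # sort components by decreasing variance

n = 2                                # keep n <= N score variables
P = eigvec[:, order[:n]]             # transformation matrix, orthonormal columns
T = Zc @ P                           # score variables: one row t = P^T z per obs.
```

The resulting score columns are mutually uncorrelated, which is the "statistically independent score variables" property (exactly independent only under Gaussian assumptions).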
In documents, it is necessary to extract physical features, since a document presents an abstract view at first and does not reveal its specific attributes. Another analogy, taken from Berry & Linoff (2004), describes a physical feature as the form a liquid takes when it cools and later crystallizes into a pattern. The crystal itself is compressed energy, as in every crystal, and the annealing process can be studied further for the physical properties it contains.

For example, physical features in Geographic Information Systems contain semantic sets of information such as place, related physical object information, persons, research activity, and more. Even keywords or thesaurus terms from documentation related to the field can be included among the physical features (Hiebel, Hanke, & Hayek, 2009).

Returning to PCA itself, Kruger et al. (2008) state that one of its main goals is to achieve generalization and remove redundancies in data by utilizing a nonmetric scaling, which involves a nonlinear optimization problem, and then reconstructing the original variables, known as the mapping and de-mapping stages:

ẑ = P t = P (P^T z)

where t = P^T z is the mapping stage and ẑ = P t is the de-mapping stage, defining how the score variables of the two stages are obtained. Kramer (as cited in Kruger, Zhang, & Xie, 2008) also mentions that the whole process of mapping and de-mapping is defined through an autoassociative neural network structure, in which the mapping and de-mapping stages are separated by neural network layers. However, Tan and Mavrovouniotis (as cited in Kruger, Zhang, & Xie, 2008) found that a five-layer autoassociative neural network topology can be difficult to train, one challenge being the measurement of the network weights as the number of layers grows.
To help reduce that complexity, Tan and Mavrovouniotis (as cited in Kruger, Zhang, & Xie, 2008) proposed an input-training (IT) network topology, which omits the mapping layers until only about three network layers remain; the IT network thus obtains the reduced set of nonlinear principal components as part of the training procedure. Many more approaches have been proposed to reduce the complexity of the neural topology and make the process easier to handle. A recent one is Kernel PCA (KPCA) by Schölkopf (as cited in Kruger, Zhang, & Xie, 2008), which works by mapping the original variable set into a higher-dimensional feature space and then performing conventional linear PCA on it. This technique helps because of its simplicity and efficiency, and it has recently been used in a much wider range of tools such as face recognition, image de-noising, and fault detection prototypes.

2.3. Normalization

Normalization is an important fundamental building block of data mining, as part of Extraction, Transformation, and Loading (ETL). The goal is to modify the data source so that it becomes easier for the data mining application to use, as well as to enhance the effectiveness and performance of the mining algorithms (Venki, 2009). There are many kinds of normalization algorithms; one of the techniques is called Min-Max Normalization:

B = ((A − min_A) / (max_A − min_A)) × (D − C) + C

This algorithm transforms a value A into a new value B defined within the range of C and D. For example, if a salary value is 50000 and needs to be transformed into the range of 0 and 1, with the minimum and maximum salary known to be 25000 and 50000, the new normalized value will be:

B = ((50000 − 25000) / (50000 − 25000)) × (1 − 0) + 0 = 1

However, this technique has a weakness: if the minimum and/or maximum value cannot be defined, it is difficult to apply this kind of algorithm.

2.4.
Chi-Square Test

The Chi-Square test measures the significance of a data population and is a useful tool for determining whether or not a contingency table is worth interpreting in a research effort (Stockburger, 2011). The end result indicates whether the cells of the contingency table should be interpreted, that is, whether they are significant; if not, no effects were discovered and the table is of no further use. The statistic is represented by:

χ² = Σ (O − E)² / E

where O is the observed value and E is the expected value. The chi-square distribution itself is given by:

f(x) = x^(v/2 − 1) e^(−x/2) / (2^(v/2) Γ(v/2)), for x ≥ 0

where v is the shape parameter and Γ is the gamma function, which is based on this formula:

Γ(a) = ∫₀^∞ t^(a−1) e^(−t) dt

2.5. Service Level in Sales

To understand the service level, it is important to learn the business model of each of these sales divisions and what they represent. Because of today's market situation, with rapid changes and strict competition in almost every aspect, business role models have to make more strategic (and more complex) decisions to survive. Osterwalder (2004), referring to the online version of the Cambridge Learner's Dictionary (Cambridge, 2003), notes that the term is not defined there as a combined phrase. Quoting the separate terms:
• "business: the activity of buying and selling goods and services, or a particular company that does this, or work you do to earn money" (p. 17).
• "model: a representation of something, either as a physical object which is usually smaller than the real object, or as a simple description of the object which might be used in calculations" (p. 17).

Therefore, Osterwalder (2004) combined them into the following definition: "Business model is a representation of how a company buys and sells goods and services and earns money" (p. 17).
The current competitive nature of business has placed quite a burden on the business model, especially changes that influence it directly or indirectly. According to Osterwalder (2004), the changes are:
• Technological change, amplified by the rapid adoption of the Internet both to deliver business needs and to act as a tool for decision making.
• Competitive forces; even in the TV cable industry, several players compete to attract customers with their product line-ups and specialties.
• Customer demand, which means not only delivering good service but also following customer demands based on current trends. One example is securing exclusive football broadcast rights to attract devoted viewers.
• The social environment; paying attention to the social mood can sometimes yield new strategies to improve the business model.
• The legal environment, another important aspect that ensures fair play between competitors. One example was the case of Direct Vision attempting to monopolize broadcast rights, as explained in the prosecutor letter published by KPPU (Komisi Pengawas Persaingan Usaha, 2008).

With this basic understanding, this study introduces the business model of each sales division selected as representative of influential sales performance.

2.5.1. Direct Sales

According to the World Federation of Direct Selling Associations (2000), "Direct selling is a dynamic, vibrant, rapidly expanding channel of distribution for the marketing of products and services directly to consumers" (http://www.wfdsa.org). Basically, it is a business model based on presenting the product line-up directly to customers, including home delivery and guaranteed customer satisfaction. This can be done anywhere and, depending on the circumstances, can save the costs needed for the current model implementation. The main benefit of this business model is that subscription growth can increase dramatically.
An example can be found in the October 2000 edition of Twice News, which states that Hughes Electronics Corp. received a 3.7% increase in its DirecTV sales in the third quarter of 1999.

2.5.2. Market Place

This type of sales division operates within the wholesale market. The Food and Agriculture Organization (1991) defines it as "the social institution or mechanism that forms the linkage between the producer (farmer) and the retailer is the assembly and wholesale trading system, which enables farmers to sell in small quantities and purchasing by traders and wholesalers to be made in bulk" (http://www.fao.org). Being part of a wholesale market introduces the chance to present the products to the daily customers who come for their domestic needs.

2.5.3. Multi-Level Marketing

The Subscriber Outreach Program sales division is based on the concept of multi-level marketing. The structure of this model is shaped like a pyramid, which is why it is also known as a "pyramid scheme". According to the Federal Trade Commission website (2007), the model has evolved into many forms and is rather difficult to identify, but it does have one common characteristic: "recruiting others to join their program, not based on profits from any real investment or real sale of goods to the public" (http://www.ftc.gov). This approach basically delegates customers to act as sales representatives themselves, presenting the product line-up and attracting others to invest. Those customers gain incentives based on the total number of new customers they bring in.

2.5.4. Strategies

Some strategies have been developed to measure sales performance within the company. Even though the source references come from companies with different backgrounds, the strategies they employed help in making better analytical decisions.

A.
Sales Force Automation (SFA)

According to the study of Srinivasan et al. (as cited in Boujena, Johnston, and Merunka, 2009), "SFA technologies enhance performance by increasing the efficiency and productivity of salespeople and improving both the quality and quantity of communications among salespersons, the buying organization, and the selling firm" (p. 2). This can be measured according to five main levels that affect the sales function itself:
o Salesperson productivity, to help achieve daily targets. Verity (as cited in Boujena, Johnston, and Merunka, 2009) mentions that reducing errors in manual processes and support costs, as well as improving closing rates and average sell prices, benefits the processing of sales performance and, in the end, helps enhance sales productivity itself.
o Information processing, ranging from gathering information from customers and even competitors, which needs to be studied and analyzed.
o Communication effectiveness, always an important ability to maintain, which SFA can help enhance for salespersons.
o Perceived competence, by continuing to learn and adapt to the areas related to the company's scope of marketing.
o Customer relationship quality; although considered an outside category, it proves to be an intangible value between buyers and sellers. According to Hawes, Mast, and Swan (as cited in Boujena, Johnston, and Merunka, 2009), five factors help in building trust between buyers and sellers: "customer orientation, competency, honesty, dependability, and likability" (p. 3).

Based on their research into SFA technology, Boujena, Johnston, and Merunka (2009) have pointed out conclusions based on Table 2.
1: at least three meta-categorization processes affect sales quality, namely professionalism (judged from the image, argumentation value, and organizational value of the salesperson), customer interaction frequency (essentially the critical knowledge of the current market), and responsiveness (the quality of interaction between salesperson and customer).

Table 2.1. Generated Meta-Thematic Categories (Boujena et al., 2009)

B. Customer Relationship Management (CRM)

While the paragraphs above dwell on sales quality, this part concentrates on the customer relationship, which has become a major factor contributing to sales performance itself. Day et al. (as cited in Dickson, Lassar, Hunter, and Chakravorti, 2009) point out an important fact: studies of the successes and failures of CRM are heavily based on "business process management that includes skilled selection, deployment, configuration, and implementation of CRM best practice processes" (p. 1). Therefore, process thinking and implementation skills in each employee become of paramount importance, especially at the senior management level, and this is a difficult aspect that needs to be audited over and over again to reach a satisfactory level of CRM. Quoting Dickson, Lassar, Hunter, and Chakravorti (2009), there are several propositions they made based on the key aspects of CRM itself:
o Added-value process competitive advantage; this includes superior processes and the integration that helps manage and address all of the customer's important values altogether.
o Search and operational routines, including research, development, and market-learning routines.
o Evolutionary hot spots; this borrows the concept of biological evolution to learn how to survive the competition by learning the important facts that others lack.
o Manager and employee process thinking skills; the need for an endless pursuit of improved thinking skills by assessing the best qualities of each individual.
o The measurement of process thinking skills, which even includes the use of information technology to help adopt best practices.

In the end, the company's perception of which practices it is going to use will affect the quality of its CRM. The study of each practice even varies in each aspect, as shown in the two figures below:

Figure 2.6. Process Thinking and Learning Hierarchy (Dickson et al., 2009)

Figure 2.7. The Breadth and Depth of CRM Process Organization (Dickson et al., 2009)

This shows that current learning processes are mostly out of synchronization, while the CRM process helps integrate those processes seamlessly.

2.6. Study Motivation

As the examples above show, along with how the results affect the strategies to be delivered, it is important to study how far data mining techniques can produce clusters that identify which areas can and cannot be explored further. With these results, strategies and decisions can be made more reliably from mountains of data that mean nothing at first. Another example taken from Berry & Linoff (2004) is a clustering implementation at a large bank that wanted to increase sales of home equity loans. The demographics gathered for this process covered around 5000 customers each of those who had the loans and those who had not. At first, the parameters analyzed included tenant appraisal, credit available and granted, age, marital status including number of spouses, and household income. The results of the analysis were drawn into a recognizable chart as follows.

Figure 2.8. The Centers of Five Clusters Compared on the Same Graph. This Simple Visualization Technique (called Parallel Coordinates) Helps Identify Interesting Clusters.
(Berry & Linoff, 2004)

Unfortunately, the marketing campaign created from these results did not deliver satisfying outcomes. That does not mean that the clustering failed; the problem may lie within other, much more specific parameters, since the preset parameters used earlier were too general. By adding specific parameters such as the deposit system and/or credit card information, the generated clusters may vary and deliver much more accurate results. While data related to sales are usually well defined, further physical feature analysis can reveal more information behind them. For example, in Customer Relationship Management, sales leads can be identified even from unlikely sources such as delivery personnel, the installation process, or public-relations services (www.insidecrm.com, 2010).

Based on the research of Amiri and Fathian (2007), applications of Artificial Neural Networks (ANN), which are the basis of the SOM cluster method, can be used for market segmentation, for example in retail sales forecasting, direct marketing, and target marketing. The results usually represent networks of three different dimensions of data, such as demographic information (sex, age, marriage), economic information (salary, income), and geographic information (states, cities, civilization levels). Kuo et al. (2002) proposed a modified two-stage method to help with market segmentation using a combination of SOM and K-Means:
1. Use SOM to determine the number of clusters and the starting points.
2. Perform further cluster analysis with a second clustering method, K-Means.

Figure 2.9. SOM learning process (Amiri & Fathian, 2007)

Figure 2.10. Performance evaluation of previous cluster research methods (Amiri & Fathian, 2007)
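The two-stage idea attributed to Kuo et al. (2002) — SOM first supplies the cluster count and starting centres, then K-Means refines them — can be sketched as follows. This is a rough illustration under assumed settings (a tiny winner-only SOM, toy 2-D data, fixed learning rate), not the authors' actual procedure.

```python
# Two-stage clustering sketch: SOM prototypes become the initial centres for
# K-Means (Lloyd's algorithm), addressing K-Means' sensitivity to its start.
import numpy as np

rng = np.random.default_rng(1)
data = np.vstack([rng.normal([0.0, 0.0], 0.1, (60, 2)),   # blob around (0,0)
                  rng.normal([1.0, 1.0], 0.1, (60, 2))])  # blob around (1,1)

# --- stage 1: SOM whose trained units become candidate cluster centres ---
k = 2
proto = rng.random((k, 2))              # prototypes, random in the unit square
for epoch in range(5):
    for x in data:
        bmu = int(np.argmin(((proto - x) ** 2).sum(axis=1)))  # best-matching unit
        proto[bmu] += 0.1 * (x - proto[bmu])   # winner-only update (radius ~ 0)

# --- stage 2: K-Means initialised at the SOM prototypes ---
centers = proto.copy()
for _ in range(20):
    # assign every point to its nearest centre
    labels = np.argmin(((data[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
    # move each centre to the mean of its assigned points (guard empty groups)
    centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])

# each refined centre should now sit near one of the generating means
```

The SOM stage places the starting centres inside the dense regions of the data, so the K-Means stage converges quickly instead of depending on a lucky random initialization, which is the motivation the two-stage method gives.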