
C. H. Lien et al. / Asian Journal of Management and Humanity Sciences, Vol. 1, No. 1, pp. 1-15, 2006

Capturing and Evaluating Segments: Using Self-Organizing Maps and K-Means in Market Segmentation

CHE-HUI LIEN (1,*), ALEX RAMIREZ (2), AND GEORGE H. HAINES (2)
1 Department of Management, Thompson Rivers University, Canada
2 Eric Sprott School of Business, Carleton University, Canada

ABSTRACT

Market segmentation is a vital part of an organization's marketing because it provides the fundamental framework necessary for effective marketing efforts. In recent years, due to their high performance in engineering, artificial neural networks have also been applied in management research. Self-organizing maps, a technique of unsupervised neural networks, are often used for clustering or dimensionality reduction. This study employs a modified two-stage approach (SOMs and K-means) to group customers, compares the performance of the tandem approach with that of direct K-means clustering, and tests for the existence of clusters and segments. The test results show that a media promotion variable would be a basis for segmentation. Based on the segmenting results, a marketing communication strategy is presented to cope with customers' expectations.

Key words: market segmentation, cluster analysis, data mining, neural networks, self-organizing maps.

1. INTRODUCTION

Since the pioneering research of Wendell Smith (1956), the concept of market segmentation has been one of the most pervasive in both the marketing academic literature and practice (Kuo, Ho, & Hu, 2002). In today's competitive marketplace, locating and effectively targeting unique market segments enables a company to understand the wants and needs of its customers. For several decades, statistical cluster analysis has been successfully used in market segmentation (Green & Krieger, 1995).
Recently, due to an increase in computer power and a decrease in computer cost, a great deal of interest and effort has been directed towards using neural networks (NNs) for business tasks that were once reserved for multivariate statistical analysis. In marketing, the major application of NNs is market segmentation. Along with the evolution of data mining techniques, self-organizing maps (SOMs) have been used in determining clusters and are an alternative to statistical clustering techniques (Bigus, 1996; Fish, Barnes, & Aiken, 1995; Kuo et al., 2002; Venugopal & Baets, 1994). An SOM provides a mapping from high-dimensional input data onto a lower-dimensional output map. A distinguishing feature of the SOM is that it preserves the topology of the input data when projecting from the high-dimensional input space onto the output map, in such a way that the relative distances between input data are more or less preserved (Garson, 1998; Thanakorn, 2003).

Although a number of clustering methods have been presented to solve the market segmentation problem, the importance of testing the validity of clusters is frequently ignored by marketing researchers (Arnold, 1979; Engelman & Hartigan, 1969; Pilling, Crosby, & Ellen, 1991; Punj & Stewart, 1983). If identified clusters are inadequately validated, the clustering results can be considered random, i.e., devoid of a meaningful structure (Pilling et al., 1991; Punj & Stewart, 1983). Neglecting the test for clusters could lead to the inclusion of clusters that do not really exist. Segments are defined as groups that show some pattern of similarity and that differ significantly in their response to the relevant marketing variables (Sommers & Barnes, 2003).

* Corresponding author. E-mail: [email protected].
Clusters that differ in some way, such as attitudes or demographics, but not in behavior are not true segments (Massy & Frank, 1965).

Kuo et al. (2002) proposed a modified two-stage approach (SOMs and K-means) and, based on simulated data, found that it is slightly better (lower rate of misclassification) than the traditional statistical clustering method. But Kuo et al. (2002) used a tandem approach in the K-means procedure, i.e., a preliminary factor analysis followed by a clustering of rotated, standardized factor scores. Green and Krieger (1995) criticized the tandem approach and argued that its performance was not as good as that of direct K-means clustering. In addition, Kuo et al. (2002) neither tested for the existence of clusters nor evaluated whether their clusters were segments. More empirical studies are needed to support the better performance of the novel two-stage approach.

This study uses practical data from the home heating system market and applies Kuo et al.'s two-stage method (SOMs and K-means) in market segmentation, compares the performance of direct K-means clustering with the tandem approach, and tests for clusters and segments.

The remainder of this paper discusses SOMs' applications in market segmentation and the test method for clusters and segments, followed by a review of the methodology used in this study. The section after that presents our results. The final section gives a summary and a discussion of the findings of the study.

2. REVIEW OF RELATED LITERATURE

2.1 SOMs in Market Segmentation

The general idea of segmentation is to group items that are similar (homogeneous) (Kuo et al., 2002; Smith, 1956). Traditionally, statistical clustering techniques have been the common tools for market segmentation (Green & Krieger, 1995).
Punj and Stewart (1983) suggested that the integration of a hierarchical approach, such as Ward's minimum variance, along with a non-hierarchical one, such as K-means, can provide a better answer than using either a hierarchical or a non-hierarchical method alone. Their approach is called the two-stage approach.

The term neural networks arose from artificial intelligence research, which attempted to understand and model brain behavior (Berry & Linoff, 2000; Berson, Smith, & Thearling, 2000). Self-organizing maps are feed-forward, unsupervised neural networks and were developed by Kohonen (Bigus, 1996; Kohonen, 2001). Feed-forward networks are used in situations where we can bring all of the information to bear on a problem at once and present it to the neural network (Bigus, 1996). Unsupervised learning means that clustering proceeds without knowing (a priori) the number of clusters in the input data, and the network organizes and learns the input data unsupervised (Bigus, 1996). The SOM consists of two layers of processing units, an input layer fully connected to an output layer. There is no hidden layer.

Venugopal and Baets (1994) argued that unsupervised neural networks, including SOMs and adaptive resonance theory, could be used in determining clusters. Their network for segmentation was developed only conceptually. Fish et al. (1995) also suggested that unsupervised neural networks could be utilized for market segmentation. Kauko, Hooimeijer, and Hakfoort (2002) examined the application of neural networks to housing market segmentation in Helsinki, Finland. Their study shows how it is possible to identify various dimensions of housing segments by uncovering patterns in the data set, and demonstrates the classification ability of SOM-LVQ (learning vector quantization).
Kuo et al. (2002), based on Punj and Stewart's (1983) research, proposed a modified two-stage approach, which first used self-organizing maps to determine the number of clusters and then employed the K-means method to find the final solution. Kuo et al. (2002) used simulated data and found that their proposed two-stage approach outperformed the conventional two-stage method. The main reason was that the first stage of the conventional two-stage clustering method always involved hierarchical methods. One shortcoming of hierarchical methods is non-recovery: once an observation has been assigned to a cluster, it cannot be moved (Kuo et al., 2002). SOMs, however, are a kind of learning algorithm that can continually update, or reassign the observation to the closest cluster. In Kuo et al.'s study, the tandem approach was employed in the K-means procedure, and the tandem approach has been criticized because it can distort the original cluster structure (Green & Krieger, 1995). It is therefore necessary to examine a non-tandem approach, such as direct K-means clustering, and to evaluate its performance.

2.2 Test of Clusters and Segments

An appropriate test for clusters appears to be one that takes into account the objective of cluster analysis, i.e., to minimize the within-group variance and maximize the between-group variance (Arnold, 1979). Among the test methods, the one suggested by Arnold (1979) appears to be better than the others (Pilling et al., 1991; Punj & Stewart, 1983). In Arnold's test, the significance of the cluster solution is tested by comparing the calculated value of the C test statistic (see footnote 3) with the values that would be expected if the data were drawn from either a unimodal or a uniform distribution, i.e., if there were no basis for clusters (Arnold, 1979).
There is some consensus in marketing research on what makes a segmentation solution a good one (Bacon, 2002; Wedel & Kamakura, 2000). The important criteria are identifiability, accessibility, substantiality, stability, responsiveness, and actionability (Bacon, 2002; Wedel & Kamakura, 2000). These criteria are stated in terms of usefulness for managerial decision-making and are not easy to evaluate (Bacon, 2002). They are also independent of the technology used to obtain the segments (Bacon, 2002). Therefore, they provide a useful set of goals for segmentation solutions developed using neural network techniques (Bacon, 2002). Few studies have attempted to assess segmentation solutions with these criteria (Wedel & Kamakura, 2000). In this paper, we employ one criterion, responsiveness (consumers in different segments should behave differently toward marketing programs directed at them), to evaluate whether clusters are segments. The major reason for choosing responsiveness is that the data we used are cross-sectional. Other criteria, e.g., stability, need longitudinal data. Responsiveness is consistent with Massy and Frank's (1965) argument that different segments should have different elasticities with respect to promotional or marketing variables.

Footnote 3: $C = \log(\max(|T| / |W|))$, where T represents the total scatter matrix and W the pooled within-group matrix.

3. METHODOLOGY

3.1 Research Sample

This study used previously collected data (Ratchford & Haines, 1986) for clustering analysis. Data were collected from a US sample of the Market Facts Consumer Mail Panel in January-February 1983. Because the weather is freezing in winter, home heating systems are important in some US states (e.g., New York State). In the 1980s, there were basically five types of home heating system: gas, oil, electric, heat pump, and solar.
Families in the US can choose one of them to produce heat in the house. In their research, Ratchford and Haines identified fourteen perceptual attributes as the criteria for rating the five types of home heating systems. They also ran a factor analysis on the fourteen attributes and found that the resulting three factors (cleanliness, reliability, and cost) explained approximately 60% of the variation between attributes. However, they did not perform segmentation among the consumers.

We focus on consumers planning on buying a house within the next five years. This research defines the potential market as respondents who do not plan to replace their home heating system in the next five years but do plan to buy a house within the next five years. When buying a new house, one type of home heating system must be considered in the purchase. The subjects included both males and females above the age of 18. In total, 433 responses were obtained for the potential market.

3.2 Research Hypotheses

3.2.1 Test of clusters

The procedure for testing for clusters described by Arnold (1979) consists of two tests. The first tests the null hypothesis that the population entities tend to concentrate at one point (i.e., follow a single unimodal, normal distribution) against the alternative hypothesis that the entities are either uniformly distributed or grouped into clusters. If the null hypothesis is rejected at some level of confidence, a second test is made. The second tests the null hypothesis that the population entities are uniformly distributed against the alternative hypothesis that they form two or more clusters. The two null hypotheses are:

H1: the population entities tend to concentrate at one point;
H2: the population entities are uniformly distributed.

Rejection of both H1 and H2 indicates that there are clusters in the population.
3.2.2 Test of segments

The null hypothesis is:

H3: the marketing variables elasticities among clusters are the same.

Rejection of H3 means the clusters are segments.

3.3 Questionnaire

Young, Ott, and Feigin (1978) argued that segmentation based on benefits desired is usually the most meaningful type to use from a marketing standpoint, as it directly facilitates product planning, positioning, and advertising communication. The main strength of benefit segmentation is that the benefits sought by customers have a causal relationship to future purchase behaviour (Minhas & Jacobs, 1996). This study uses benefit variables to group customers. In Ratchford and Haines's research (1986), the questionnaire solicited information on five types of home heating systems, and each system type was rated on each of the fourteen attributes (benefit variables, see Table 1).

Table 1. Fourteen benefit variables

  Reliability against breakdown
  Future availability of fuel supply
  Floor space required for the system
  Efficiency of converting fuel source into heat
  Cleanliness of operating the system
  Ease of conversion to another system
  Absence of fumes and odors
  Absence of pollution
  Availability of professional servicing
  Noise-free operation of the system
  Safety of the system
  Initial purchase price
  Warranty protection
  Annual operating cost

Purchase intentions were measured by the following question: "If you were buying a heating system for your home, please 'x' the one type that you would be most likely to purchase." Responses were limited to the five system types. Other questions involve knowledge of heating systems, ratings of all the methods of central heating, preference for one method of home heating over another, media, demographic information, etc. The measurement scale in the questionnaire is a 7-point scale, ranging from not very important (=1) to very important (=7).
3.4 Generating Clusters: Using SOMs

The Kohonen algorithm can be summarized in the following steps (Bigus, 1996; Garson, 1998; Kohonen, 2001); the self-organizing map parameters used in our study are listed in Table 2.

(1) Neuron weights are initialized to random values.
(2) Data representation. When clustering data with neural networks, it is standard practice to scale (or normalize) the input data to a range of zero to one (Bigus, 1996; Garson, 1998). This study uses the sigmoid function (see footnote 4) to transform input data to a range of zero to one.
(3) Setting the learning rate. The learning rate controls the step size in the adjustment of the connection weights. This study sets the initial learning rate to 0.1. The learning rate decreases over time.
(4) Setting the neighborhood. We set a 10-by-10 output layer. The initial neighborhood radius is 10 in this study.
(5) Presetting the number of training epochs (training cycles). An epoch is the number of training data sets presented to the model between updates of neural weights (Garson, 1998). The purpose of training is to give the neural network the ability to learn and recognize any important pattern or relationship in a set of data (Nguyen & Ramirez, 1998). So far, there is no rule for determining the number of learning epochs; the only way is trial and error (Kuo et al., 2002). In our research, the number of training epochs for the potential market is 80. At the 80th epoch, the network converges to an average total minimum distance.
(6) Data points are input to the net, selected at random.
(7) Determine which neuron is the least distant from the presented observation. This is the neuron whose weight vector is closest to the input vector of the current observation, measured in Euclidean distance.
(8) Weights of neurons in the neighborhood of the winning neuron are adjusted in value to become closer to the value of the winner.
The neighbourhood starts out widely defined but decreases spatially as learning iterations proceed, eventually reaching zero (that is, only the weights of the winning neuron are adjusted). When the neighbourhood drops to one, the convergence phase begins.

The SOM literature (Berry & Linoff, 2000; Bigus, 1996; Kuo et al., 2002) usually suggests using 70% of the population for training and 30% for testing. Of the 433 observations in the potential market, a sample of 300 is used for training and the remaining 133 observations are used for testing (see Table 2).

Footnote 4: The sigmoid function is defined as a strictly increasing function that exhibits smoothness and asymptotic properties. One popular sigmoid function is $f(x) = 1/[1 + \exp(-x)]$.

Table 2. The self-organizing maps parameters

  [ 0] Number of Input Unit (<20): 14
  [ 1] Number of Rows (=Columns) of Output Unit (about 4-10, <20): 10
  [ 2] Number of Train Examples: 300
  [ 3] Number of Test Examples: 133
  [ 4] Number of Train Cycles (about 20-200): 80
  [ 5] Number of Test Period (about 1-10): 1
  [ 6] Using Batch Learn (Yes=1, No=0, usually 0): 0
  [ 7] Using Learned Weights (Yes=1, No=0, usually 0): 0
  [ 8] Range of Weights (0.1-0.5, usually 0.3): 3.000e-01
  [ 9] Random Seed (0.1-0.9, usually 0.456): 4.560e-01
  [10] Learn Rate (0.01-0.3, usually 0.1): 1.000e-01
  [11] Learn Rate Reduced Factor (0.9-1.0, usually 0.95): 9.500e-01
  [12] Learn Rate Minimum Bound (0.01-0.1, usually 0.01): 1.000e-02
  [13] Radius (about = Number of Rows): 1.000e+01
  [14] Radius Reduced Factor (0.9-1.0, usually 0.95): 9.500e-01
  [15] Radius Minimum Bound (about 0.1): 1.000e-01

This study uses three different software packages (PCNeuron, SAS Enterprise, and Neuroshell) to implement clustering.
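The Kohonen steps and the Table 2 settings can be sketched in plain Python as follows. This is only a minimal illustration, not the PCNeuron, SAS Enterprise, or Neuroshell implementation used in the study. The function names (`sigmoid`, `best_matching_unit`, `train_som`), the uniform initial weight range, the circular grid neighbourhood, the per-epoch decay schedule, and the update rule that moves weights toward the presented input (the usual Kohonen rule) are all assumptions made for the sketch.

```python
import math
import random


def sigmoid(x):
    """Scale a raw value into (0, 1): f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron closest to x in Euclidean distance."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows=10, cols=10, epochs=80,
              lr=0.1, lr_factor=0.95, lr_min=0.01,
              radius=10.0, radius_factor=0.95, radius_min=0.1,
              weight_range=0.3, seed=0):
    """Train a rows-by-cols map; defaults mirror Table 2 (hypothetical API)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: random initial weights (assumed uniform in +/- weight_range).
    weights = [[[rng.uniform(-weight_range, weight_range) for _ in range(dim)]
                for _ in range(cols)] for _ in range(rows)]
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # Step 6: present observations in random order.
        for i in order:
            x = data[i]
            br, bc = best_matching_unit(weights, x)  # Step 7: winning neuron.
            for r in range(rows):  # Step 8: update the winner's neighbourhood.
                for c in range(cols):
                    if math.hypot(r - br, c - bc) <= radius:
                        w = weights[r][c]
                        for k in range(dim):
                            w[k] += lr * (x[k] - w[k])
        # Steps 3-4: shrink the learning rate and radius (assumed once per epoch).
        lr = max(lr_min, lr * lr_factor)
        radius = max(radius_min, radius * radius_factor)
    return weights


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Synthetic stand-in data: two groups of 20 observations with 3 attributes.
    raw = [[demo_rng.gauss(m, 0.3) for _ in range(3)]
           for m in (-1.5, 1.5) for _ in range(20)]
    scaled = [[sigmoid(v) for v in row] for row in raw]  # Step 2: scale to (0, 1).
    som = train_som(scaled, rows=3, cols=3, epochs=10, radius=3.0)
    print(best_matching_unit(som, scaled[0]), best_matching_unit(som, scaled[-1]))
```

The demo uses a small 3-by-3 map on synthetic data so that it runs quickly; calling `train_som` with its defaults on 300 observations of 14 attributes would correspond to the Table 2 configuration, at a much higher running time in pure Python.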
When we set the same parameters, such as a learning rate of 0.1, to implement SOMs in the three software packages, the visualized clustering results are consistent among the three packages: three clusters in the potential market. Although the aggregate results, in terms of the number of possible clusters identified, are the same, the same individual may not always be assigned to the same cluster. After running SOMs, the number of clusters is used to implement the K-means algorithm.

4. RESULTS

4.1 Self-Organizing Maps

In the SOM implementation with PCNeuron, the clustering result is consistent between the training and testing samples, showing that there are three clusters in the potential market (see Figure 1).

4.2 K-means Analysis

The number of clusters found in the previous section (= 3) is used to implement direct K-means clustering. As described in the questionnaire section, the original 14 benefit variables are used directly as input variables. The clustering results show that there are 143 members in cluster 1, 273 members in cluster 2, and 17 members in cluster 3. The members of cluster 1 are more concerned with the reliability and safety of the home heating system (means (centroids): reliability against breakdown = 6.13, safety of the system = 5.92, warranty protection = 5.16, floor space required for the system = 4.48, availability of professional servicing = 4.64, future availability of fuel supply = 4.2, ease of conversion to another system = 4.43). The members of cluster 2 pay most attention to price (initial purchase price = 6.76, annual operating cost = 6.59, efficiency of converting fuel source into heat = 6.5). The members of cluster 3 care about the cleanliness and quietness of the home heating system (cleanliness of operating the system = 6.74, absence of pollution = 6.43, absence of fumes and odors = 6.83, noise-free operation of the system = 5.57).

A Wilks' lambda value is used to compare the performance of the clustering solutions. Wilks' lambda is the ratio of the within-group variance ($SS_{within}$) to the total variance ($SS_{total}$) (Stevens, 2002). The total variance is the sum of the between-group variance and the within-group variance. A Wilks' lambda value close to zero implies that the total variance comes from the between-group variance rather than from the within-group variance. The smaller the Wilks' lambda value, the better the clustering result. The calculated Wilks' lambda values are:

Wilks' $\lambda = SS_{within} / SS_{total} = 19.31 / 1691.182 = 0.0114$ (direct K-means);
Wilks' $\lambda = SS_{within} / SS_{total} = 2.904 / 131.378 = 0.0221$ (tandem approach).

Because 0.0114 < 0.0221, the direct K-means clustering performs better than the tandem approach.

4.3 Test of Clusters

The best fitting functions for the unimodal and uniform distributions (Arnold, 1979) are as follows:

$C_{unimodal} = e^{0.06239} g^{0.95242} p^{0.21011} \alpha^{0.13389} / N^{0.29723} = 0.6379$;
$C_{uniform} = e^{0.08664} g^{0.89510} p^{0.13896} \alpha^{0.16356} / N^{0.23672} = 0.6859$,

where g = the number of clusters = 3; p = the number of attributes = 14; N = the number of entities = 433; α = the level of significance = 0.1.
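The Wilks' lambda ratios and the two critical values above can be checked with a few lines of Python. This is a worked-arithmetic sketch only: the helper names `wilks_lambda` and `arnold_critical` are hypothetical, and the exponents are simply the coefficients quoted above from Arnold (1979).

```python
import math


def wilks_lambda(ss_within, ss_total):
    """Ratio of within-group to total sum of squares (smaller is better)."""
    return ss_within / ss_total


def arnold_critical(const, g_exp, p_exp, alpha_exp, n_exp, g, p, alpha, n):
    """Evaluate e^const * g^g_exp * p^p_exp * alpha^alpha_exp / n^n_exp."""
    return (math.exp(const) * g ** g_exp * p ** p_exp
            * alpha ** alpha_exp / n ** n_exp)


# Sums of squares reported in the text.
lam_direct = wilks_lambda(19.31, 1691.182)   # direct K-means
lam_tandem = wilks_lambda(2.904, 131.378)    # tandem approach

# g = 3 clusters, p = 14 attributes, N = 433 entities, alpha = 0.1.
c_unimodal = arnold_critical(0.06239, 0.95242, 0.21011, 0.13389, 0.29723,
                             g=3, p=14, alpha=0.1, n=433)
c_uniform = arnold_critical(0.08664, 0.89510, 0.13896, 0.16356, 0.23672,
                            g=3, p=14, alpha=0.1, n=433)

print(f"Wilks' lambda, direct K-means: {lam_direct:.4f}")
print(f"Wilks' lambda, tandem:         {lam_tandem:.4f}")
print(f"C_unimodal: {c_unimodal:.4f}")
print(f"C_uniform:  {c_uniform:.4f}")
```

Up to rounding in the last digit, the printed values should agree with the figures reported above (0.0114, 0.0221, 0.6379, and 0.6859).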
Since the calculated value of the test statistic, C = log (1691.182/19.31) = 1.942, exceeds the critical values for both $C_{unimodal}$ and $C_{uniform}$, both null hypotheses (H1 and H2) are rejected. Therefore, this study can conclude that clusters exist in the potential market.

4.4 Test of Segments

Because the study wants to estimate the elasticities of the marketing variables (price, media promotion, and total knowledge of home heating systems), all variables (dependent and independent) are transformed to logarithms before estimation so that the regression function can be written as a classical linear regression. Since the variables are logarithms, the regression coefficients can be interpreted as elasticities. The Chow test (Doran, 1989) can be used to test the equality of regression coefficients across two or more sets. If the test is significant, the null hypothesis (H3) is rejected, and we can conclude that the clusters are segments.

Figure 1. Visualization of clusters in the potential market. (a) Visualization of clusters for training samples in the potential market. (b) Visualization of clusters for testing samples in the potential market.

The study uses purchase intention as the dependent variable and price, media promotion, and total knowledge of the home heating systems as independent variables. There are ten price variables: the initial purchase price of oil, electric, solar, gas, and heat pump systems, and the annual operating costs of oil, electric, solar, gas, and heat pump systems.
The media promotion variable records whether customers saw or heard anything about heat pumps as a method of home heating from the media (radio, television, newspaper, magazine, and flyer or handbill) in the past 6 months. Customers could mark every medium that applied. Total knowledge of the home heating systems is the total score for knowledge of oil, electric, solar, gas, and heat pump systems.

After deleting observations with missing information on the home heating system questions in the potential market, there are 130 valid members in cluster 1, 248 valid members in cluster 2, and 11 valid members in cluster 3. Through the stepwise or forward selection procedures, 6 independent variables (total knowledge of the home heating systems, initial purchase price of heat pump, annual operating costs of electric, annual operating costs of solar, annual operating costs of gas, and media promotion) are identified as good predictors. The backward selection procedure identifies 7 predictors. On the grounds of parsimony (Stevens, 2002), we prefer the 6 predictors selected by the stepwise or forward procedure, especially because the adjusted R2 values for the stepwise and backward procedures are quite close (0.535 and 0.539). This model also yields a significant F-test of overall model fit (see Table 3).

Table 3. Results of multiple-regression in the potential market
Dependent Variable: Purchase Intention

  Independent Variables                           Coefficient   Sig.
  Intercept                                          1.324      .000
  Total knowledge of the home heating systems        0.952      .000
  Initial purchase price of heat pump                -.174      .040
  Annual operating costs of electric                  .209      .002
  Annual operating costs of solar                     .184      .004
  Annual operating costs of gas                      -.408      .000
  Media promotion                                     .248      .000
  F-test                                            17.819*
  Adjusted R2                                        0.535

Note. * P < 0.05

The ANOVA results of multiple-regression in the potential market are shown in Table 4.
From Table 4, the F-ratio (Chow test) is:

$$F = \frac{[37.474 - (12.615 + 23.431 + 0.173)] / (6 + 1)}{(12.615 + 23.431 + 0.173) / [130 + 248 + 11 - 2(6 + 1)]} = 1.845$$

F = 1.845 < F(7, 375, 0.95) = 3.245 (set α = 0.05).

The F statistic shows that the test is not significant, and we can conclude that the clusters are not segments.

Table 4. ANOVA of multiple-regression

  Source                        Sum of Squares    DF    Mean Squares   F-test
  Cluster 1         Regression       3.829          6      0.638        6.2*
                    Residual        12.615        123      0.103
                    Total           16.444        129
  Cluster 2         Regression       6.895          6      1.149       11.84*
                    Residual        23.431        241      0.097
                    Total           30.325        247
  Cluster 3         Regression       0.829          6      0.138        3.2
                    Residual         0.173          4      0.043
                    Total            1.001         10
  Entire Data Set   Regression      10.523          6      1.754       17.9*
                    Residual        37.474        382      0.098
                    Total           47.997        388

Note. * P < 0.05

For the potential market, cluster 3 has only 11 valid members, so its sample size is very small. This raises a natural question about the power of the test. Therefore, the study further compares cluster 1 and cluster 2 alone, and finds that the Chow test is still not significant (F = 0.949 < F(7, 364, 0.95) = 3.25). Because the Chow test tests all the variables together, the effects of the variables that do have the same coefficients could possibly "swamp" the effects of only one or two variables with different coefficients. If such differences did exist, the variables found with different coefficients would be the basis for segmentation. The media promotion variable is found to have a significant t-test result (t = 2.09 > t(376, 0.95) = 1.645), i.e., its regression coefficients differ between cluster 1 and cluster 2, and it would be a basis for segmentation.
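The three-cluster Chow F-ratio can be recomputed from the residual sums of squares in Table 4. The sketch below follows the form of the F-ratio given in this section, including its degrees of freedom (6 + 1 parameters in the numerator and N - 2(6 + 1) in the denominator); the function name `chow_f` is hypothetical. Because the Table 4 entries are rounded to three decimals, the recomputed ratio comes out at roughly 1.86 rather than exactly the reported 1.845, which does not change the conclusion.

```python
def chow_f(rss_pooled, rss_groups, n_total, n_params):
    """Chow-type F-ratio in the form used in the text.

    rss_pooled : residual sum of squares of the regression on the entire data set
    rss_groups : residual sums of squares of the separate per-cluster regressions
    n_total    : total number of valid observations
    n_params   : number of estimated coefficients (predictors + intercept)
    """
    rss_separate = sum(rss_groups)
    numerator = (rss_pooled - rss_separate) / n_params
    denominator = rss_separate / (n_total - 2 * n_params)
    return numerator / denominator


# Residual sums of squares and sample sizes taken from Table 4 and the text.
f_ratio = chow_f(rss_pooled=37.474,
                 rss_groups=[12.615, 23.431, 0.173],
                 n_total=130 + 248 + 11,
                 n_params=6 + 1)

print(f"Chow F-ratio from rounded Table 4 values: {f_ratio:.3f}")
print("Below the critical value of 3.245 quoted in the text:", f_ratio < 3.245)
```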
Following the definition of the media promotion variable given in this section, this study calculates the frequencies with which radio, television, newspaper, magazine, and flyer or handbill were chosen by customers (Table 5). In cluster 1, most customers saw or heard about heat pumps as a method of home heating from radio (35.3%), newspaper (45.4%), and magazine (51.5%). In cluster 2, most customers saw or heard about heat pumps from newspaper (41.1%) and magazine (50.4%).

Table 5. Frequencies of radio, TV, newspaper, magazine, and flyer in Cluster 1 and Cluster 2

                        Cluster 1         Cluster 2
                        N       %         N       %
  Radio                 46     35.3       19      7.6
  TV                    24     18.5       51     20.6
  Newspaper             59     45.4      102     41.1
  Magazine              67     51.5      125     50.4
  Flyer or Handbill     16     12.3       35     14.1

5. CONCLUSIONS

This study employs, for the first time, the direct K-means procedure in the modified two-stage clustering approach (SOMs and K-means). The result reveals that direct K-means clustering outperforms the tandem approach. It complements the modified two-stage clustering of Kuo et al. (2002), who employed the tandem approach in the K-means clustering, and provides a better solution to the problem of market segmentation. In addition, the results of neural networks are not just procedure specific but also program specific. Three different software packages were utilized in our research and the results have been discussed.

Tests that verify whether groups are really clusters do not seem to be widely used. Our research employs Arnold's approach to test the clusters and avoid falling into the trap of always finding clusters. The test of segmentation (the Chow test) shows that "blind reliance" on the Chow test is not a good idea when seeking to see if segments really exist.
Advertising in newspapers and magazines as well as promotion through radio programs would be effective to communicate with customers of segment 1. In segment 2, advertising or promoting mainly in newspapers and magazines could be effective marketing communication with customers of segment 2. A major research limitation is that due to commercial secrecy, our study project did not get any help from Canadian companies offering historical data. The authors use previously collected data from Ratchford and Haines (1986). Because the data were collected in 1983, if the results were to be put to immediate practical use by the home heating system company, it would have to be assumed that customer benefit perception has remained stable over the intervening years. REFERENCES Arnold, S. J. (1979). A Test for Clusters. Journal of Marketing Research, 16(4): 545-551. Bacon, L. D. (2002). Handbook of Data Mining and Knowledge Discovery. Oxford, UK: Oxford University Press. Berry, M. J., & Linoff, G. (2000). Mastering data mining: The art and science of customer relationship management. New York, USA: John Wiley & Sons. 12 C. H. Lien et al. / Asian Journal of Management and Humanity Sciences, Vol. 1, No. 1, pp. 1-15, 2006 Berson, A., Smith, S., & Thearling, K. (2000). Building Data Mining Applications for CRM. New York, USA: McGraw-Hill. Bigus, J. (1996). Data Mining with Neural Networks. New York, USA: McGraw-Hill. Doran, H. (1989). Applied Regression Analysis in Econometrics. New York, USA: Marcel Dekker, Inc. Engelman, L., & Hartigan, J. (1969). Percentage Points of a Test for Clusters. Journal of the American Statistical Association, 64, 1647-1648. Fish, K., Barnes, J., & Aiken, M. (1995). Artificial neural networks: A new methodology for industrialmarket segmentation, Industrial Marketing Management, 24, 431-438. Garson, G. (1998). Neural networks: An introduction guide for social scientists. Thousand Oaks, California, USA: SAGE Publications. Green, P. E., & Krieger, A. M. 
(1995). Alternative Approaches to Cluster-Based Market Segmentation. Journal of the Market Research Society, 37(3), 221-239.
Kauko, T., Hooimeijer, P., & Hakfoort, J. (2002). Capturing housing market segmentation: An alternative approach based on neural networks modeling. Housing Studies, 17(6), 875-894.
Kohonen, T. (2001). Self-organizing maps (3rd ed.). Springer Series in Information Sciences. Berlin, Germany: Springer-Verlag.
Kuo, R. J., Ho, L. M., & Hu, C. M. (2002). Cluster analysis in industrial segmentation through artificial neural networks. Computers and Industrial Engineering, 42, 391-399.
Massy, W., & Frank, R. (1965). Short Term Price and Dealing Effects in Selected Market Segments. Journal of Marketing Research, 2(2), 171-185.
Minhas, R., & Jacobs, E. (1996). Benefit segmentation by factor analysis: An improved method of targeting customers for financial services. International Journal of Bank Marketing, 14(3), 3-13.
Nguyen, D., & Ramirez, A. (1998). The emerging position of artificial networks as a prime intelligent technology for strategic decision support system. Information Systems, 19(4), 14-22.
Pilling, B., Crosby, L., & Ellen, P. (1991). Using benefit segmentation to influence environmental legislation: A bottle bill application. Journal of Public Policy and Marketing, 10(2), 28-46.
Punj, G., & Stewart, D. (1983). Cluster analysis in marketing research: Review and suggestions for application. Journal of Marketing Research, 20(2), 134-148.
Ratchford, B., & Haines, G. (1986). A Study of consumer behavior in a product class which contains new technologies. Contemporary Research in Marketing, 1, 403-424.
Smith, W. (1956). Product differentiation and market segmentation as alternative marketing strategies. Journal of Marketing, 21(3), 3-8.
Sommers, M., & Barnes, J. (2003). Introduction to marketing (10th ed.). New York, USA: McGraw-Hill.
Stevens, J. (2002). Applied multivariate statistics for the social science.
Mahwah, New Jersey, USA: Lawrence Erlbaum Associates.
Thanakorn, N. (2003). Data mining applications for self-organizing maps. Unpublished Ph.D. dissertation. Rensselaer Polytechnic Institute.
Venugopal, V., & Baets, W. (1994). Neural networks and their applications in marketing management. Journal of Systems Management, 45(9), 16-21.
Wedel, M., & Kamakura, W. (2000). Market segmentation: conceptual and methodological foundations (2nd ed.). Norwell, Massachusetts, USA: Kluwer Academic Publishers.
Young, S., Ott, L., & Feigin, B. (1978). Some practical consideration in market segmentation. Journal of Marketing Research, 15(3), 405-413.

Che-hui Lien received a B.B.A. from National Cheng Kung University, Tainan, Taiwan, and an MBA degree in international trade from National Cheng Chi University, Taipei, Taiwan. He holds a Ph.D. in management (marketing) from Carleton University, Ottawa, Canada. He is currently an Assistant Professor in Marketing at Thompson Rivers University in Kamloops, BC, Canada. His research interests are database marketing, services marketing, data mining, relationship marketing, CRM, international marketing, and marketing research.

Alex Ramirez received a B.Sc. with Honours from ITESM (Instituto Tecnologico y de Estudios Superiores de Monterrey), Monterrey Campus, in his native Mexico and a master's degree in industrial engineering and operations research from Syracuse University in New York, USA. He holds a Ph.D. in administration (information systems) from Concordia University, Montreal, Canada. He is currently an Assistant Professor in information systems at the Eric Sprott School of Business in Ottawa, Ontario, Canada. He was a Visiting Researcher at the Westminster School of Business in London, UK, during the academic year 2005-2006. He currently chairs the Academic Computing Committee and the Curriculum Review Committee.
His research interests are information systems foundations and the evaluation and adoption of emerging technologies in organizations: knowledge management systems, business intelligence, data warehousing, data mining, and electronic commerce.

George H. Haines graduated from Massachusetts Institute of Technology with an SB and from Carnegie Institute of Technology with an M.S. and a Ph.D. He is currently a Distinguished Research Professor at Carleton University. His current research interests include the study of how the financial marketplace works for small and medium sized enterprises, with a focus on Canadian problems, and the design of management games for education and training.
