Database Marketing: Cluster Analysis
N. Kumar, Asst. Professor of Marketing

Agenda
* Discussion of the first assignment
* Motivation for conducting cluster analysis
* Benefit segmentation
* Cluster analysis: basic concepts
* Hierarchical and non-hierarchical clustering
* Implementation in SAS and interpreting the output

Voter Profiling
* What are the different voting segments out there?
* What do they want to hear, i.e., which issues do they care about?
* What should I say?

Ad Campaign
* How many customer segments are there?
* How many do I want to target?
* How should I target them: what message should I communicate to each segment?

Promotional Strategies
* Coupon drops: who should they be targeted at?
* Catalog example: should the catalog be accompanied by a $5 coupon, a $10 coupon, or no coupon?

What is Cluster Analysis?
Cluster analysis is a technique for combining observations into groups, or clusters, such that:
* Each group is homogeneous with respect to certain characteristics (that you specify)
* Each group is different from the other groups with respect to the same characteristics

Data
Consumer   Income ($1000s)   Education (years)
1                 5                  5
2                 6                  6
3                15                 14
4                16                 15
5                25                 20
6                30                 19

Geometrical View of Cluster Analysis
[Figure: scatter plot of the six consumers, with axes Education and Income]

Similarity Measures
* Why are consumers 1 and 2 similar?
* Distance(1,2) = $(5-6)^2 + (5-6)^2 = 2$
* More generally, if there are p variables: $\text{Distance}(i,j) = \sum_{k=1}^{p} (x_{ik} - x_{jk})^2$

Similarity Matrix
      C1    C2    C3    C4    C5    C6
C1     0     2   181   221   625   821
C2     2     0   145   181   557   745
C3   181   145     0     2   136   250
C4   221   181     2     0   106   212
C5   625   557   136   106     0    26
C6   821   745   250   212    26     0

Clustering Techniques
* Hierarchical clustering
* Non-hierarchical clustering
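
The similarity matrix can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the names `consumers`, `squared_distance`, and `similarity_matrix` are illustrative, and the education values for C5 and C6 are taken as 20 and 19, the values the similarity matrix implies.

```python
# Each consumer is (income in $1000s, education in years).
# C5 and C6 use the education values implied by the similarity matrix.
consumers = {
    "C1": (5, 5),
    "C2": (6, 6),
    "C3": (15, 14),
    "C4": (16, 15),
    "C5": (25, 20),
    "C6": (30, 19),
}


def squared_distance(a, b):
    """Sum of squared differences over all p variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_matrix(points):
    """Pairwise squared distances, rows and columns in insertion order."""
    names = list(points)
    return [
        [squared_distance(points[r], points[c]) for c in names]
        for r in names
    ]


matrix = similarity_matrix(consumers)
for name, row in zip(consumers, matrix):
    print(name, row)
```

The distance used here is the squared Euclidean distance, so no square root is taken; this matches Distance(1,2) = 2 on the slide.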

Hierarchical Clustering
* Distance(1,2) = 2 = Distance(3,4)
* Say we group 1 and 2 together and leave the others as they are
* How do we compute the distance between a group that has two (or more) members and the others?

Hierarchical Clustering Algorithms
* Centroid method
* Nearest-neighbor or single linkage
* Farthest-neighbor or complete linkage
* Average linkage
* Ward's method

Centroid Method
* Each group is replaced by an average consumer
* Cluster 1: average income = 5.5 and average education = 5.5

Data for Five Clusters
Cluster   Members   Income   Education
1         C1&C2        5.5         5.5
2         C3          15          14
3         C4          16          15
4         C5          25          20
5         C6          30          19

Similarity Matrix
         C1&C2     C3     C4     C5     C6
C1&C2      0
C3       162.5      0
C4       200.5      2      0
C5       590.5    136    106      0
C6       782.5    250    212     26      0

Data for Four Clusters
Cluster   Members   Income   Education
1         C1&C2        5.5         5.5
2         C3&C4       15.5        14.5
3         C5          25          20
4         C6          30          19

Similarity Matrix
         C1&C2   C3&C4     C5     C6
C1&C2      0
C3&C4    181       0
C5       590.5   120.5      0
C6       782.5   230.5     26      0

Data for Three Clusters
Cluster   Members   Income   Education
1         C1&C2        5.5         5.5
2         C3&C4       15.5        14.5
3         C5&C6       27.5        19.5

Similarity Matrix
         C1&C2   C3&C4   C5&C6
C1&C2      0
C3&C4    181       0
C5&C6    680     169       0

Dendrogram for the Data
[Figure: dendrogram with leaves C1, C2, C3, C4, C5, C6]

Single Linkage
* The first cluster is formed in the same fashion
* The distance between Cluster 1, comprising customers 1 and 2, and customer 3 is the minimum of Distance(1,3) = 181 and Distance(2,3) = 145

Similarity Matrix
         C1&C2     C3     C4     C5     C6
C1&C2      0
C3       145        0
C4       181        2      0
C5       557      136    106      0
C6       745      250    212     26      0
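
The centroid and single-linkage distances above, together with the complete- and average-linkage rules covered next, differ only in how they summarize the pairwise distances between two clusters. The sketch below is illustrative rather than taken from the slides: `cluster_distance` and its `method` argument are hypothetical names, and the data use education values of 20 for C5 and 19 for C6, as implied by the similarity matrices.

```python
from itertools import product

# (income in $1000s, education in years); values implied by the matrices.
points = {
    "C1": (5, 5), "C2": (6, 6), "C3": (15, 14),
    "C4": (16, 15), "C5": (25, 20), "C6": (30, 19),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members):
    """The 'average consumer' of a cluster."""
    coords = [points[m] for m in members]
    return tuple(sum(values) / len(values) for values in zip(*coords))


def cluster_distance(a, b, method):
    """Distance between clusters a and b (tuples of consumer names)."""
    if method == "centroid":
        return squared_distance(centroid(a), centroid(b))
    pairwise = [squared_distance(points[i], points[j]) for i, j in product(a, b)]
    if method == "single":      # nearest neighbor
        return min(pairwise)
    if method == "complete":    # farthest neighbor
        return max(pairwise)
    if method == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown method: {method}")


merged = ("C1", "C2")
others = ["C3", "C4", "C5", "C6"]
table = {
    method: [cluster_distance(merged, (o,), method) for o in others]
    for method in ("centroid", "single", "complete", "average")
}
for method, row in table.items():
    print(method, row)
```

Each printed row corresponds to the C1&C2 column of the similarity matrix for that method.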

Complete Linkage
* The distance between Cluster 1, comprising customers 1 and 2, and customer 3 is the maximum of Distance(1,3) = 181 and Distance(2,3) = 145

Similarity Matrix
         C1&C2     C3     C4     C5     C6
C1&C2      0
C3       181        0
C4       221        2      0
C5       625      136    106      0
C6       821      250    212     26      0

Average Linkage
* The distance between Cluster 1, comprising customers 1 and 2, and customer 3 is the average of Distance(1,3) = 181 and Distance(2,3) = 145

Similarity Matrix
         C1&C2     C3     C4     C5     C6
C1&C2      0
C3       163        0
C4       201        2      0
C5       591      136    106      0
C6       783      250    212     26      0

Ward's Method
* Does not compute distances between clusters
* Forms clusters by maximizing within-cluster homogeneity, i.e., minimizing the error sum of squares (ESS)
* ESS for a cluster with two observations (say, C1 and C2) = $(5-5.5)^2 + (6-5.5)^2 + (5-5.5)^2 + (6-5.5)^2 = 1$

Ward's Method
     CL1     CL2   CL3   CL4   CL5     ESS
1    C1,C2   C3    C4    C5    C6      1
2    C1,C3   C2    C4    C5    C6     90.5
3    C1,C4   C2    C3    C5    C6    110.5
4    C1,C5   C2    C3    C4    C6    312.5
5    C1,C6   C2    C3    C4    C5    410.5
6    C2,C3   C1    C4    C5    C6     72.5
7    C2,C4   C1    C3    C5    C6     90.5

Non-Hierarchical Clustering
* Data are grouped into K clusters
* Requires a priori knowledge of K

Basic Steps in Non-Hierarchical Clustering
1. Select K initial cluster centroids
2. Assign each observation to the cluster to which it is closest
3. Reassign or reallocate each observation to one of the K clusters according to a pre-determined stopping rule
4. Stop if there is no reallocation
Approaches differ in step 1 and/or step 3.

Algorithm I
* Selects the first K observations as cluster centers

Initial Cluster Centroids
Variable    CL1   CL2   CL3
Income        5     6    15
Education     5     6    14
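
The ESS column in the Ward's method table can be checked with a short sketch. This is an illustration under stated assumptions, not the slides' own implementation: `ess` and `total_ess` are hypothetical helper names, singleton clusters contribute zero ESS, and the data use education values of 20 for C5 and 19 for C6.

```python
from itertools import combinations

# (income in $1000s, education in years)
points = {
    "C1": (5, 5), "C2": (6, 6), "C3": (15, 14),
    "C4": (16, 15), "C5": (25, 20), "C6": (30, 19),
}


def ess(members):
    """Error sum of squares of one cluster around its own mean."""
    total = 0.0
    for k in range(2):  # income, then education
        values = [points[m][k] for m in members]
        centre = sum(values) / len(values)
        total += sum((v - centre) ** 2 for v in values)
    return total


def total_ess(partition):
    """ESS of a whole solution: the sum over its clusters."""
    return sum(ess(cluster) for cluster in partition)


# First step of Ward's method: try every possible pair as the first
# two-member cluster and record the resulting total ESS.
candidates = {}
for pair in combinations(points, 2):
    singletons = [(name,) for name in points if name not in pair]
    candidates[pair] = total_ess([pair] + singletons)

# Ties (C1,C2 and C3,C4 both give ESS = 1) resolve to the first pair found.
best = min(candidates, key=candidates.get)
print(best, candidates[best])
```

With two-member clusters the ESS is simply half the squared distance between the two members, which is why the values are half of the corresponding similarity-matrix entries.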

Initial Assignment
      Distance   Distance   Distance   Assigned
      from C1    from C2    from C3    to CL
C1        0          2        181         1
C2        2          0        145         2
C3      181        145          0         3
C4      221        181          2         3
C5      625        557        136         3
C6      821        745        250         3

New Cluster Centroids
Variable    CL1   CL2   CL3
Income        5     6    21.5
Education     5     6    17

Distance Matrix
      Distance   Distance   Distance   Previous     Current
      from CL1   from CL2   from CL3   Assignment   Assignment
C1        0          2       416.25        1            1
C2        2          0       361.25        2            2
C3      181        145        51.25        3            3
C4      221        181        34.25        3            3
C5      625        557        21.25        3            3
C6      821        745        76.25        3            3

Algorithm II
* Differs from Algorithm I in how the initial seeds are modified
* As before, the first K observations are selected as the initial cluster seeds
* The candidates for replacement are the two seeds that are closest to each other
* An observation qualifies to replace one of the two candidates if the distance between those seeds is less than the distance between the observation and its closest seed

Algorithm II (contd.)
* C1, C2 and C3 are the initial seeds
* The smallest distance between the seeds is between C1 and C2
* Observation C4 does not qualify as a replacement because Distance(C1,C2) = 2 is not less than the distance between C4 and its nearest seed, C3, which is also 2
* Observation C5 does qualify as a replacement because Distance(C1,C2) = 2 is less than the distance between C5 and its nearest seed, C3, which is 136: replace C2 with C5

Initial Assignment
      Distance from   Distance from   Distance from   Assigned
      seed 1 (C1)     seed 2 (C3)     seed 3 (C5)     to CL
C1          0              181             625            1
C2          2              145             557            1
C3        181                0             136            2
C4        221                2             106            2
C5        625              136               0            3
C6        821              250              26            3

New Cluster Centroids
Variable    CL1    CL2    CL3
Income      5.5   15.5   27.5
Education   5.5   14.5   19.5
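
The assign-and-update loop behind both algorithms can be sketched as follows. This is a minimal illustration, assuming squared Euclidean distance and taking the seeds as given; the function names `assign`, `update`, and `cluster` are hypothetical, and cluster labels are zero-based (label 0 is CL1).

```python
# (income in $1000s, education in years) for C1..C6
points = [(5, 5), (6, 6), (15, 14), (16, 15), (25, 20), (30, 19)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each observation with the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its current members."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:  # keep the old centroid if a cluster empties
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
    return new_centroids


def cluster(points, seeds, max_iter=100):
    """Iterate assignment and centroid update until nothing is reallocated."""
    centroids = list(seeds)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # stopping rule: no reallocation
            break
        labels = new_labels
    return labels, centroids


# Algorithm I seeds: the first K = 3 observations.
labels_1, centroids_1 = cluster(points, points[:3])
# Seeds after the Algorithm II replacement: C1, C3, C5.
labels_2, centroids_2 = cluster(points, [points[0], points[2], points[4]])
print(labels_1, centroids_1)
print(labels_2, centroids_2)
```

The two runs differ only in their seeds, which shows how strongly the final solution depends on step 1: the first leaves C1 and C2 as singleton clusters, while the second recovers the three pairs.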

Distance Matrix
      Distance   Distance   Distance   Previous     Current
      from CL1   from CL2   from CL3   Assignment   Assignment
C1       0.5      200.5      716.5         1            1
C2       0.5      162.5      644.5         1            1
C3     162.5        0.5      186.5         2            2
C4     200.5        0.5      152.5         2            2
C5     590.5      120.5        6.5         3            3
C6     782.5      230.5        6.5         3            3

Hierarchical vs. Non-Hierarchical Clustering
* Hierarchical clustering does not require a priori knowledge of the number of clusters
* Assignments are static: once observations are merged into a cluster, they are not reassigned
* Use hierarchical clustering for exploratory purposes
* Non-hierarchical methods can be viewed as complementary to hierarchical methods rather than competing with them

Voter Profiling
* A survey of voters' concerns may help us group voters with similar concerns; perhaps they all live in a certain area?
* Target ads and mailings with customized messages

Ad Campaign
* Use attitudinal data to segment customers
* Target the message appropriately

Promotional Strategies
* Use transaction data to group customers into those that are more prone to purchasing the product on deal
* Give a stronger incentive to the price-sensitive segment