Estimating Business Targets
Advisor: Dr. Hsu
Graduate: Yung-Chu Lin
Data Source: Datta et al., KDD01, pp. 420-425.
2002/4/10 IDSL seminar

Abstract
Propose a new solution to the classical econometric task of frontier analysis
Combine nearest neighbor methods and classical statistical methods
Identify under-marketed customers
Benchmark regional directory divisions

Outline
Motivation
Objective
Historical approaches
Target estimation methodology
Case study
Conclusion
Personal opinion

Motivation
Setting targets is a critical task
Traditionally, the target of each entity is set to the average among the entities
Two challenges
– The characteristics of the entities heavily influence the outcome
– The problem is inherently unsupervised

Objective
Provide a methodology for estimating unsupervised maximal or minimal targets
Set revenue target expectations for individual customers
Set revenue targets for regional yellow page directories

Historical Approaches
Mathematical programming
Economics

Mathematical Programming
\phi_i = g(x_i), where \phi_i is the target for x_i, a vector for the ith observation
Sensitive to errors and outliers, since it assumes that all observed targets define the possible space

Economics
The frontier g(x_i) is combined with \epsilon_i, where \epsilon_i is a non-negative error term
Requires a model for the error term and for g

Target Estimation Methodology
Nearest neighbor vs. clustering
The neighborhoods
The distance function
Target estimation from the neighborhoods
A heuristic for comparing neighborhoods

Nearest Neighbor vs. Clustering
Time complexity
– Clustering is cheaper than nearest neighbor
Problem of clustering
– Two similar entities can fall into different clusters
– The higher the dimension, the more serious this effect
– Nearest neighbor does not have this problem

The Neighborhoods
x_i: the ith observation
y_i: the variable containing its target value
n_i: the neighborhood for x_i, where n_i is a set of observations {x_i, x_j, …}

The Distance Function
Continuous variables are standardized
e.g. Continuous: (2,1) vs. (3,4) gives \sqrt{(1 \cdot w_1)^2 + (3 \cdot w_2)^2}
e.g. Nominal: (a,b) vs. (a,c): the matching first component contributes 0 \cdot w_1 and the differing second component contributes 1 \cdot w_2

Target Estimation From the Neighborhoods
Let y_{i(1)}, y_{i(2)}, …, y_{i(k)} be the order statistics, so that y_{i(1)} is the largest

A Heuristic for Comparing Neighborhoods
Maximal frontier: E(x_i) ranges from 0 to 1
Minimal frontier: E(x_i) >= 1

Case Study
Target revenues for directory book advertisers
Target revenue for regional directories

(1) Target Revenues for Directory Book Advertisers
Goal
– Find businesses that have low spending relative to those with otherwise similar characteristics
Three categories of data available
– Advertiser: e.g. number of employees
– Directory: e.g. distribution size
– Market: e.g. median household income

Calculating Nearest Neighbors
Standardize continuous data: natural log
K=4
Weight the variables equally
– But decrease the weights for many of the directory and market variables

Distribution of E(x) for Advertisers

A Decision Tree to Predict \phi for x_i

(2) Target Revenue for Regional Directories
Goal
– Benchmark regional directory divisions
Separate the data into two sets
– Training set: 80%
– Test set: 20%
K=4

Book Type
System book
– An entire serving area
System-neighborhood book
– A smaller number of geographic areas in the franchise area
Neighborhood book
– Areas outside of the telephone company's franchise area

Four Different Distributions
Labeled according to the legend
The x-axis shows log(distribution) and the y-axis shows E(x)
Panels: Neighborhood books, System books, Non-system books

Conclusion
Present a general data mining methodology for estimating business targets by frontier analysis
First case
– Increase sales focus on the under-marketed customers
– Increase the potential revenue by several million
Second case
– Estimate optimal revenue performance targets for directory divisions
– The increase for directory books is a minimum of several million dollars

Personal opinion
Combining several existing methodologies or disciplines can produce a powerful new one
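
Appendix: a sketch of the target estimation steps
The methodology slides (the distance function, the neighborhoods, target estimation, and the heuristic E) can be illustrated with a minimal Python sketch. Several parts are assumptions, because the slide formulas did not survive: the target is taken to be the largest order statistic in the neighborhood, and E(x_i) is taken to be the ratio of y_i to that target, which is at least consistent with E ranging from 0 to 1 on a maximal frontier. The function names (distance, neighborhood, target, efficiency) and the toy data are hypothetical and do not come from the paper.

```python
import math

def distance(a, b, weights, nominal=frozenset()):
    """Weighted Euclidean-style distance over mixed attributes.

    Continuous attributes contribute their (standardized) difference;
    nominal attributes contribute 0 on a match and 1 on a mismatch.
    """
    total = 0.0
    for j, (u, v) in enumerate(zip(a, b)):
        if j in nominal:
            diff = 0.0 if u == v else 1.0
        else:
            diff = u - v
        total += (weights[j] * diff) ** 2
    return math.sqrt(total)

def neighborhood(i, xs, k, weights, nominal=frozenset()):
    """Indices of x_i itself plus its k nearest other observations."""
    others = sorted(
        (j for j in range(len(xs)) if j != i),
        key=lambda j: distance(xs[i], xs[j], weights, nominal),
    )
    return [i] + others[:k]

def target(i, xs, ys, k, weights, nominal=frozenset()):
    """Assumed rule: the target is the largest y in the neighborhood."""
    return max(ys[j] for j in neighborhood(i, xs, k, weights, nominal))

def efficiency(i, xs, ys, k, weights, nominal=frozenset()):
    """Assumed heuristic: E(x_i) = y_i / target, in (0, 1] for positive y."""
    return ys[i] / target(i, xs, ys, k, weights, nominal)

# Toy data (hypothetical): one continuous attribute, five observations.
xs = [(1.0,), (1.1,), (0.9,), (5.0,), (5.2,)]
ys = [10.0, 40.0, 20.0, 100.0, 50.0]
weights = (1.0,)

# The continuous example from the distance slide, with both weights set to 1.
d_cont = distance((2, 1), (3, 4), (1.0, 1.0))
# The nominal example from the distance slide, with both weights set to 1.
d_nom = distance(("a", "b"), ("a", "c"), (1.0, 1.0), nominal={0, 1})

e0 = efficiency(0, xs, ys, k=2, weights=weights)
e3 = efficiency(3, xs, ys, k=2, weights=weights)
```

Under these assumptions, an observation with a low E relative to its neighborhood plays the role of an under-marketed customer, while an observation that already holds the largest value in its neighborhood gets E = 1. The case studies use K=4 and log-standardized continuous data; the toy example uses k=2 only to keep the data small.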