Estimating Business Targets
Advisor: Dr. Hsu
Graduate: Yung-Chu Lin
Data Source: Datta et al., KDD01, pp. 420-425.
2002/4/10
IDSL seminar
Abstract
Propose a new solution to the classical
econometric task of frontier analysis
Combine nearest neighbor methods and
classical statistical methods
Identify under-marketed customers
Benchmark regional directory divisions
Outline
Motivation
Objective
Historical approaches
Target estimation methodology
Case study
Conclusion
Personal opinion
Motivation
Setting targets is a critical task
Traditionally, the target of each entity is set to the
average amongst the entities
Two challenges
– The characteristics of the entities will have a
heavy influence on the outcome
– The inherent unsupervised nature of the
problem
Objective
Provide a methodology for estimating
unsupervised maximal or minimal targets
Setting revenue target expectations for
individual customers
Revenue target setting for regional yellow
page directories
Historical Approaches
Mathematical programming
Economics
Mathematical Programming
i  g ( xi )
where i is the target for xi, a vector for
the ith observation
Sensitive to errors and outliers, since it
assumes that all observed targets define the
possible space
Economics
i  g ( xi )   i
where  i is a non-negative error term
Requires a model for the error
term and for g
Target Estimation Methodology
Nearest neighbor vs. clustering
The neighborhoods
The distance function
Target estimation from the neighborhoods
A heuristic for comparing neighborhoods
Nearest Neighbor vs. Clustering
Time complexity
– Clustering is cheaper to compute than nearest neighbor
Problem of clustering
– Two similar entities can fall into different clusters
– The higher the dimensionality, the more serious this effect
– Nearest neighbor does not have this problem
The Neighborhoods
xi: ith observation
yi: the variable containing its target value
ni: neighborhood for xi, where ni is a set of
observations {xi, xj, …}
The Distance Function
Continuous -> standardize
e.g. Continuous: (2,1), (3,4) -> $\left((1 \cdot w_1)^2 + (3 \cdot w_2)^2\right)^{1/2}$
Nominal: (a,b), (a,c) -> $0 + w_2$
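The continuous example on this slide, (2,1) versus (3,4), has attribute differences of 1 and 3, which a weighted Euclidean distance combines as the square root of (1*w1)^2 + (3*w2)^2. A minimal Python sketch of such a distance follows. The function name `mixed_distance`, the `nominal` index set, and the rule that a nominal mismatch counts as a difference of 1 are assumptions made for illustration; the slide does not spell them out.

```python
import math

def mixed_distance(a, b, weights, nominal):
    """Weighted distance between two observations (illustrative sketch).

    a, b     -- sequences of attribute values of equal length
    weights  -- one non-negative weight per attribute
    nominal  -- set of attribute indices treated as nominal
    """
    total = 0.0
    for idx, (u, v, w) in enumerate(zip(a, b, weights)):
        if idx in nominal:
            # Assumption: matching nominal values differ by 0, others by 1.
            diff = 0.0 if u == v else 1.0
        else:
            # Continuous attributes are assumed to be standardized already.
            diff = u - v
        total += (diff * w) ** 2
    return math.sqrt(total)

# Continuous example from the slide, with both weights set to 1 (assumed).
print(mixed_distance((2, 1), (3, 4), (1.0, 1.0), nominal=set()))
# Nominal example from the slide, with assumed weights w1 = 1.0 and w2 = 0.5.
print(mixed_distance(("a", "b"), ("a", "c"), (1.0, 0.5), nominal={0, 1}))
```

With a single mismatching nominal attribute, the result reduces to that attribute's weight, which agrees with the nominal example above.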
Target Estimation From the
Neighborhoods
Let yi(1), yi(2), …, yi(k) be the order
statistics, so that yi(1) is the largest
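The estimator itself did not survive in this transcript, so the following Python sketch only illustrates the order statistics. As an assumption, it takes the maximal target of a neighborhood to be the average of its m largest target values (m = 1 gives yi(1) itself). The names `order_statistics`, `maximal_target`, and the parameter `m` are hypothetical.

```python
def order_statistics(targets):
    """Return the target values sorted so that the first is the largest."""
    return sorted(targets, reverse=True)

def maximal_target(targets, m=1):
    """Average of the m largest targets in a neighborhood (assumed estimator)."""
    if m < 1 or not targets:
        raise ValueError("need at least one target and m >= 1")
    top = order_statistics(targets)[:m]
    return sum(top) / len(top)

# Hypothetical target values yi for one neighborhood with k = 4.
neighborhood = [120, 300, 80, 210]
print(order_statistics(neighborhood))
print(maximal_target(neighborhood))
print(maximal_target(neighborhood, m=2))
```

Averaging more than one order statistic is one way to reduce sensitivity to a single outlying neighbor; whether the original method does so is not stated here.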
A Heuristic for Comparing
Neighborhoods
Maximal frontier -> E(xi) ranges from 0 to 1
Minimal frontier -> E(xi) >= 1
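The definition of E(xi) is not preserved on this slide. One reading that reproduces both stated ranges is the ratio of an observation's own target value to the frontier value of its neighborhood. The Python sketch below assumes exactly that, together with positive target values and a neighborhood that contains the observation itself. The function name `efficiency` and its arguments are hypothetical.

```python
def efficiency(observed, neighborhood_targets, frontier="max"):
    """Ratio of an observed value to its neighborhood frontier (assumed form).

    neighborhood_targets must include the observation's own value, so the
    ratio is at most 1 for a maximal frontier and at least 1 for a minimal one.
    """
    if frontier == "max":
        target = max(neighborhood_targets)
    elif frontier == "min":
        target = min(neighborhood_targets)
    else:
        raise ValueError("frontier must be 'max' or 'min'")
    if target <= 0:
        raise ValueError("frontier value must be positive")
    return observed / target

# Hypothetical neighborhood; the observation of interest has value 120.
neighborhood = [120, 300, 80, 210]
print(efficiency(120, neighborhood, frontier="max"))
print(efficiency(120, neighborhood, frontier="min"))
```

Because the ratio is unit-free, values from neighborhoods of very different scale can be compared directly, which is what a heuristic for comparing neighborhoods needs.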
Case Study
Target revenues for directory book
advertisers
Target revenue for regional directories
(1) Target Revenues for
Directory Book Advertisers
Goal
– Find businesses that have low spending
relative to those with otherwise similar
characteristics
Three categories of data available
– Advertiser: e.g. number of employees
– Directory: e.g. distribution size
– Market: e.g. median household income
Calculating Nearest Neighbors
Standardize continuous data: natural log
K=4
Weight the variables equally
– But decrease the weights for many of the
directory and market variables
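The steps on this slide (natural-log transform, K = 4, reduced weights for some variables) can be sketched in Python as a brute-force search. The data, the weight values, and the function names `log_standardize` and `k_nearest` are made up for illustration and are not taken from the case study.

```python
import math

def log_standardize(rows):
    """Apply the natural log to every (strictly positive) continuous value."""
    return [[math.log(v) for v in row] for row in rows]

def k_nearest(rows, weights, i, k=4):
    """Indices of the k rows closest to row i under a weighted Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum(((u - v) * w) ** 2 for u, v, w in zip(a, b, weights)))
    others = [j for j in range(len(rows)) if j != i]
    others.sort(key=lambda j: dist(rows[i], rows[j]))
    return others[:k]

# Hypothetical advertisers: (number of employees, directory distribution size).
raw = [[10, 50000], [12, 52000], [11, 48000],
       [9, 51000], [13, 49000], [500, 900000]]
rows = log_standardize(raw)
weights = [1.0, 0.5]  # assumed: the directory variable is down-weighted
neighbors = k_nearest(rows, weights, i=0)
print(neighbors)  # add index 0 itself to form the neighborhood of row 0
```

The brute-force search costs one pass over all rows per query, which is fine for a sketch; a spatial index would be the usual choice for large customer bases.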
Distribution for E(x) for
Advertisers
A Decision Tree to Predict phi xi
(2) Target Revenue for Regional
Directories
Goal
– Benchmark regional directory divisions
Separate the data into two sets
– Training set: 80%
– Test set: 20%
K=4
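An 80/20 split like the one above can be produced with a seeded shuffle. This is a minimal Python sketch: the function name `split_train_test`, the seed, and the placeholder records are assumptions, and the slide does not say how the split was actually drawn.

```python
import random

def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle the records reproducibly and cut them into train and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical directory records, represented here only by their index.
train, test = split_train_test(range(50))
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so that targets estimated on the training set can be re-checked against the same test directories.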
Book Type
System book
– an entire serving area
System-neighborhood book
– A smaller number of geographic areas in the
franchise area
Neighborhood book
– Areas outside of the telephone company’s
franchise area
Four Different Distributions
labeled according to the legend
The x-axis shows log(distribution)
and the y-axis E(x)
Panels: neighborhood books, system books, non-system books
Conclusion
 Present a general data mining methodology for
estimating business targets by frontier analysis
 First case
– Increase sales focus on the under-marketed customers
– Increase the potential revenue by several million
 Second case
– Estimate optimal revenue performance targets for
directory divisions
– Potential revenue increase for directory books is at least
several million dollars
Personal opinion
Combining several existing methodologies or
disciplines can produce a powerful new one