Download A Microeconomic View of Data Mining

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
A Microeconomic View of
Data Mining
Author: Jon et al.
Advisor: Dr. Hsu
Graduate: ZenJohn Huang
IDSL seminar 2001/12/4
Outline
Motivation
Objective
Three examples
Market segmentation
Data mining as sensitivity analysis
Segmentation in a model of competition
Conclusions
Personal opinion
Motivation
Data mining is about extracting
interesting patterns from raw data, but
only disjointed discussion of what
“interesting” means.
Patterns are often deemed “interesting”
on the basis of their confidence and
support.
Objective
Presenting a rigorous framework
Based on optimization
For evaluating data mining operations
Utility in decision-making
Studying certain aspects of data mining
Economically motivated optimization problems
With a large volume of unaggregated data
Introduction (1/6)
Microeconomic framework
Optimization problem
max
xD
f ( x)
D is the domain of all possible decisions
f(x) is the utility or value of decision x
Introduction (2/6)
Mathematical programming and
microeconomics
Lagrange multipliers and penalty
functions[Avriel, 1976]
This paper
Feasible region D is basically endogenous
Objective function f(x)
Introduction (3/6)
f ( x) 

ic
fi ( x)
C is a set of agents or other factors influencing the utility of the
enterprise
•Concrete level
•Abstract level
Introduction (4/6)
max
xD
 g ( x, y
ic
i
)
Yi denote the data we have on customer i
g(x, yi) is some fixed function of the decision and the data
Introduction (5/6)
max
xD
 g ( x, ŷ)
ic
Aggregation
•The computational requirements otherwise would be enormous
•It is difficult to obtain the data yi
Introduction (6/6)
Fundamental issues
Optimization
Linear programming
Game theory
Three Examples (1/3)
Beer and diapers
Retailer stocks two products in quantities
x1, x2; X1+x2 <= c
The profit margins in the two products are
m1, m2
Part Y1  ic y1,i Y2  ic yi ,2
All-or nothing B  y  B  y  B  y  y
1
1.i
2
i .2
3
1,i
i,2
Three Examples (2/3)
Market segmentation
Residence
Business customers
f i ( x)  max ci  x1 , ci  x2 
max
( x1, x 2 )D 2
 max c  x , c  x 
ic
i
1
i
2
Three Examples (3/3)
Beer and diapers, revisited
Transaction(location, dd, mm, yy, item1,
item2, …, itemn)
Transaction[location=‘Palo alto’]
Transaction[location=‘Palo alto’ and 12<tt]
Transaction[location=‘Palo alto’ and dayof-the-week(dd,mm,yy)=‘Monday’]
Market Segmentation
To segment customers into k clusters
Different marketing strategy
Different advertising campaign
k

j 1
c
max
xD
iC j
 max c  x
[ max c  x
ic
ic
i
x

i
j
: j  1,..., k 

i
j
: j  1,..., k ]    k

Specific Problems
N vectors in c1,…,cn
{-1, 1}d
K is an integer
Find a set of k vectors x1,…,xk
Maximize the sum
n

i 1
max
j 1
xi  c j
 {-1, 1}d
Complexity
1. The segmentation problems
corresponding to the following feasible
sets D is NP-complete
2. Segmentation problems in the
previous theorem can be solved in
linear time when the number of
dimensions
Complexity(cont’d)
Theorem
1.
2.
3.
4.
5.
The d-dimensional unit ball, even with k=2
The d-dimensional unit L1 ball
The r-slice of the d-dimensional hypercube
The d-dimensional hypercube, even with k=2
The set of all spanning trees of a graph G,
even with k=2
Complexity(cont’d)
Sketch
1.
2.
3.
4.
Can be solved by aligning the solution with
the cost vector
Has only 2d vertices
Can be solved by choosing the r most
popular elements
By simply picking the vertex that coordinatewise agrees in sign with the cost vector
Data Mining As Sensitivity
Analysis(1/3)
max
Ax b , x 0
c x
Data Mining As Sensitivity
Analysis(2/3)
2

fi
1
r
f i ( y ,..., y ) satisfies
 0 , fi is linear
k
l
y y
 fi
f i ( y ,..., y ) satisfies
 0 , f i is nonlinear
k
l
y y
2
1
r
Yi is the table capture from ci
Data Mining As Sensitivity
Analysis(3/3)
 1 max X ij 

I j  D j    i , X ij 0
c

Ci
 j

Segmentation in a Model of Competition
Two-player games
Probability distribution
Conclusions
Presenting a rigorous framework for the
automatic evaluation of data mining
operations
Data mining as an activity by a
revenue-maximizing enterprise
Personal Opinion
Using independent decisions to K mean