Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
A Microeconomic View of Data Mining Author: Jon et al. Advisor: Dr. Hsu Graduate: ZenJohn Huang IDSL seminar 2001/12/4 Outline Motivation Objective Three examples Market segmentation Data mining as sensitivity analysis Segmentation in a model of competition Conclusions Personal opinion Motivation Data mining is about extracting interesting patterns from raw data, but only disjointed discussion of what “interesting” means. Patterns are often deemed “interesting” on the basis of their confidence and support. Objective Presenting a rigorous framework Based on optimization For evaluating data mining operations Utility in decision-making Studying certain aspects of data mining Economically motivated optimization problems With a large volume of unaggregated data Introduction (1/6) Microeconomic framework Optimization problem max xD f ( x) D is the domain of all possible decisions f(x) is the utility or value of decision x Introduction (2/6) Mathematical programming and microeconomics Lagrange multipliers and penalty functions[Avriel, 1976] This paper Feasible region D is basically endogenous Objective function f(x) Introduction (3/6) f ( x) ic fi ( x) C is a set of agents or other factors influencing the utility of the enterprise •Concrete level •Abstract level Introduction (4/6) max xD g ( x, y ic i ) Yi denote the data we have on customer i g(x, yi) is some fixed function of the decision and the data Introduction (5/6) max xD g ( x, ŷ) ic Aggregation •The computational requirements otherwise would be enormous •It is difficult to obtain the data yi Introduction (6/6) Fundamental issues Optimization Linear programming Game theory Three Examples (1/3) Beer and diapers Retailer stocks two products in quantities x1, x2; X1+x2 <= c The profit margins in the two products are m1, m2 Part Y1 ic y1,i Y2 ic yi ,2 All-or nothing B y B y B y y 1 1.i 2 i .2 3 1,i i,2 Three Examples (2/3) Market segmentation Residence Business customers f i ( x) max ci x1 , ci x2 max ( x1, x 2 )D 2 max c x , c x ic i 1 i 2 Three Examples (3/3) Beer and diapers, revisited Transaction(location, dd, mm, yy, item1, item2, …, itemn) Transaction[location=‘Palo alto’] Transaction[location=‘Palo alto’ and 12<tt] Transaction[location=‘Palo alto’ and dayof-the-week(dd,mm,yy)=‘Monday’] Market Segmentation To segment customers into k clusters Different marketing strategy Different advertising campaign k j 1 c max xD iC j max c x [ max c x ic ic i x i j : j 1,..., k i j : j 1,..., k ] k Specific Problems N vectors in c1,…,cn {-1, 1}d K is an integer Find a set of k vectors x1,…,xk Maximize the sum n i 1 max j 1 xi c j {-1, 1}d Complexity 1. The segmentation problems corresponding to the following feasible sets D is NP-complete 2. Segmentation problems in the previous theorem can be solved in linear time when the number of dimensions Complexity(cont’d) Theorem 1. 2. 3. 4. 5. The d-dimensional unit ball, even with k=2 The d-dimensional unit L1 ball The r-slice of the d-dimensional hypercube The d-dimensional hypercube, even with k=2 The set of all spanning trees of a graph G, even with k=2 Complexity(cont’d) Sketch 1. 2. 3. 4. Can be solved by aligning the solution with the cost vector Has only 2d vertices Can be solved by choosing the r most popular elements By simply picking the vertex that coordinatewise agrees in sign with the cost vector Data Mining As Sensitivity Analysis(1/3) max Ax b , x 0 c x Data Mining As Sensitivity Analysis(2/3) 2 fi 1 r f i ( y ,..., y ) satisfies 0 , fi is linear k l y y fi f i ( y ,..., y ) satisfies 0 , f i is nonlinear k l y y 2 1 r Yi is the table capture from ci Data Mining As Sensitivity Analysis(3/3) 1 max X ij I j D j i , X ij 0 c Ci j Segmentation in a Model of Competition Two-player games Probability distribution Conclusions Presenting a rigorous framework for the automatic evaluation of data mining operations Data mining as an activity by a revenue-maximizing enterprise Personal Opinion Using independent decisions to K mean