Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Intelligent Choices Preceeding Data Analysis Katharina Morik Univ. Dortmund, www-ai.cs.uni-dortmund.de • Knowledge Discovery in Databases (KDD) • The Mining Mart approach • Case studies – Item sales – Intensive care 1 The UCI Library Approach • Learning task: classification • Evaluation criteria: accuracy and coverage • Data sets – Small number of examples – Small number of features – All and only relevant features included – No noise 2 KDD Task • Learning task of the application needs to be transferred to a formal learning task (classification, regression, clustering) – „I want to predict sales 4 weeks ahead“ – „I want to know more about my best (worst) customers“ – „I want to detect fraud“ • Databases – Very large number of records – Very large number of features – Relevant fatures missing – Noise included 3 Observation Experienced users can apply any learning system successfully to any application, since they prepare the data well... • The representation LE of examples and the choice of a sample determines the applicability of learning methods. • A chain of data transformations (learning steps or manual preprocessing) leads to LE of the method that delivers the desired result. Experienced users remember prototypical successful transformation/learning chains 4 The Real Process data application: users performance system LE1 LHn+m LH1 LE2 LEn+m LH2 ... ... LEn-1 LHn-1 = LEn learning/data mining LHn+1 LHn = LEn+1 5 Intelligent Choices 80% of the KDD work is invested into: • Choosing the learning task • Sampling • Feature generation, extraction, and selection • Data cleaning • Model selection or tuning the hypothesis space • Defining appropriate evaluation criteria 6 The Mining Mart Approach Best practice cases of preprocessing chains exist... • Data, LE and LH are described on the meta level. • The meta-level description is presented in application terms. • MiningMart users choose a case and apply the corresponding transformation and learning chain to their application. ... and more can be obtained! 7 Call for Participation • MiningMart develops an operational meta-language for describing data and operators. • MiningMart prepares the first cases of KDD. • MiningMart will present the case-base in the WWW. • You may contribute to the endavour! – Apply the meta-language to your application and deliver it as a positive example to the case-base; or – apply a case of MiningMart to your data. 8 The Consortium • • • ? • • Katharina Morik Univ. Dortmund, D (Coordinator) Lorenza Saitta Univ. Piemonte del Avogadro, I Pieter Adriaans Perot Systems Netherland, NL Michael May GMD, D Jörg-Uwe Kietz SwissLife, CH Fabio Malabocchia TILab, I 9 The Mining Mart System Human Computer Interface KDD process tasks, problem models Case-base of successful KDD process Meta-data Raw-data Meta-data Applicability Manual Pre-processing Operators: Time multi-relation ML-Operators: Time Parameters Features Description Logic Meta-data Augmented data of results 10 The Meta Model for Meta Data The Relational Model describes the database The Execution Model generates SQL statements or calls to external tools The Conceptual Model describes the individuals and classes of the domain with their relations The Case Model describes chains of preprocessing operators 11 Use of the Meta Model • • • • The meta model is stored in a database. The database manager delivers the relational model. The data analyst delivers the conceptual model. The KDD expert delivers or adjusts a case model. First cases are delivered by the Mining Mart project. • The system compiles meta data into SQL statements and calls to external tools hence executing the case model on the data. 12 Sales of Items of a Drugstore 160 140 Insect killers 1 Insect killers 2 Sun milk Candles 1 Baby food 1 Beauty Sweets Self-tanning cream Candles 2 Baby food 2 120 100 80 60 20 Week 53|98 47|98 41|98 35|98 29|98 23|98 17|98 11|98 05|98 51|97 45|97 39|97 33|97 27|97 15|97 21|97 09|97 49|96 03|97 43|96 31|96 37|96 19|96 25|96 13|96 07|96 0 01|96 Sales 40 13 Learning Task 1: Predict Sales of an Item Given drug store sales data of 50 items in 20 shops over 104 weeks predict the sales of an item such that the prediction never underestimates the sale, the prediction overestimates less than the rule of thumb. Observation: 90% of the items are sold less than 10 times a week. Requirement: prediction horizon is more than 4 weeks ahead. 14 Shop Application -- Data Shop Dm1 Dm1 Dm1 Dm2 ... Dm20 Week 1 ... 104 1 ... 104 Item1 4 ... 9 3 ... 12 ... ... ... ... ... ... ... Item50 12 ... 16 19 ... 16 LE DB1: I: T1 A1 ... A 50; set of multivariate time series 15 Preprocessing • From shops to items: multivariate to univariate LE1´: i:t1 a1 ... tk ak For all shops for all items: Create view Univariate as Select shop, week, itemi Where shop=“dmj” From Source; • Multiple learning Dm1_Item1 ... Dm1_Item50 1 4 ... 104 9 1 12... 104 16 .... Dm20_Item50 1 14... 104 16 16 Method 1 for Task 1: Exponential Smoothing • Univariate time series as input ( LE1` ), • incremental method: current hypothesis h and new observation o yield next hypothesis by h := h + l o, where l is given by the user, • predicts sales of n-next week by last h. 17 Method 2 for Task 1: SVM in the Regression Mode • Multiple learning: for each shop and each item, the support vector machine learns a function which is then used for prediction. • Asymmetric loss: – underestimation is multiplied by 20, i.e. 3 sales too few predicted -- 60 loss – overestimation is counted as it is, i.e. 3 sales too much predicted -- 3 loss (Stefan Rüping 1999) 18 Further Preprocessing • Obtaining many vectors from one series by sliding windows LH5 i:t1 a1 ... tw aw move window of size w by m steps Dm1_Item1_1 Dm1_Item1_2 ... Dm1_Item1_100 ... ... 1 2 4... 5 4... 6 7 8 Dm20_Item50_100 100 12... 104 16 100 6... 104 9 19 Article 766933 (bag?) sales window horizon value to predict time 20 Comparison with Exponential Smoothing horizon SVM exp. smoothing 1 56.764 52.40 2 57.044 59.04 3 57.855 65.62 4 58.670 71.21 8 60.286 88.44 13 59.475 102.24 21 loss horizon 22 Learning Task 2: Learning Sequences Are there typical sequences that are valid for all items? – After an action for an item its sales decrease. – Each decrease of sales is followed by an increase. • Given a set of subsequent events find frequent sequences. 23 From Sales Data to Event Sequences Multivariate time series Univariate time series ? Subsequent events ? LHn-1 = LEn ? LHn frequent event sequences 24 From Series to Sequences • Given some time series detect events (states, intervals) An event is a triple (state, begin,finish). The state might be a label or a (mean) value. Typical labels are: increase, decrease, stable... 25 Unsupervised Methods • All contiguous observations within one level (range) form one event (Bauer). • All contiguous observations with more or less the same gradient form one event (Morik, Wessel). • Clusters of subsequences form events (Das). 26 Moving Gradient Determining the time intervals with user-given tolerance threshhold. Abstracting into classes of gradients: increase,peak,decrease, stable... • • • • • • 1 • • • • 4 5 7 10 12 Time 27 Sales of Item 182830 in Shop 55 28 Summarizing Sales by Tolerant Moving Gradient (Wessel, Morik 1999) 29 From Subsequent Events to Event Sequences Multivariate time series Univariate time series Moving gradient Subsequent events ? LHn-1 = LEn ? LHn frequent event sequences 30 Transformation into Facts LE4: stable(182830,1,33,0). decreasing(182830, 33,34,-6). stable(182830, 34, 39,0). increasing(182830, 39, 40,7). decreasing(182830, 40, 42,-5). stable(182830, 42,108,0). 31 Summarizing Item 646152 in Shop 55 by Intolerant Moving Gradient 32 Corresponding Facts increasing(646152,1,2,3). decreasing(646152,2,3,-11). increasingPeak(646152,3,4,22). ... stable(646152, 25,37,0). increasing(646152, 37, 38, 8). decreasing(646152, 38, 39, -7). stable(646152, 39,40, 0). increasing(646152, 40, 41,7). decreasing(646152, 41, 42,-8). increasing(646152, 42, 43,10). stable(646152, 43, 48,-1). small time intervals 33 Method 3 for Task 2: Inductive Logic Programming • Rules about sequences: p1(I, Tb, Te, A r), p2(I, Te, Te2, As) p3(I, Te2, Te3, A t) • Results for sequences of sales trends: increasing (Item, Tb, Te) decreasing(Item, Te, Te2) increasing (Item, Tb, Te), decreasing(Item, Te, Te2) stable(Item, Te2, Te3) 34 Same Data -- Several Cases • Predict sales of a particular item in a particular shop multivariate to univariate, multiple exponential smoothing OR multivariate to univariate, sliding windows, multiple learning with regression SVM • Find relations between trends that are valid for all sales in all shops multivariate to univariate, summarizing, transformation into facts, rule learning 35 Applications in Intensive Care • • • • • On-line monitoring of intensive care patients high-dimensional data about patient and medication measured every minute stored in the Emtec database of patient records --learning when to intervene in which way. 36 Patient G.C., male, 60 years old Hemihepatektomie right 37 The Data LE DB2 i 1: t 1 a 1 1 ... a 1 k i1: t 2 a 2 1 ... a 2 k ... i2: t 1 a 1 1 ... a 1 k ... set of rows for each patient: 1 row for each minute 38 Preprocessing • Chaining database rows i 1: t 1 a 1 1 ... a 1 k, t 2 a 2 1 ... a 2 k , ... • Multivariate to univariate i 1: t 1 a 1, t 2 a 1 ... t m a 1 i 1: t 1 a 2, t 2 a 2 ... t m a 2 ... • Detecting level changes and outliers 39 Phase State Analysis Time series y1,...,yN Phase state yt (yt ,yt 1 ) yt+1 yt Deterministic Process yt time t yt+1 yt AR(1)-process with outlier (AO) HRt timet yt+1 yt Heart rate time t U.Gather, M. Bauer yt 40 Level Change Detection level_change(pat4999, 50, 112, hr, up) level_change(pat4999, 112, 164, hr, down) level_change(pat4999, 10, 74, art, constant) level_change(pat4999, 74, 110, art, down) Computed Feature Comparing norm values for a vital sign and its mean in a time interval (± standard deviation): deviation(pat4999, 10, 74, art, up) 41 Learning Task 3: Recommend Interventions for Patients Are there valid rules for all multivariate time series, such that therapeutical interventions follow from a patient’s state? 42 Method 3: Inductive Logic Programming Given patient records in the form of facts: • deviations -- time intervals • therapeutical interventions -- time points • types of vital signs (group1: hr, swi, co; group2: art, vr) Learn rules about interventions: group1(V), deviation(P, T1, T2, V, Dir) noradrenaline(P, T2, Dir) 43 The Chain of Preprocessing Steps LE DB2 : i 1: t 1 a 1 1 ... a 1 k i1: t 2 a 2 1 ... a 2 k ... i2: t 1 a 1 1 ... a 1 k chaining db rows i 1: t 1 a 1 1 ... a k1 t 2 a 1 2 ... a k 2 ... i2: t 1 a 1 1 ... a k 1t 2 a 12 ... a k 2 ... multi- to univariate i 1: t 1 a 1 1 t 2 a 1 2 i 1: t 1 a 2 1 t 2 a 2 2 ... relational learning p 1(I,T i,T j,A,D), p 2 (I,Tj,Tk,A,D) p 3(I,Tk, Dir) level changes (i 1,t i ,t j,A) ... computed feature (i 1,t i ,t j,A,D) 44 Learning Task 4: Predict Next Minute‘s Intervention Given a patient’s state at time ti, learn whether and how to intervene at t i+1 Preprocessing: • Selection of time points where an intervention was done • Multiple to binary class for each drug, form the concepts drug_up, drug_down • Multiple learning for each binary class resulting in classifiers for each drug and direction of dose change (SVM_light) 45 The Chain of Preprocessing Steps LE DB2 : i 1: t 1 a 1 1 ... a 1 k i1: t 2 a 2 1 ... a 2 k ... i2: t 1 a 1 1 ... a 1 k Select time points with interventions i 1: t i a 1 i ... a ki i2: t j a 1j ... a kj ... Form binary classes a1_up +: a 2 ...a k ... a1_up-: .... a 2 ...a k a6_down+: a 2 ...a k.. a6_down-: a 2 ...a k.... Learning classifiers using SVM_light a1_up +: w2 a 2 ... wk a k ... a6_down +: w2 a 2 ... wk a k 46 Same Data -- Several Cases • Find time relations that express therapy protocols chaining db rows, multivariate to univariate, level changes, deviations, RDT • Predict intervention for a particular drug select time points, multiple to binary class, SVM_light 47 Behind the Boxes Db schema indicating time attribute(s), granularity,... Select statement in abstract form, instantiated by db schema Creating views in abstract form, instantiated by db schema and learning task Syntactic transformation for SVM Multiple learning control Calling SVM_light and writing results 48 Functionality of MD-Compiler Manual preprocessing operators of M4 are very elementary. Results of operators are mostly views. Base Tables Table_x Table_y RowSelection Views, created by the MD-Compiler RowSelection V_01 V_02 Table_z RowSelection V_03 MultiColumnFeatureConstruction V_01 vc_1 vc_2 vc_3 FeatureSelection V_04 RowSelection V_05 49 Several view definitions Inline View versus Physical View-Object CREATE VIEW V_01 (attrib_a, attrib_b, attrib_c) AS SELECT x_id, x_a, x_b FROM Table_x; Inline-View stored as sql-string in M4-Relational Information from M4-Relational (BaseAttribute) Inline-View Physical View-Object created by MD-Compiler for reading data and executing statistics 50 Several view definitions Inline View versus Physical View-Object Base Tables Table_x Table_y RowSelection Views, created by the MD-Compiler RowSelection V_01 V_02 Table_z RowSelection V_03 MultiColumnFeatureConstruction V_01 vc_1 vc_2 vc_3 FeatureSelection V_04 RowSelection V_05 Create View V_05 (...) as select ... from (select ... from (select ... from Table_x) instead of Create View V_05 (...) as select ... from V_04 51 Several view definitions Materialized view: Created by the MD-Compiler automatically in the background + performance gain when selecting data from V_04 or V_07 + all operation-outputs can be realized as views - additional storage needed Table_x Table_y Table_z V_01 V_02 V_03 Materialized_View_1 V_04 V_05 V_06 V_07 52 System-Architecture Statistics PL/Sql T1 T2 T3 T4 T5 M4Relation Editor T6 M4-Relational Model MD Compiler Java-Code M4-Conceptual Model Time Operators M4Concept Editor M4-Case Editor Java-Code from UniDo M4-Case Model Mining Mart Database 53 Summary of Cases Involving Time Db schema indicating time attribute(s) Syntactic Sliding windows transformations Summarizing windows L E1 Level changes ... LE4 Inductive logic programming (RDT) SVM_light for classification SVM for regression Exponential smoothing 54 Summary • • • • Preprocessing is the key issue in data analysis! Goal: Support users in making intelligent choices Approach: Cases of best practice View of a computer scientist: – Scalability to very large databases – Meta-data driven processing • Case studies on analysing data involving time 55 MiningMart Approach • Manager -- end-user knows about the business case • Database manager knows about the data • Case designer -- power-user expert in KDD • Developer supplies (learning) operators 56