Data Modelling in SAS
How SAS is Used for Research and Teaching to Enable Students to Become More Marketable

Iveta Stankovičová
Comenius University, Faculty of Management
Bratislava, Slovakia

Data

- The current age is characterised by an information explosion
- Data are generated:
  - for research purposes (historically, for data analysis): experimental data
  - as operational data (today, in business): opportunistic data (Huber 1977)

Data

                Experimental data     Opportunistic data
Purpose         Research              Operational
Value           Scientific            Commercial
Generation      Actively controlled   Passively observed
Size            Small                 Massive
Hygiene         Clean                 Dirty
State           Static                Dynamic

Data and Information

- Managers need information extracted from massive amounts of operational data to make decisions (business decision support)
- It is necessary to explore and model relationships in data: predictive modelling (the fundamental task)
- Data Modelling = Data Mining (ca. 1963)

Data Mining - Definition

- The process of selecting, exploring and modelling large volumes of data in order to detect previously unknown patterns that give an advantage in a competitive environment
- Uses statistical methods and other methods bordering on artificial intelligence
- Multidisciplinary lineage

Data Mining - SAS Definition

Advanced methods for exploring and modelling relationships in large amounts of data

Characteristics:
1. data: massive, operational, opportunistic
2. users and sponsors: non-researchers, business oriented
3. methodology: multidisciplinary, computer based

Data Mining - Analytical Tools

- Statistics
- Artificial intelligence (AI)
- Knowledge discovery in databases (KDD)
- Machine learning
- Pattern recognition methodology
- Neurocomputing

Data Mining - Steps, Cycle

1. Identifying the business problem
2. Transforming data into actionable results
3. Acting on the achieved results
4. Measuring the results

[Diagram: steps 1 to 4 arranged in a cycle]

Data Mining - Activities
- Classification
- Affinity grouping or association rules
- Clustering, segmentation
- Estimation
- Prediction
- Description and visualization

Data Mining - People

- Domain experts
- Data experts
- Analytical experts

Data Mining - Processes

1. Model making → historical data:
   1. training
   2. test
   3. validation
2. Apply model → new data → prediction

[Diagram: Data Mining System. Labels: Algorithm, Training, Test, Eval Model, Score Model, Prediction Results]

Data Mining - Practice

1. Goal definition
2. Selection of data sources
3. Preparation of data for modelling
4. Selection and transformation of variables
5. Processing and evaluation of the model
6. Model verification
7. Implementation and model maintenance

Data Mining - SAS Solution

SEMMA methodology:
1. Sample: identify input data sets, sample from a large data set (training, test and validation data sets)
2. Explore: explore the data set statistically and graphically
3. Modify: prepare the data for analysis (data manipulation and transformation)
4. Model: fit a predictive model
5. Assess: compare competing models

Data Mining - Methods

- Statistical methods: linear and logistic regression, multidimensional methods, time series analysis ...
- Non-statistical methods: neural networks, genetic algorithms ...
- Mixed methods: classification and regression trees ...

SAS System at Comenius University Bratislava (CU)

- November 1999: licence contract signed between CU Bratislava and SAS Institute GmbH, providing 50 licences of the SAS System
- November 2001: addendum to the licence contract adding Enterprise Guide

SAS System at Faculty of Management Bratislava (FM)

- Faculty of Management: 25 licences
- SAS teaching began (V 6.12) in the summer term of academic year 1999/2000
- Currently: SAS V8.2 and Enterprise Guide V2.0

Subjects of Statistics

3 compulsory subjects:
- Introduction to Statistics (1st year, summer term, 4 hours/week)
- Statistics on PC (2nd year, winter term, 2 hours/week)
- Statistical Methods (2nd year, summer term, 4 hours/week)

1 elective subject:
- Quantitative Methods (in SAS System) (3rd year, summer term, 2 hours/week)

Subjects Contents

Contents of compulsory subjects:
- methods of mathematical statistics, covered by the basic modules (SAS/BASE, SAS/STAT, SAS/ETS)

Contents of elective subject:
- logistic regression, principal components analysis (PCA), cluster analysis, factor analysis, discriminant analysis (SAS/STAT, Enterprise Guide)

SAS System - Offered in Menu Mode

Overview of modules and applications of SAS System V8.2 for producing statistical analyses in menu mode (knowledge of SAS code is not required):
- SAS/ASSIST software
- SAS/INSIGHT software
- SAS Analyst
- SAS Enterprise Guide

Activities

Outputs from SAS education:
- Projects: output from each subject
- Student Research Activity Competition: 3rd year, ca. 15 works per year
- Thesis works:
  - information system (module AF)
  - data analysis (modules BASE, STAT, QC, ...)
  - scorecard (Enterprise Guide, Enterprise Miner)
- SAS Forum conference: participation of teachers and students

Plans

Extension of SAS use to the following subjects:
- Multidimensional Methods of Analysis
- Time Series Analysis
- Marketing Research
- Data Mining
- Financial Analysis
- Quality Control
- Operational Management

Thanks for your attention!
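
Supplementary sketch

The model-making process from the Data Mining - Processes slide (historical data split into training, validation and test sets, then the fitted model applied to new data) can be illustrated with a minimal Python sketch. This is not SAS code and not the method used in the courses. The synthetic records, the 60/20/20 split and the names partition, fit_threshold, predict and accuracy are all assumptions made for illustration only.

```python
import random


def partition(records, train=0.6, validation=0.2, seed=0):
    """Shuffle a copy of the records and split it into three sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                   # training set
        shuffled[n_train:n_train + n_valid],  # validation set
        shuffled[n_train + n_valid:],         # test set (the remainder)
    )


def fit_threshold(training):
    """Toy model: midpoint between the mean x of class 0 and of class 1."""
    xs0 = [x for x, label in training if label == 0]
    xs1 = [x for x, label in training if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def predict(threshold, x):
    """Score one new observation with the fitted threshold."""
    return 1 if x >= threshold else 0


def accuracy(threshold, records):
    """Share of records whose predicted label matches the known label."""
    hits = sum(1 for x, label in records if predict(threshold, x) == label)
    return hits / len(records)


# Hypothetical historical data: (x, label) pairs, label is 1 when x >= 50.
historical = [(x, 1 if x >= 50 else 0) for x in range(100)]

# 1. Model making on historical data.
train_set, valid_set, test_set = partition(historical)
threshold = fit_threshold(train_set)
valid_acc = accuracy(threshold, valid_set)
test_acc = accuracy(threshold, test_set)

# 2. Apply the model to new data to obtain predictions.
new_data = [10, 90]
predictions = [predict(threshold, x) for x in new_data]

print(len(train_set), len(valid_set), len(test_set))
print(valid_acc, test_acc, predictions)
```

In a fuller workflow the validation set would be used to compare competing models (the Assess step of SEMMA), and the test set would be held back for a final check. Here both are only used to report accuracy for a single toy model.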