Data Modelling in SAS
How SAS is Used for Research and Teaching to
Enable Students to Become More Marketable
Iveta Stankovičová
Comenius University
Faculty of Management
Bratislava, Slovakia
[email protected]
Data
• The current age is characterized by an information explosion
• Data are generated:
– For research purposes (historically, for data analysis) – experimental data
– As operational data (today, in business) – opportunistic data (Huber 1977)
2
Data
                Experimental Data      Opportunistic Data
Purpose         Research               Operational
Value           Scientific             Commercial
Generation      Actively controlled    Passively observed
Size            Small                  Massive
Hygiene         Clean                  Dirty
State           Static                 Dynamic
3
Data – Information
• It is necessary to obtain information from massive amounts of operational data to support managers' decision making (business decision support)
• It is necessary to explore and model relationships in data: predictive modelling (the fundamental task)
• Data Modelling = Data Mining (circa 1963)
4
Data Mining - Definition
• The process of selecting, exploring and modelling large volumes of data in order to detect previously unknown patterns of information that give an advantage in a competitive environment
• Uses statistical methods and other methods bordering on artificial intelligence
• Multidisciplinary lineage
5
Data Mining – SAS definition
• Advanced methods for exploring and modelling relationships in large amounts of data
Characteristics:
1. data – massive, operational, opportunistic
2. users and sponsors – non-researchers, business oriented
3. methodology – multidisciplinary, via computer
6
Data Mining – Analytical tools
• Statistics
• Artificial intelligence (AI)
• Knowledge discovery in databases (KDD)
• Machine learning
• Pattern recognition methodology
• Neurocomputing
7
Data Mining – Steps, Cycle
1. Identifying business problem
2. Transforming data into actionable results
3. Acting according to achieved results
4. Measuring the results
[Figure: cycle diagram linking steps 1, 2, 3 and 4]
8
Data Mining - Activities
• Classification
• Affinity grouping or association rules
• Clustering, segmentation
• Estimation
• Prediction
• Description and visualization
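To make one of the listed activities concrete, the following is a minimal sketch of how an association rule can be measured by its support and confidence. The basket data and the function names `support` and `confidence` are hypothetical and chosen only for illustration; they are not taken from the presentation or from any SAS product.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets (illustrative data only)
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})          # rule: bread => milk
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

A rule is usually considered interesting only when both measures exceed thresholds chosen by the analyst.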
9
Data Mining - People
• Domain experts
• Data experts
• Analytical experts
10
Data Mining - Processes
1. Model making
   • historical data:
     1. training
     2. test
     3. validation
2. Apply model
   • new data
   • prediction
[Figure: Data Mining System. Labels: Algorithm, Training, Training/Test, Eval, Model, Score Model, Prediction, Results]
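The two processes on this slide can be sketched in a few lines: historical data are partitioned into training, validation and test sets, a model is built on the training part, and the finished model then scores new data. This is only an illustration under stated assumptions: the 60/20/20 split, the synthetic data and the one-variable threshold "model" are all hypothetical stand-ins, not the method used in the presentation.

```python
import random


def partition(rows, train=0.6, valid=0.2, seed=0):
    """Shuffle rows and split them into training, validation and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_valid = int(len(rows) * valid)
    return (rows[:n_train],
            rows[n_train:n_train + n_valid],
            rows[n_train + n_valid:])


def fit_threshold(training):
    """Hypothetical model: choose the cut-off on x that classifies the most
    training rows correctly, predicting 1 when x >= cut-off."""
    best_cut, best_hits = None, -1
    for cut in sorted({x for x, _ in training}):
        hits = sum((x >= cut) == bool(y) for x, y in training)
        if hits > best_hits:
            best_cut, best_hits = cut, hits
    return best_cut


def score(cut, xs):
    """Apply the model: one 0/1 prediction per input value."""
    return [int(x >= cut) for x in xs]


def accuracy(cut, rows):
    """Share of rows whose prediction matches the known target."""
    predicted = score(cut, [x for x, _ in rows])
    return sum(p == y for p, (_, y) in zip(predicted, rows)) / len(rows)


# 1. Model making on synthetic historical data (target is 1 when x >= 50)
historical = [(x, int(x >= 50)) for x in range(100)]
training, validation, test = partition(historical)
cut = fit_threshold(training)
validation_accuracy = accuracy(cut, validation)  # used while tuning the model
test_accuracy = accuracy(cut, test)              # final check on unseen rows

# 2. Apply model: score new data with unknown target
new_data = [10, 90]
predictions = score(cut, new_data)
```

Keeping the test set out of both fitting and tuning is what makes its accuracy an honest estimate of performance on new data.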
11
Data Mining – Practice
1. Goal definition
2. Selection of data sources
3. Preparation of data for modelling
4. Selection and transformation of variables
5. Processing and evaluation of the model
6. Model verification
7. Implementation and model maintenance
12
Data Mining – SAS solution
SEMMA methodology:
1. Sample – identify input data sets, sample
from a large data set (training, test and
validation data sets)
2. Explore – explore data set statistically and
graphically
3. Modify – prepare the data for analysis
(data manipulation and transformation)
4. Model – fit a predictive model
5. Assess – compare competing models
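The five SEMMA steps can be read as a small pipeline. The sketch below mirrors each step with one plain Python function on a synthetic data set. It is an illustration only and does not use SAS Enterprise Miner or reproduce its behaviour; the function names, the data and the two competing models (a constant mean and a least-squares line) are assumptions made for the example.

```python
import random
import statistics


def sample(rows, fraction, seed=1):
    """Sample: draw a random subset and split it into training and validation."""
    picked = random.Random(seed).sample(rows, int(len(rows) * fraction))
    half = len(picked) // 2
    return picked[:half], picked[half:]


def explore(rows):
    """Explore: basic summary statistics of the input variable x."""
    xs = [x for x, _ in rows]
    return {"n": len(xs), "mean": statistics.mean(xs), "stdev": statistics.stdev(xs)}


def modify(rows, summary):
    """Modify: standardise x using the training mean and standard deviation."""
    return [((x - summary["mean"]) / summary["stdev"], y) for x, y in rows]


def model_mean(train):
    """Model A: always predict the training mean of y."""
    m = statistics.mean(y for _, y in train)
    return lambda x: m


def model_line(train):
    """Model B: least-squares line y = a + b*x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def assess(predict, rows):
    """Assess: mean squared error on held-out rows."""
    return statistics.mean((predict(x) - y) ** 2 for x, y in rows)


# Synthetic data set (illustrative): y = 3 + 2x plus small uniform noise
rng = random.Random(0)
data = [(x, 3 + 2 * x + rng.uniform(-1, 1)) for x in range(200)]

train, valid = sample(data, 0.5)                                   # Sample
summary = explore(train)                                           # Explore
train_m, valid_m = modify(train, summary), modify(valid, summary)  # Modify
candidates = {"mean": model_mean(train_m),                         # Model
              "line": model_line(train_m)}
errors = {name: assess(f, valid_m) for name, f in candidates.items()}  # Assess
best = min(errors, key=errors.get)
```

The Assess step compares the competing models on validation data rather than on the data used to fit them, which is why the Sample step creates separate partitions at the start.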
13
Data Mining - Methods
• Statistical methods - linear and logistic regression, multidimensional methods, time series analysis ...
• Non-statistical methods - neural networks, genetic algorithms ...
• Mixed methods - classification and regression trees ...
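As an illustration of one statistical method named above, here is a minimal sketch of logistic regression with a single input, fitted by batch gradient ascent on the mean log-likelihood. The data, learning rate and step count are hypothetical values chosen for the example; statistical software normally fits this model with more robust iterative algorithms.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, rate=0.1, steps=2000):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by batch gradient ascent
    on the mean log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        grad_a = sum(residuals) / n
        grad_b = sum(r * x for r, x in zip(residuals, xs)) / n
        a += rate * grad_a
        b += rate * grad_b
    return a, b


# Hypothetical data: larger x makes y = 1 more likely, with some overlap
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [0, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(xs, ys)
p_low = sigmoid(a + b * -3)   # estimated probability of y = 1 at x = -3
p_high = sigmoid(a + b * 3)   # estimated probability of y = 1 at x = 3
```

The overlapping labels at x = -1 and x = 0 keep the classes from being perfectly separable, so the coefficients stay finite.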
14
SAS System at Comenius University Bratislava (CU)
• November 1999 – a licence contract was signed between CU Bratislava and SAS Institute GmbH providing 50 licences of the SAS System
• November 2001 - an addition to the licence contract covering Enterprise Guide
15
SAS System at Faculty of Management Bratislava (FM)
• Faculty of Management - 25 licences
• SAS education began (V6.12) in the summer term of academic year 1999/2000
• Currently – SAS V8.2 and Enterprise Guide V2.0
16
Subjects of Statistics
3 compulsory subjects:
• Introduction to Statistics
– (1st year, summer term – 4 hours/week)
• Statistics on PC
– (2nd year, winter term – 2 hours/week)
• Statistical Methods
– (2nd year, summer term - 4 hours/week)
1 elective subject:
• Quantitative methods (in SAS System)
– (3rd year, summer term - 2 hours/week)
17
Subject contents
Contents of compulsory subjects:
– methods of mathematical statistics included in the basic modules (SAS/BASE, SAS/STAT, SAS/ETS)
Contents of elective subject:
– logistic regression, principal components analysis (PCA), cluster analysis, factor analysis, discriminant analysis (SAS/STAT, EG)
18
SAS System – offered in Menu
• Overview of modules and applications of SAS System V8.2 for creating statistical analyses in menu mode (knowledge of SAS code is not required)
SAS/ASSIST software
SAS/INSIGHT software
SAS Analyst
SAS/Enterprise Guide
19
Activities
Outputs from SAS education:
• Projects – output from each subject
• Student Research Activity Competition – 3rd year, circa 15 works per year
• Thesis works
– information system (module AF)
– data analysis (modules BASE, STAT, QC, ...)
– Scorecard (Enterprise Guide, Enterprise Miner)
• Conference SAS Forum - participation of teachers and students
20
Plans
Planned extension of SAS use to the following subjects:
• Multidimensional Methods of Analysis
• Time Series Analysis
• Marketing Research
• Data Mining
• Financial Analysis
• Quality Control
• Operational Management
21
Thanks for your attention!
22