An Overview of Data Mining with the SAS System
Gerhard Held
SAS Institute
European Product Manager, Analysis Applications

Data Mining
- The Flood of Data and Data Mining
- What is Data Mining? - The DM Process
- Why Data Mining? - Applications
- SAS Institute and Data Mining
- Conclusion

The Flood of Data and Data Mining
The Goal:
“Knowledge is the only competitive Advantage”
Jack Welch, CEO, General Electric

The Flood of Data and Data Mining
The current Situation:
“Computers promised us a Fountain of Wisdom, but delivered us a Flood of Data”
Gregory Piatetsky-Shapiro, 1991

The Flood of Data and Data Mining
Data Warehousing Ideal for Data Mining
[Diagram: data warehouse architecture. Labels recoverable from the figure: Operational, Legacy, RDBMS, External; Data Extractor, Transformation Engine, Loader, Scheduler; Metadata, Metadata Manager; SAS Information Database; Exploitation: Query and Reporting, OLAP, EIS, DSS, Data Mining, Data Visualisation, OO RAD, Internet Exploitation, Intelligent Client/Server; Organisation, Product, Customer, Market, Risk, Quality, Management, Future]

What is Data Mining?
- OLAP: Who are my best Customers in Region X?
- Data Mining: What Sort of Customers should I target?

What is Data Mining?
- DM is a Technique, not an Application
- DM consists of many Techniques:
  - Visual Exploration
  - Clustering, Factor, Correspondence
  - Tree-based Models
  - Time Series Analysis
  - Neural Networks
  - Statistical Modelling

Data Mining
- Unspecific Questions
- “Advanced Methods of exploring and modelling Relationships in large Amounts of Data”
- DM is a Set of Techniques
- DM is a Process

Data Mining as a Process
[Diagram: Business / IT Environment. Labels recoverable from the figure: DBMS, Data Warehouse, Data Mining, Internal Processing, Business Reporting and Graphics, Informed Business Decisions]

[Diagram: the data mining process as a cycle of five stages: Sample, Explore, Manipulate, Model, Assess. Boxes recoverable from the figure: Sampling?; Visual Exploration; Clustering, Factor, Correspondence; Variable grouping, subsetting; Adding or subsetting of Records; Neural Networks; Tree-based Models; Statistical Techniques; Time Series Analysis; Data Update?; New Questions?]

The Data Mining Process
Sampling?
- A clear Pattern will also show in a Sample
- Enormous Performance Advantages
- Random Sampling, Stratification, etc.
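
The sampling step can be sketched in a few lines. This is a minimal illustration in plain Python rather than SAS code, and the function name stratified_sample, the customer records and the 10% fraction are all hypothetical. It draws the same fraction from each stratum so that a small group stays represented in the sample.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum (hypothetical helper)."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)   # group records by stratum value
    sample = []
    for group in strata.values():
        # Keep at least one record per stratum, even for tiny groups.
        n = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, n))
    return sample

# Made-up customer table: 900 records in region "A", 100 in region "B".
customers = [{"id": i, "region": "A" if i < 900 else "B"} for i in range(1000)]
sample = stratified_sample(customers, key=lambda r: r["region"], fraction=0.1)
```

With these made-up numbers the sample holds 100 records, 10 of them from the smaller region, which a simple random draw of the same size would not guarantee.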


The Data Mining Process
Exploration
- Visual Exploration: Multidimensional, Graphical Data Analysis, Geographical Visualisation
- Statistical Visualisation: Cluster, Factor, Correspondence, MDS, …
- Groups, important Variables

The Data Mining Process
Manipulation
- Variable Selection
- New Variables: Groups, etc.
- Significant Subgroups?
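
A small sketch of the manipulation step, again in plain Python with hypothetical names and made-up data: it derives a new grouping variable, keeps only the selected variables, and compares a response rate across the resulting subgroups. The age bounds and the "responded" flag are assumptions for illustration only.

```python
from collections import defaultdict

def add_age_group(record, bounds=(30, 50)):
    """Return a copy of the record with a derived 'age_group' variable."""
    age = record["age"]
    if age < bounds[0]:
        group = "young"
    elif age < bounds[1]:
        group = "middle"
    else:
        group = "senior"
    return {**record, "age_group": group}

def select_variables(record, keep):
    """Variable selection: keep only the named variables."""
    return {name: record[name] for name in keep}

def response_rate_by_group(records, group_var, target_var):
    """Share of positive responses within each subgroup."""
    totals, hits = defaultdict(int), defaultdict(int)
    for rec in records:
        totals[rec[group_var]] += 1
        hits[rec[group_var]] += rec[target_var]
    return {g: hits[g] / totals[g] for g in totals}

# Made-up customer records; 'responded' is 1 for a reply to a mailing.
customers = [
    {"age": 24, "income": 21000, "responded": 1},
    {"age": 28, "income": 25000, "responded": 1},
    {"age": 41, "income": 52000, "responded": 0},
    {"age": 45, "income": 48000, "responded": 1},
    {"age": 63, "income": 39000, "responded": 0},
    {"age": 70, "income": 30000, "responded": 0},
]
prepared = [select_variables(add_age_group(c), ["age_group", "responded"])
            for c in customers]
rates = response_rate_by_group(prepared, "age_group", "responded")
```

Whether such differences between subgroups are significant would still need a statistical test on real data; six records only show the mechanics.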

The Data Mining Process
Modelling
- Neural Networks
- Tree-based Models
- Generalised Linear Models
- Time Series Analysis
- Specific Market Research Methods

The Data Mining Process
Assessment
- “Survival of the Fittest”
- Generate New Questions - Iterative Process
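
One way to read “Survival of the Fittest” is to score every candidate model on data held back from training and keep the one with the lowest error. The sketch below assumes that reading. The three rules stand in for trained models, and all names and records are hypothetical.

```python
def misclassification_rate(model, data):
    """Fraction of held-back cases the model predicts wrongly."""
    errors = sum(1 for x, y in data if model(x) != y)
    return errors / len(data)

def pick_fittest(models, validation):
    """Score every candidate on the same validation data; lowest error wins."""
    scores = {name: misclassification_rate(m, validation)
              for name, m in models.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical candidates: simple rules standing in for trained models.
models = {
    "always_no": lambda x: 0,
    "income_rule": lambda x: 1 if x["income"] > 40000 else 0,
    "age_rule": lambda x: 1 if x["age"] < 35 else 0,
}

# Made-up validation cases: (customer, actual response).
validation = [
    ({"age": 25, "income": 45000}, 1),
    ({"age": 52, "income": 60000}, 1),
    ({"age": 33, "income": 20000}, 0),
    ({"age": 61, "income": 30000}, 0),
]
best, scores = pick_fittest(models, validation)
```

On these four cases the income rule makes no errors and the other two candidates each miss half, so it is the one kept for the next iteration.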

Why Data Mining? - Industries
- Retail: POS Data, Customer Transactions
- Direct Mail: Customer Orders, Credit Data
- Multimedia: Books, Video, Internet Usage
- Banks: Stock Data, Account Data, Credit Data
- Insurance: Claims Data, Benefit Payments Data
- Telecoms: Call Centres, Customer Data

Why Data Mining? - Applications
Marketing:
- Customer Segmentation: The most profitable Customers
- Database Marketing: Which Prospects, which Products?
- Customer Attrition: Which Groups are in Danger?
- Media Analysis: Print Media, TV, Internet...

Why Data Mining? - Applications
Sales / Finance:
- Sales Forecasting: Historical Data and current influential Trends
- Credit Approval: Credit Risk and Credit Scoring
- Fraud Detection: Insurance Companies, Banks, Credit Card Companies
- Portfolio Analysis

Why Data Mining? - Applications
Others:
- Forecasting of maximum CPU Usage
- Demand for Electricity, Water etc.
- Simulation of chemical or discrete Manufacturing Processes
- ...

Data Mining - References
Our customers have been successful at Data Mining because SAS Institute treats it as an information process.
- Neckermann, D: Credit Scoring
- Ellos, S: Customer Segmentation, Campaign Management
- Postbank NV, NL: same
- UTAC, F: Vehicle Test Results Tracking
- more: Inform 16 - Database Marketing

SAS Institute and Data Mining
DM Methods:
- NNA - Initial Prod. July (UNIX)
- Tree Menu System (Sample), thanks to SAS UK!
- Everything else Production Software:
  - Exploration: INSIGHT, SPECTRAVIEW, GIS
  - Statistics
  - Time Series Forecasting
  - Market Research Methods

SAS Institute and Data Mining
More Information on Data Mining:
- This Data Mining Stream
- DM White Paper
- Exhibition Area:
  - Booth 2 Data Mining / Database Marketing
  - Technology Centre
  - Booth 5 Exploiting the Data Warehouse
- Systems Stream Paper
- Press Announcement
- The Meta Group: MetaFax on Data Mining

Conclusion - Data Mining
SAS Software Benefits for Data Mining:
- Access to all Data Sources (DBMS, Data Warehouses, Others)
- Complete Set of Technologies: NNs, Trees, Visualisation, Statistics (Gartner Group)
- Unique Methodology: SEMMA

Conclusion - Data Mining
SEMMA: Sample, Explore, Manipulate, Model, Assess

Thank you for your attention
The SAS® System for successful decision making