From Data Mining to Knowledge Discovery: An Overview
Course: CIS 864
Class: 6
Presenter: MANMOHAN K. UTTARWAR
On: Monday, January 29, 2001
CIS 830: Advanced Topics in Artificial Intelligence
Kansas State University
Department of Computing and Information Sciences
Contents
Reasoning
Advances
KDD: Definition
Data Mining & KDD
The KDD Process
Stages Of KDD
Primary Tasks of Data Mining
Components Of Data Mining Algorithm
Popular Data Mining methods
Application Issues Of Data Mining
Guidelines for Selecting Potential KDD Applications
Research & Application Challenges for KDD
An Overview
Explosive growth of Business, Government & Scientific databases
Limited Ability to Interpret & Digest Data
Inappropriate Tools & Techniques for Database Analysis
Advances in Storage Technology
Wal-Mart: 20 Million Transactions/Day
Health care: Multi-Gigabyte
Mobil Oil: 100 TB of Oil Exploration Data
NASA: EOS generates 50 GB/hr of Remotely Sensed Image Data
KDD: Definition
The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data
- non-trivial process: Multiple process
- valid: Justified patterns/models
- novel: Previously unknown
- useful: Can be used
- understandable: by human and machine
Data Mining & KDD
Data Mining
- Step in KDD process
- Consists of particular Data Mining algorithms that
• under specified computational efficiency limitations
• produce a specific enumeration of patterns
KDD Process
- The process of using Data Mining methods (algorithms) to extract what is deemed knowledge according to the specifications of measures and thresholds, using the database along with any required preprocessing, subsampling & transformations of that database.
The KDD Process
Preliminaries
Developing an understanding of the application Domain
Creating a Target Data Set
Choosing the Data Mining Tasks
Preprocessing
Data cleaning and Pre-processing
Data Reduction and Projection
Choosing the Data Mining Algorithms
Data Mining
Application
Interpretation of Mining Patterns
Consolidating discovered Knowledge
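The steps listed on this slide can be sketched as a chain of small functions. This is only an illustrative sketch, not the presenter's method: the records in RAW, the field names, and the toy "pattern" (splitting customers around the mean spend) are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age, monthly_spend).
# None marks a missing value. All values are invented for illustration.
RAW = [
    (1, 25, 120.0),
    (2, None, 80.0),
    (3, 41, 310.0),
    (4, 38, None),
    (5, 52, 295.0),
    (6, 23, 95.0),
]

def select_target(records):
    """Creating a target data set: keep only the fields relevant to the task."""
    return [(age, spend) for _, age, spend in records]

def clean(rows):
    """Data cleaning and pre-processing: drop rows with missing values."""
    return [r for r in rows if None not in r]

def project(rows):
    """Data reduction and projection: keep the single feature we mine on."""
    return [spend for _, spend in rows]

def mine(values):
    """Data mining: split the values into two groups around their mean."""
    threshold = mean(values)
    return {
        "threshold": threshold,
        "low": [v for v in values if v <= threshold],
        "high": [v for v in values if v > threshold],
    }

def interpret(pattern):
    """Interpretation: turn the mined pattern into a readable statement."""
    total = len(pattern["low"]) + len(pattern["high"])
    return (f"{len(pattern['high'])} of {total} customers spend "
            f"above {pattern['threshold']:.2f}")

def run_kdd(records):
    """Run the stages in order, each consuming the previous stage's output."""
    return interpret(mine(project(clean(select_target(records)))))

print(run_kdd(RAW))
```

Each stage only sees the output of the one before it, which mirrors the point of the slide: the mining step is one link in a longer chain that starts with selection and cleaning and ends with interpretation.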
Stages of KDD
Tasks in KDD Process
Primary Tasks of Data Mining
Classification: finding the description of several predefined classes and classifying a data item into one of them.
Regression: mapping a data item to a real-valued prediction variable.
Clustering: identifying a finite set of categories or clusters to describe the data.
Dependency Modeling: finding a model which describes significant dependencies between variables.
Deviation and change detection: discovering the most significant changes in the data.
Summarization: finding a compact description for a subset of data.
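To make three of these task definitions concrete, here is a minimal sketch of regression, classification, and deviation detection on one-dimensional data. The function names and the simple techniques chosen (a least-squares line, a midpoint threshold between two class means, a z-score test) are assumptions for illustration; the slides do not prescribe any particular algorithm.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Regression: least-squares line mapping x to a real-valued prediction."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def fit_threshold(values, labels):
    """Classification: learn a cut point between two predefined classes.

    Assumes labels are 0 and 1 and that class 1 tends to have larger values.
    """
    m0 = mean(v for v, label in zip(values, labels) if label == 0)
    m1 = mean(v for v, label in zip(values, labels) if label == 1)
    return (m0 + m1) / 2

def classify(value, threshold):
    """Assign a new data item to one of the two predefined classes."""
    return 1 if value > threshold else 0

def deviations(values, z=2.0):
    """Deviation detection: values more than z standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > z * sigma]

slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
threshold = fit_threshold([1, 2, 8, 9], [0, 0, 1, 1])
print(slope, intercept)
print(classify(7, threshold))
print(deviations([10, 11, 9, 10, 10, 50, 10, 11, 9, 10]))
```

The contrast between the first two functions matches the definitions above: regression returns a real number for each input, while classification returns one of a fixed set of class labels.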
Primary Tasks of Data Mining
Classification
Regression
Clustering
Dependency modeling
Change & Deviation Detection
Summarization
Primary Tasks of Data Mining
Components of Data Mining Algorithms
Model Representation
- Language L for describing discoverable patterns
Model Evaluation
- Estimates how well a pattern meets the criteria of the KDD process
Search Methods
- Parameter Search
- Model Search
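The three components can be seen side by side in a small sketch. Everything here is an assumption chosen for illustration: the two model families, the mean-squared-error criterion, and the grid of candidate parameter values are not taken from the slides.

```python
# Model representation: each family is a named function of (parameter w, input x).
FAMILIES = {
    "linear": lambda w, x: w * x,
    "quadratic": lambda w, x: w * x * x,
}

def evaluate(predict, w, data):
    """Model evaluation: mean squared error of the model on the data."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def parameter_search(predict, data, grid):
    """Parameter search: pick the best w for one fixed model family."""
    return min(grid, key=lambda w: evaluate(predict, w, data))

def model_search(data, grid):
    """Model search: loop over families, running a parameter search in each."""
    best = None
    for name, predict in FAMILIES.items():
        w = parameter_search(predict, data, grid)
        score = evaluate(predict, w, data)
        if best is None or score < best[2]:
            best = (name, w, score)
    return best

data = [(1, 2.0), (2, 8.0), (3, 18.0)]  # toy data generated by y = 2 * x**2
grid = [i / 2 for i in range(0, 11)]    # candidate w values 0.0, 0.5, ..., 5.0

print(model_search(data, grid))
```

Note how model search wraps parameter search: for every candidate representation, the parameters are optimized first, and only then are the families compared under the same evaluation criterion.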
Popular Data Mining Methods
Decision Trees & Rules
Non-Linear Regression & Classification Methods
Example-Based Methods
Probabilistic Graphical Dependency Model
Relational Learning Model
Non-Linear & Nearest Neighbor Classifier
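As a rough illustration of an example-based method, a k-nearest-neighbor classifier can be written in a few lines. This is a minimal sketch under assumed inputs: the stored examples, the labels "A" and "B", and the default k = 3 are invented for the example, and Euclidean distance is only one possible choice of metric.

```python
from collections import Counter
from math import dist

def knn_classify(query, examples, k=3):
    """Label `query` by majority vote among its k nearest stored examples."""
    nearest = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled examples: (point, class label).
examples = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_classify((1.2, 1.4), examples))
print(knn_classify((6.4, 6.1), examples, k=1))
```

There is no training step: the "model" is the stored examples themselves, and all the work happens at query time when distances are computed.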
SEMMA Process (Simple DM Algo.)
Application Issues of KDD
Database Marketing
Analysis and Selection of Stocks
Scientific applications such as
-- Astronomy
-- Molecular Biology
-- Global Climate Change Modeling
Fraud Detection & Prevention
Guidelines for Selecting a Potential KDD Application
Practical Criteria
-- Potential for significant impact of an application
-- No Good alternatives exist
-- Organizational support
-- Potential for privacy / legal issues
Technical Criteria
-- Availability of Significant Data
-- Relevance of attributes
-- Low noise levels
-- Confidence intervals
-- Prior Knowledge
Challenges for KDD
Larger Databases
High Dimensionality
Over-fitting
Assessing Statistical Significance
Changing Data & Knowledge
Missing & Noisy Data
Complex Relationships between Fields
Understandability of Patterns
User Interaction & Prior Knowledge
Integration with other Systems
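The over-fitting challenge in the list above can be illustrated by comparing error on the training data with error on held-out data. The sketch below is an assumption-laden toy: the data points, the lookup-table "memorizer", and the one-parameter line are all invented for the example.

```python
def memorizer(train):
    """An over-fitted 'model': a lookup table of the training pairs.

    Unseen inputs fall back to the mean training target.
    """
    table = dict(train)
    fallback = sum(y for _, y in train) / len(train)
    return lambda x: table.get(x, fallback)

def line_through_origin(train):
    """A simpler model: y = w * x, with w fitted by least squares."""
    w = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x: w * x

def mse(model, data):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical data, roughly following y = 2 * x with small noise.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
holdout = [(5, 10.1), (6, 11.9)]

for name, fit in [("memorizer", memorizer), ("line", line_through_origin)]:
    model = fit(train)
    print(name, mse(model, train), mse(model, holdout))
```

The memorizer reaches zero error on the training set but does far worse than the simple line on the held-out points, which is the signature of over-fitting: a perfect fit to the data seen so far says little about performance on new data.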
Conclusions
KDD – a desirable end product
Many approaches exist
- have advantages & problems
Barriers in obtaining Quality Data