Download CIT 365: Data Mining and Data Warehousing

CIT 858: Data Mining and Data Warehousing
Course Instructor: Bajuna Salehe
Email: [email protected]
Web: www.ifm.ac.tz/staff/bajuna/courses/

Introduction to Data Mining and Data Warehousing

Agenda
• What is Data Mining?
• What is Data Warehousing?
• The source of invention of Data Mining and Data Warehousing
• Drowning in Data, Starving for Knowledge
• Evolution of Database Technology to the current state (homework)

What Is Data Mining?
• Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
• Data mining: a misnomer?
  - It should have been named "knowledge mining from data", which is too long
  - or "knowledge mining", which does not reflect the emphasis on mining from huge amounts of data

What Is Data Mining?
• Many people treat data mining as a synonym for another popularly used term, Knowledge Discovery from Data/Databases (KDD).
• The KDD process is depicted below:

The KDD Process
[Figure: the KDD process, from bottom to top: Databases -> Cleaning & Integration -> Data Warehouse -> Selection & Transformation -> Data Mining -> Evaluation & Presentation -> Knowledge]

KDD Process
1) Data cleaning
  - To remove noise and inconsistent data
2) Data integration
  - Where multiple data sources may be combined
3) Data selection
  - Where data relevant to the analysis task are retrieved from the database

KDD Process
4) Data transformation
  - Where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations
5) Data mining
  - An essential process where intelligent methods are applied in order to extract data patterns

KDD Process
6) Pattern evaluation
  - To identify the truly interesting patterns representing knowledge
7) Knowledge presentation
  - Where visualization and knowledge representation techniques are used to present the mined knowledge to the users
8) Use of discovered knowledge
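
The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the two sample sources, the function names and the "frequent item" mining step are all hypothetical and are not taken from the course material.

```python
from collections import Counter

# Hypothetical raw data from two sources; None and "" stand in for noise.
store_a = [
    {"customer": "ann", "item": "Bread"},
    {"customer": "bob", "item": None},
]
store_b = [
    {"customer": "ann", "item": "cheese "},
    {"customer": "cy", "item": "bread"},
    {"customer": "cy", "item": ""},
]


def clean(records):
    """Step 1: drop records with a missing item and normalise spelling."""
    return [
        {**r, "item": r["item"].strip().lower()}
        for r in records
        if r["item"] and r["item"].strip()
    ]


def integrate(*sources):
    """Step 2: combine several data sources into one collection."""
    return [r for source in sources for r in source]


def select(records, field):
    """Step 3: keep only the data relevant to the analysis task."""
    return [r[field] for r in records]


def transform(items):
    """Step 4: consolidate the data by aggregation (counts per item)."""
    return Counter(items)


def mine(counts, min_count):
    """Step 5: extract a simple pattern, here the frequently bought items."""
    return {item: n for item, n in counts.items() if n >= min_count}


def evaluate(patterns):
    """Step 6: rank the patterns, most frequent first."""
    return sorted(patterns.items(), key=lambda p: (-p[1], p[0]))


def present(ranked):
    """Step 7: turn the ranked patterns into readable lines."""
    return [f"{item}: bought {n} times" for item, n in ranked]


records = integrate(clean(store_a), clean(store_b))
report = present(evaluate(mine(transform(select(records, "item")), min_count=2)))
print(report)
```

Each function corresponds to one numbered step, so a real project would replace each body with a proper tool (for example an ETL job for steps 1 to 4) while keeping the same order.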
Data Mining: On What Kinds Of Data?
• Relational database
• Data warehouse
• Transactional database
• Advanced database and information repository
  - Spatial and temporal data
  - Stream data
  - Multimedia database
  - Text databases & WWW

Data Mining Functionalities
• Association (correlation and causality)
  - e.g., Cheese & Bread (items frequently bought together)
• Classification and Prediction
  - Construct models that describe and distinguish classes or concepts for future prediction
  - Predict some unknown or missing numerical values
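
As an illustration of association, the strength of a rule such as "cheese => bread" is commonly measured with support and confidence. The sketch below uses hypothetical market baskets; the data and the function names are assumptions made for this example only.

```python
# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"cheese", "bread", "milk"},
    {"cheese", "bread"},
    {"bread", "butter"},
    {"cheese", "wine"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the share also containing `consequent`."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(antecedent | consequent, baskets) / base


rule_support = support({"cheese", "bread"}, baskets)
rule_confidence = confidence({"cheese"}, {"bread"}, baskets)
print(f"cheese => bread: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here two of the five baskets contain both items, and two of the three baskets with cheese also contain bread.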
Data Mining Functionalities (cont…)
• Cluster analysis
  - Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
• Outlier analysis
  - Outlier: a data object that does not comply with the general behavior of the data
  - Noise or exception? No! Useful in fraud detection and rare event analysis
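
To make cluster analysis concrete, here is a minimal one-dimensional k-means sketch that groups house prices without any class labels. The prices, the function name and the way the starting centroids are chosen are assumptions for illustration; k-means is only one of many clustering methods and the slides do not name a specific one.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters with a basic k-means loop."""
    ordered = sorted(values)
    # Assumed, deterministic initialisation: spread the starting
    # centroids evenly across the sorted values.
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [ordered[round(i * step)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the loop has converged
        centroids = new_centroids
    return clusters, centroids


# Hypothetical house prices (in thousands).
prices = [100, 310, 120, 300, 110, 320]
clusters, centroids = kmeans_1d(prices, k=2)
print(clusters, centroids)
```

The two groups that emerge (cheaper and more expensive houses) are the "new classes" that the algorithm forms on its own.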
Necessity Is The Mother Of Invention
• Data explosion problem
  - Automated data collection tools and mature database technology lead to huge amounts of accumulated data
• We are drowning in data, but starving for knowledge!
• Solution: Data warehousing and data mining
  - Data warehousing and on-line analytical processing
  - Mining interesting knowledge (rules, regularities, patterns, constraints) from data

Evolution Of Database Technology
• 1960s:
  - Data collection, database creation, IMS and network DBMS
• 1970s:
  - Relational data model, relational DBMS implementation
• 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.)

Evolution Of Database Technology
• 1990s:
  - Data mining, data warehousing, multimedia databases, and Web databases
• 2000s:
  - Stream data management and mining
  - Data mining with a variety of applications
  - Web technology and global information systems

Potential Applications
• Data analysis and decision support
  - Market analysis and management
  - Risk analysis and management
  - Fraud detection and detection of unusual patterns
• Other applications
  - Text mining (email, documents) and Web mining
  - Stream data mining

Fraud Detection & Mining Unusual Patterns
• Applications: health care, retail, credit card service, telecommunications
  - Auto insurance: ring of collisions
  - Money laundering: suspicious monetary transactions
  - Medical insurance
      - Professional patients, ring of doctors, and ring of references
      - Unnecessary or correlated screening tests
  - Telecommunications: phone-call fraud
      - Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
  - Retail industry
      - Analysts estimate that 38% of retail shrink is due to dishonest employees
  - Anti-terrorism
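
The phone-call model above, which looks for patterns that deviate from an expected norm, can be sketched as follows. This simplified example models call duration only (not destination or time of day), and the call history, the threshold and the function name are assumptions for illustration.

```python
from statistics import mean, pstdev


def unusual_calls(history, new_calls, threshold=3.0):
    """Flag new call durations that deviate strongly from a caller's norm."""
    baseline = mean(history)   # the expected norm for this caller
    spread = pstdev(history)   # how much the caller normally varies
    flagged = []
    for minutes in new_calls:
        if spread == 0:
            # No variation in the history: any different duration deviates.
            deviates = minutes != baseline
        else:
            deviates = abs(minutes - baseline) / spread > threshold
        if deviates:
            flagged.append(minutes)
    return flagged


# Hypothetical call durations in minutes for one customer.
history = [3, 4, 5, 4, 3, 5, 4, 4]
suspicious = unusual_calls(history, new_calls=[4, 5, 45])
print(suspicious)
```

A flagged call is not proof of fraud; it only marks a record worth investigating.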
Other Applications
• Sports
  - IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain a competitive advantage for the New York Knicks and Miami Heat
• Internet Web Surf-Aid
  - IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preferences and behavior, helping to analyze the effectiveness of Web marketing, improve Web site organization, etc.

What is a Data Warehouse?
• Defined in many different ways, but not rigorously
  - A decision support database that is maintained separately from the organization's operational database
  - Supports information processing by providing a solid platform of consolidated, historical data for analysis
• "A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process" (Bill Inmon)
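
The difference between an operational database and a warehouse kept separately from it can be sketched with Python's built-in sqlite3 module. The table names, the dates and the snapshot approach are assumptions chosen for illustration; real warehouses use far richer schemas and loading tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table: holds only the current state, updated in place.
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
    -- Warehouse table: dated snapshots that are added but never changed.
    CREATE TABLE balance_snapshot (
        snapshot_date TEXT, account_id INTEGER, balance REAL
    );
""")


def load_snapshot(snapshot_date):
    """Copy the current operational state into the warehouse table."""
    conn.execute(
        "INSERT INTO balance_snapshot SELECT ?, id, balance FROM account",
        (snapshot_date,),
    )


conn.execute("INSERT INTO account VALUES (1, 100.0)")
load_snapshot("2024-01-31")
conn.execute("UPDATE account SET balance = 250.0 WHERE id = 1")
load_snapshot("2024-02-29")

# The operational table knows only the latest balance...
current = conn.execute("SELECT balance FROM account WHERE id = 1").fetchall()
# ...while the warehouse keeps the history needed for analysis.
history = conn.execute(
    "SELECT snapshot_date, balance FROM balance_snapshot "
    "WHERE account_id = 1 ORDER BY snapshot_date"
).fetchall()
print(current, history)
```

The snapshot table illustrates "time-variant" (every row carries a date) and "non-volatile" (rows are only appended), while the update to `account` shows why the operational data alone cannot answer historical questions.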

The Source of Invention of DW and Data Mining
• Data explosion problem
  - Automated data collection tools and mature database technology lead to huge amounts of accumulated data
• We are drowning in data, but starving for knowledge!
• Solution: Data warehousing and data mining
  - Data warehousing and on-line analytical processing
  - Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Drowning In Data, Starving For Knowledge
[Figure with the labels DATA and KNOWLEDGE]

Importance of Data Mining
• By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles.
• The discovered knowledge can be applied to the decision-making process.
