Copyright © 2004 Pearson Education, Inc.
Chapter 27
Data Mining Concepts
Overview of Data Mining Technology
Data mining, also known as Knowledge Discovery in Databases (KDD)
– Discovery of new information in terms of patterns or rules from vast amounts of data
Must be carried out efficiently on large files and databases
Elmasri and Navathe, Fundamentals of Database Systems, Fourth Edition
Slide 27-3
Goals of Data Mining
• Prediction
– Show how certain attributes will behave in the future
• Identification
– Use data patterns to identify the existence of an item, an event, or an activity
• Classification
– Partition data into different categories
• Optimization
– Optimize the use of limited resources such as time, space, and money
Slide 27-4
FIGURE 27.1
Example transactions in market-basket model.
Slide 27-5
FIGURE 27.2
FP-tree and item header table.
Slide 27-6
FIGURE 27.3
Taxonomy of items in a supermarket.
Slide 27-7
FIGURE 27.4
Simple hierarchy of soft drinks and chips.
Slide 27-8
Association Rules
• Market-Basket Model, Support, and Confidence
• Apriori Algorithm
• Sampling Algorithm
• Frequent-Pattern Tree Algorithm
• Partition Algorithm
• Other Types of Association Rules
• Additional Considerations for Association Rules
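As a rough illustration of the first two topics in this list, the sketch below computes support and confidence over a handful of market-basket transactions and then finds frequent itemsets with a level-wise, Apriori-style search. The transactions, the function names, and the 0.5 support threshold are assumptions made for this example; they are not taken from Figure 27.1 or from the textbook's pseudocode.

```python
from itertools import combinations

# Hypothetical transactions for illustration only (not the Figure 27.1 data).
transactions = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs => rhs: support(lhs and rhs) / support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: build (k+1)-item candidates from frequent k-itemsets."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join step: unions of two frequent k-itemsets that have k+1 items.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

print(support({"milk"}, transactions))                # 0.75
print(confidence({"juice"}, {"milk"}, transactions))  # 1.0
for itemset, s in frequent_itemsets(transactions, 0.5).items():
    print(sorted(itemset), s)
```

The prune step relies on the property that any subset of a frequent itemset must itself be frequent, which is what lets the search discard most candidates without counting them.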
Slide 27-9
Classification
The process of learning a model that
describes different classes of data.
The classes are known in advance – the
rules that describe them are not.
Mining can help determine which past characteristics were influential,
so that they can be used to predict future behavior.
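One common way to learn such rules is to build a decision tree by repeatedly choosing the attribute that best separates the known classes. The sketch below shows only that selection step, using entropy and information gain. The records, the attribute names (`married`, `salary`), and the class label `loanworthy` are hypothetical and are not the training data of Figure 27.6.

```python
from collections import Counter
from math import log2

# Hypothetical training records; the class label is known in advance.
records = [
    {"married": "yes", "salary": "high", "loanworthy": "yes"},
    {"married": "yes", "salary": "low", "loanworthy": "no"},
    {"married": "no", "salary": "high", "loanworthy": "yes"},
    {"married": "no", "salary": "low", "loanworthy": "no"},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(records, attribute, target):
    """Reduction in entropy of target after partitioning on attribute."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

def best_split(records, attributes, target):
    """Attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))

print(best_split(records, ["married", "salary"], "loanworthy"))  # salary
```

In this toy data `salary` separates the two classes perfectly, so it is chosen as the root test. A full tree builder would recurse on each partition until the leaves are pure or no attributes remain.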
Slide 27-10
FIGURE 27.5
Example decision tree for credit card applications.
Slide 27-11
FIGURE 27.6
Sample training data for classification algorithm.
Slide 27-12
FIGURE 27.7
Decision tree based on sample training data where the leaf
nodes are represented by a set of RIDs of the partitioned
records.
Slide 27-13
Clustering
Another form of learning, in which the groups are not known in advance
Puts “similar” records into groups
– Example: grouping patients by their reaction to medication
The choice of similarity function is key
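To make the role of the similarity function concrete, here is a minimal k-means sketch that treats Euclidean distance as the (dis)similarity measure. k-means is one common clustering method, chosen here as an assumption since the slide does not name an algorithm. The points, the function name `k_means`, and the choice of the first k points as starting centers are all illustrative, not the records of Figure 27.8.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def k_means(points, k, max_iter=100):
    """Group points into k clusters by repeatedly assigning each point to
    its nearest center and moving each center to the mean of its cluster."""
    centers = list(points[:k])  # assumption: first k points as initial centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]  # keep the old center if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers, clusters

# Hypothetical 2-dimensional records.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centers, clusters = k_means(points, k=2)
print(centers)
print(clusters)
```

Swapping `dist` for a different measure changes which records count as "similar", and therefore which groups emerge.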
Slide 27-14
FIGURE 27.8
Sample 2-dimensional records for clustering example (the
RID column is not considered).
Slide 27-15
Approaches to Other Data Mining Problems
Discovery of Sequential Patterns
Discovery of Patterns in Time Series
Regression
Neural Networks
Genetic Algorithms
Slide 27-16
Applications of Data Mining
Marketing
Finance
Manufacturing
Health Care
Many other decision-making contexts
Slide 27-17
Commercial Data Mining Tools
The textbook lists several packages and their strengths
A rapidly growing field as databases multiply
Large potential for tools that protect privacy as well as correct errors in the data
Slide 27-18
Summary
Lots of potential in this field
Seems complex, but only because of the
sheer amount of data.
See Wikipedia at
– http://en.wikipedia.org/wiki/Data_mining
Slide 27-19