Data Mining
Tri Nguyen
Agenda
Data Mining As Part of KDD
Decision Tree
Association Rules
Clustering
Amazon Data Mining Examples
Data Mining and KDD
[Diagram: the KDD process, ending with putting the results in practical use]
What is Data Mining?
“the automated extraction of hidden predictive information from large databases”
Algorithms produce patterns, rules
Predict future trends/behavior
Used to make business decisions
Classification
Items belong to classes
Given past items’ classifications, predict the class of a new item
Example: issuing credit cards (see the sketch below)
Use information: income, educational background, age, current debts
Credit worthiness: bad, good, excellent
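To make the idea concrete, here is a minimal sketch in Python. It is not from the slides: the applicant data is invented, and a simple nearest-neighbour rule stands in for whatever classifier is used (decision trees follow on the next slides).

# Hypothetical past applicants: (income, years_of_education, age, current_debt)
# and their assigned credit worthiness. All numbers are invented for illustration.
past_applicants = [
    ((25000, 12, 22, 8000), "bad"),
    ((48000, 16, 35, 5000), "good"),
    ((90000, 18, 45, 2000), "excellent"),
    ((30000, 12, 28, 12000), "bad"),
    ((60000, 16, 40, 3000), "good"),
]

def classify(new_applicant):
    """Predict the class of a new item from the most similar past item (1-nearest neighbour)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(past_applicants, key=lambda item: distance(item[0], new_applicant))
    return label

print(classify((55000, 16, 38, 4000)))  # -> "good" for this toy data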
Decision Tree Classifiers
Each internal node has a predicate
Each leaf node is a class
To classify an instance (see the sketch below):
Start at the root node
Traverse the tree until a leaf node is reached
At each internal node, make a decision (follow the branch the predicate selects)
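A minimal sketch of that traversal, assuming a hypothetical credit-risk tree; the split attributes and thresholds below are invented and are not the tree pictured on the next slide.

# Each internal node holds a predicate (an attribute test); each leaf holds a class.
# The tree below is a made-up credit-risk example.
tree = {
    "predicate": lambda applicant: applicant["income"] >= 50000,
    "true": {
        "predicate": lambda applicant: applicant["debt"] <= 10000,
        "true": {"class": "excellent"},
        "false": {"class": "good"},
    },
    "false": {
        "predicate": lambda applicant: applicant["age"] >= 30,
        "true": {"class": "good"},
        "false": {"class": "bad"},
    },
}

def classify(node, applicant):
    # Start at the root; at each internal node evaluate the predicate to pick a branch;
    # stop when a leaf (a node carrying only a class) is reached.
    while "class" not in node:
        node = node["true"] if node["predicate"](applicant) else node["false"]
    return node["class"]

print(classify(tree, {"income": 62000, "debt": 4000, "age": 41}))  # -> "excellent"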
Credit Risk Decision Tree
Decision Tree Construction
Some definitions:
Purity: the greater the share of a leaf’s instances that belong to a single class, the greater the purity
Best split: the split giving the maximum information gain ratio (information gain / information content); see the sketch below
Choose the attribute and condition resulting in maximum purity
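A minimal sketch of how one candidate split could be scored, assuming the usual entropy-based definitions of information gain and of the information content of the split itself; the applicant labels are invented.

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain_ratio(parent_labels, partitions):
    """Information gain of a split divided by the information content of the split."""
    total = len(parent_labels)
    children_entropy = sum(len(p) / total * entropy(p) for p in partitions)
    info_gain = entropy(parent_labels) - children_entropy
    split_info = entropy([i for i, p in enumerate(partitions) for _ in p])  # content of the split itself
    return info_gain / split_info if split_info else 0.0

# Invented example: 10 applicants split on a condition such as "income >= 50000".
parent = ["good"] * 5 + ["bad"] * 5
left = ["good", "good", "good", "good", "bad"]
right = ["bad", "bad", "bad", "bad", "good"]
print(gain_ratio(parent, [left, right]))  # higher is better; a perfectly pure split of this balanced data would score 1.0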
Decision Tree Construction
Association Rules
antecedent → consequent
if antecedent then consequent
beer → diapers (Walmart)
bad economy → higher unemployment
higher unemployment → higher cost of unemployment benefits
Rules are associated with a population, a support, and a confidence
Association Rules
Population: instances, such as grocery store purchases
Support: % of the population satisfying both the antecedent and the consequent
Confidence: % of instances satisfying the antecedent for which the consequent is also true
Association Rules
Population: MS, MSA, MSB, MA, MB, BA
(M = Milk, S = Soda, A = Apple, B = Beer; each string is one purchase)
Example rule: M → S (milk → soda)
Support(M → S) = 3/6
(MS, MSA, MSB) / (MS, MSA, MSB, MA, MB, BA)
Confidence(M → S) = 3/5
(MS, MSA, MSB) / (MS, MSA, MSB, MA, MB)
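A minimal sketch that recomputes those two fractions, treating each purchase as a set of items:

# The six purchases from the example, each written as a set of items.
purchases = [{"M", "S"}, {"M", "S", "A"}, {"M", "S", "B"}, {"M", "A"}, {"M", "B"}, {"B", "A"}]

def support(antecedent, consequent):
    # Fraction of the whole population containing both antecedent and consequent.
    both = antecedent | consequent
    return sum(both <= p for p in purchases) / len(purchases)

def confidence(antecedent, consequent):
    # Among purchases containing the antecedent, the fraction that also contain the consequent.
    with_antecedent = [p for p in purchases if antecedent <= p]
    return sum(consequent <= p for p in with_antecedent) / len(with_antecedent)

print(support({"M"}, {"S"}))     # 0.5 -> 3/6
print(confidence({"M"}, {"S"}))  # 0.6 -> 3/5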
Clustering
“The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.”
Clustering
BIRCH algorithm (see the sketch below):
Points are inserted into a multidimensional tree
Items are guided to leaf nodes "near" representative internal nodes
Nearby points are clustered into one leaf node
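A minimal usage sketch, assuming scikit-learn's Birch implementation and some made-up two-dimensional points; the threshold and cluster count are illustrative, not from the slides.

import numpy as np
from sklearn.cluster import Birch

# Made-up 2-D points: two loose groups around (0, 0) and (5, 5).
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

# Birch builds a tree of cluster summaries; each new point is routed toward the
# "nearest" subcluster and absorbed into a leaf if it is close enough (threshold).
model = Birch(threshold=1.0, n_clusters=2)
labels = model.fit_predict(points)
print(labels)  # cluster id for each point; nearby points share a label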
Clustering
Example of clustering: predict what new movies a person is interested in, using:
1) the person’s past movie preferences
2) others with similar preferences
3) the preferences of those in that pool for new movies
Clustering
1) Cluster people with similar movie preferences
2) Given a new moviegoer, find the cluster of similar moviegoers
3) Predict the cluster’s new-movie preferences (see the sketch below)
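A minimal sketch of these three steps, assuming a made-up ratings matrix and scikit-learn's KMeans; the movie titles and ratings are invented, and BIRCH or any other clustering algorithm could be substituted.

import numpy as np
from sklearn.cluster import KMeans

movies = ["Action1", "Action2", "Romance1", "Romance2"]  # hypothetical titles
ratings = np.array([                                     # rows = existing viewers
    [5, 4, 1, 1],
    [4, 5, 2, 1],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
])

# 1) Cluster people with similar movie preferences.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ratings)

# 2) Given a new moviegoer (who rated only the first two movies), find the most similar cluster.
new_viewer = np.array([[5, 5, 0, 0]])
cluster = kmeans.predict(new_viewer)[0]

# 3) Predict preferences from the average ratings of that cluster's members.
members = ratings[kmeans.labels_ == cluster]
predicted = members.mean(axis=0)
print(dict(zip(movies, predicted)))  # high predicted ratings for the action titles in this toy data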
Amazon Examples
References
http://www.thearling.com/text/dmwhite/dmwhite.htm
http://www.cse.ohio-state.edu/~srini/694Z/part1.ppt
http://www-aig.jpl.nasa.gov/public/kdd95/tutorials/IJCAI95tutorial.html