Data Mining CS 541

By Dan Stalloch
• Association – which items tend to occur together with other items
• Patterns – sequential and time-series patterns, which show how often certain events occur
• Classification – shows how the data is grouped
• Prediction – detecting a stable pattern in the data that may continue into the future
• Identification – what can be learned from system usage, or what might be present in an item
• Classification – how the data could be grouped
• Optimization – finding better ways to use resources
• Apriori – finds frequent (large) itemsets
• Sampling – finds frequent itemsets from a small sample of the database
• Frequent-Pattern (FP) Tree and FP-Growth – an improved alternative to Apriori
• Partition – an efficient way to apply the Apriori algorithm
• Decision Tree Induction – constructing a decision tree from a training data set
• k-Means – groups the data into k clusters
• And others
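To make the first item concrete, here is a minimal sketch of the Apriori idea: count single items, then repeatedly join frequent itemsets into larger candidates and keep only those that meet a minimum support count. The function name `apriori`, the parameter `min_support`, and the shopping baskets are assumptions made for illustration; they do not come from the slides or the textbook.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join step: combine pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical shopping baskets, invented for this example.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
frequent_itemsets = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is only counted if all of its smaller subsets were already found to be frequent.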
• Marketing – analyzing customer behavior
• Finance – assessing credit and detecting fraud
• Manufacturing – optimizing the use of resources
• Health Care – checking patterns for useful information
• http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
• This is a car database from the UCI repository of databases, which is made available to everyone
• When mining a database, it is essential to ask what you would like to be able to predict from it. In this instance, we would like to know which cars have decent mpg
• We might also be able to predict which companies are likely to stay in business
• We must create or use programs that show us either a 2-D or a 3-D contingency table
http://www.autonlab.org/tutorials/dtree18.pdf
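As a rough illustration of a 2-D contingency table, the sketch below counts how often each pair of attribute values occurs together. The helper name `contingency_table`, the attribute names, and the five car records are all assumptions invented for this example; the records are not taken from auto-mpg.data.

```python
from collections import Counter

def contingency_table(rows, x, y):
    """Count how often each (x value, y value) pair occurs in a list of dicts."""
    return Counter((row[x], row[y]) for row in rows)

# Toy records, invented for illustration only.
cars = [
    {"cylinders": 4, "mpg": "good"},
    {"cylinders": 4, "mpg": "good"},
    {"cylinders": 4, "mpg": "bad"},
    {"cylinders": 8, "mpg": "bad"},
    {"cylinders": 8, "mpg": "bad"},
]

table = contingency_table(cars, "cylinders", "mpg")
for (cylinders, mpg), count in sorted(table.items()):
    print(f"cylinders={cylinders} mpg={mpg}: {count}")
```

A 3-D table follows the same pattern with a three-element key, for example `(row[x], row[y], row[z])`.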
• We use a formula to decide which attributes have the highest information gain, depending on what we would like to know. The formula is:
• IG(Y|X) = H(Y) - H(Y|X)
• Where H(X) = the entropy of X, and H(Y|X) = the conditional entropy of Y given X
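The formula IG(Y|X) = H(Y) - H(Y|X) can be computed directly from value counts. The sketch below is one possible way to do it, assuming discrete attributes and base-2 logarithms (entropy in bits); the function names and the sample values are hypothetical and not from the slides.

```python
from collections import Counter
from math import log2

def entropy(values):
    """H(V): entropy in bits of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def conditional_entropy(xs, ys):
    """H(Y|X): entropy of Y within each X group, weighted by the group's share."""
    n = len(xs)
    total = 0.0
    for x_val, count in Counter(xs).items():
        subset = [y for x, y in zip(xs, ys) if x == x_val]
        total += (count / n) * entropy(subset)
    return total

def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(xs, ys)

# Invented sample: X = number of cylinders, Y = whether mpg is decent.
cylinders = [4, 4, 4, 4, 8, 8, 8, 8]
mpg = ["good", "good", "good", "bad", "bad", "bad", "bad", "bad"]
print(round(information_gain(cylinders, mpg), 4))
```

An attribute that perfectly predicts Y gives a gain equal to H(Y), and an attribute that tells us nothing about Y gives a gain of 0, so the attribute with the highest gain is the most informative one to split on.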
• http://www.autonlab.org/tutorials/dtree18.pdf
• http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
• http://www.autonlab.org/tutorials/infogain11.pdf
• Chapter 28 of Fundamentals of Database Systems, 6th Edition, by Elmasri and Navathe
• Pictures from Andrew W. Moore's slides