SI 654
Database Application Design
Winter 2003
Dragomir R. Radev
© 2002 by Prentice Hall
Data Mining
(continued)
arff files
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
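An ARFF file has a header of @relation and @attribute declarations, followed by an @data section with one comma-separated row per instance. The following is a minimal sketch of reading that layout in Python. It is not Weka's own loader; the function name parse_arff and the two-row SAMPLE are illustrative assumptions, and quoted values, sparse rows, and missing-value markers are deliberately not handled.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []   # (name, type) pairs in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and % comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # nominal attribute: keep the list of allowed values
                kind = [value.strip() for value in kind.strip("{}").split(",")]
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Hypothetical two-row excerpt of the weather data, for illustration only.
SAMPLE = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, [name for name, _ in attributes], len(rows))
```

All values come back as strings; converting the real-valued columns to floats is left to the caller.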
Predictive models
• Inputs (e.g., medical history, age)
• Output (e.g., will the patient experience any side effects)
• Some models are better than others
Operating curves
[Figure: operating curves labeled optimal, practical, and random; axis labels: success / failure and most likely / least likely]
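One common way to draw a curve of this kind is a cumulative gains chart: order the cases from most likely to least likely according to the model, then track what fraction of all successes has been found so far. Reading the slide this way is an assumption, and the function name and the four example cases below are hypothetical.

```python
def cumulative_gains(scores, outcomes):
    """Fraction of all successes captured after each case,
    with cases ordered from most likely to least likely.

    scores   -- model scores (higher means more likely to succeed)
    outcomes -- 1 for success, 0 for failure; at least one success is assumed
    """
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    total = sum(outcomes)
    curve, found = [], 0
    for _, outcome in ranked:
        found += outcome
        curve.append(found / total)
    return curve


# Hypothetical scores and outcomes for four cases.
print(cumulative_gains([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

A model that ranks every success ahead of every failure reaches 1.0 as early as possible; a model no better than chance rises roughly along the diagonal.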
Principles of data mining
• Training/test sets
• Error analysis and overfitting
[Figure: error plotted against input size, with one curve for the test set and one for the training set]
• Cross-validation
• Supervised vs. unsupervised methods
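Cross-validation repeats the training/test split several times so that every item is held out exactly once. A minimal sketch of generating the k folds follows; the function name k_fold_indices is an assumption, and the items are taken in their given order (in practice the data would usually be shuffled first).

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]            # every k-th item, starting at `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)
```

A model is trained on each train list and scored on the matching test list; averaging the k test errors gives an estimate that is less sensitive to one lucky or unlucky split.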
Representing data
• Vector space
[Figure: plot with axes labeled credit and salary; classes: pay off, default]
Decision surfaces
[Figure: plot with axes labeled credit and salary; classes: pay off, default]
Decision trees
[Figure: plot with axes labeled credit and salary; classes: pay off, default]
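A decision tree divides the space with axis-parallel splits, one attribute at a time. The simplest case is a one-level tree (a decision stump), sketched below as an exhaustive search for the single split with the fewest training errors. The (salary, credit) values are made up for illustration, and the search is not the algorithm used by any particular tree learner.

```python
def best_stump(points, labels):
    """Find the single split with the fewest training errors.

    Returns (errors, feature, threshold, above, below): predict `above`
    when point[feature] > threshold, otherwise predict `below`.
    """
    classes = sorted(set(labels))
    best = None
    for feature in range(len(points[0])):
        for threshold in sorted({p[feature] for p in points}):
            for above in classes:
                for below in classes:
                    errors = sum(
                        (above if p[feature] > threshold else below) != y
                        for p, y in zip(points, labels)
                    )
                    if best is None or errors < best[0]:
                        best = (errors, feature, threshold, above, below)
    return best


# Hypothetical (salary, credit) pairs; not data from the slides.
points = [(20, 1), (25, 3), (30, 2), (60, 2), (70, 1), (80, 3)]
labels = ["default", "default", "default", "pay off", "pay off", "pay off"]
print(best_stump(points, labels))
```

On this toy data the best split is on feature 0 (salary) at threshold 30 with no training errors. A full tree would apply the same search recursively to each side of the split.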
Linear boundary
[Figure: plot with axes labeled credit and salary; classes: pay off, default]
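A linear boundary classifies a point by the sign of a weighted sum of its coordinates. The slide does not say how the line is found; the sketch below uses the classic perceptron update rule as one possible choice. The data points are hypothetical, with +1 standing for pay off and -1 for default, and the rule is only guaranteed to stop when the two classes are linearly separable.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Learn weights w and bias b so that sign(w.x + b) matches labels (+1 / -1)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:             # wrong side of, or exactly on, the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:                  # every training point is classified correctly
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical (salary, credit) pairs on an arbitrary scale.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 3), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_perceptron(points, labels)
print([predict(w, b, x) for x in points])
```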
kNN models
• Assign each element to the class most common among its k closest training examples
• Demos:
– http://www2.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html
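A k-nearest-neighbor classifier stores the training examples and, for each new element, takes a majority vote among the k closest ones. The following is a minimal sketch using Euclidean distance; the function name, the choice of k=3, and the data points are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),   # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (salary, credit) pairs on an arbitrary scale.
train_points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 3), (4, 5)]
train_labels = ["default", "default", "default", "pay off", "pay off", "pay off"]
print(knn_predict(train_points, train_labels, (4.5, 4)))
print(knn_predict(train_points, train_labels, (1.5, 1.5)))
```

There is no training step beyond storing the data, which is why Weka files its nearest-neighbor method (IBk) under the "lazy" learners.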
Other methods
• Decision trees
• Neural networks
• Support vector machines
• Demos
– http://www.cs.technion.ac.il/~rani/LocBoost/
Weka
http://www.cs.waikato.ac.nz/ml/weka
Methods:
rules.ZeroR
bayes.NaiveBayes
trees.j48.J48
lazy.IBk
trees.DecisionStump
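ZeroR is the usual baseline among these methods: it ignores every input attribute and always predicts the most frequent class. The sketch below reproduces that idea in plain Python rather than calling Weka; the PLAY list is the play column of the weather data shown earlier, and the function name zero_r is an assumption.

```python
from collections import Counter

# The "play" column of the 14 weather instances, in file order.
PLAY = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]


def zero_r(labels):
    """Return the most frequent class label; inputs are ignored entirely."""
    return Counter(labels).most_common(1)[0][0]


majority = zero_r(PLAY)
accuracy = PLAY.count(majority) / len(PLAY)
print(majority, round(accuracy, 3))   # prints: yes 0.643
```

Any other method is only worth using if it beats this baseline on held-out data.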
kMeans clustering
• http://www.cs.mcgill.ca/~bonnef/project.html
• http://www.cs.washington.edu/research/imagedatabase/demo/kmcluster/
• http://www2.cs.cmu.edu/~dellaert/software/
• java weka.clusterers.SimpleKMeans -t data/weather.arff
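The idea behind k-means can be sketched in a few lines: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This is a generic sketch of that loop (Lloyd's algorithm), not Weka's SimpleKMeans implementation; the starting centroids are passed in explicitly so the run is repeatable, and the data points are hypothetical.

```python
import math


def k_means(points, centroids, max_iters=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical 2-D points and starting centroids.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 3), (4, 5)]
centroids, clusters = k_means(points, [(1, 1), (4, 4)])
print(centroids)
print(clusters)
```

Unlike the classifiers above, no class labels are used, which makes this an unsupervised method. The result depends on the starting centroids, so implementations typically choose them at random and may be run several times.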
More useful pointers
• http://www.kdnuggets.com/
• http://www.twocrows.com/booklet.htm