Data Mining
CS 341, Spring 2007
Lecture 5: Data Mining Techniques (II)
-- decision trees, neural networks
Review:
Jackknife estimation
Maximum likelihood estimation
EM
Bayes Theorem
Hypothesis Testing
– Chi-squared test
Regression and Correlation
Similarity measures and distance measures
© Prentice Hall
Data Mining Techniques Outline (II)
Goal: Provide an overview of basic data mining techniques
Decision Trees
Neural Networks
– Activation Functions
Twenty Questions Game
One person has in mind some object and another person tries to guess with no more than 20 questions.
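Why 20 questions can be enough: each yes/no answer can halve the set of remaining candidates, so 20 answers can separate up to 2^20 = 1,048,576 objects. The sketch below is an illustration only and assumes the "object" is a number between 1 and 2^20; the function name `guess_number` is made up for this example.

```python
def guess_number(secret, low=1, high=2**20):
    """Find `secret` in [low, high] using only yes/no questions."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1              # ask: "Is it greater than mid?"
        if secret > mid:
            low = mid + 1           # answer "yes": keep the upper half
        else:
            high = mid              # answer "no": keep the lower half
    return low, questions


value, asked = guess_number(777_777)
print(value, asked)  # 777777 20
```

A decision tree plays the same game: each internal node asks one question and each answer narrows down the possible classes.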
Decision Trees
Decision Tree (DT):
– Tree where the root and each internal node is labeled with a question.
– The arcs represent each possible answer to the associated question.
– Each leaf node represents a prediction of a solution to the problem.
Popular technique for classification; leaf node indicates the class to which the corresponding tuple belongs.
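The definition above maps directly onto a small data structure. The following is a minimal sketch, not the textbook's implementation: the `Node` class, the `classify` function, and the toy animal tree are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Node:
    # Internal node: `question` turns a tuple (a dict here) into an answer,
    # and `arcs` maps each possible answer to a child node.
    question: Optional[Callable[[dict], str]] = None
    arcs: Dict[str, "Node"] = field(default_factory=dict)
    # Leaf node: `prediction` holds the class label.
    prediction: Optional[str] = None


def classify(node, record):
    """Follow the arc matching each answer until a leaf is reached."""
    while node.prediction is None:
        node = node.arcs[node.question(record)]
    return node.prediction


# Made-up toy tree with two questions and three classes.
tree = Node(
    question=lambda r: "yes" if r["has_fur"] else "no",
    arcs={
        "yes": Node(prediction="mammal"),
        "no": Node(
            question=lambda r: "yes" if r["lays_eggs"] else "no",
            arcs={"yes": Node(prediction="bird"),
                  "no": Node(prediction="other")},
        ),
    },
)

print(classify(tree, {"has_fur": False, "lays_eggs": True}))  # bird
```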
Decision Tree Example
Students in a particular university are to be classified as tall, medium and short based on their height. Assume the database scheme is {name, address, gender, height, age, year, major}.
How to construct a decision tree?
– Identify important attributes.
– Obtain training data (a sample of the database with known classification values).
– Outliers: atypical data, e.g. a student who is 14 years old.
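One simple way to picture construction from training data: treat height as the only important attribute and search for the two cut points that misclassify the fewest training tuples. This is a brute-force sketch under that assumption, not a standard tree-building algorithm; the training sample and the names `predict` and `fit_thresholds` are invented for illustration.

```python
from itertools import combinations


def predict(height, t_short, t_tall):
    """Two-question tree on height: below t_short, below t_tall, or above."""
    if height < t_short:
        return "short"
    if height < t_tall:
        return "medium"
    return "tall"


def fit_thresholds(training):
    """Try every pair of midpoints between observed heights; keep the best."""
    heights = sorted({h for h, _ in training})
    candidates = [(a + b) / 2 for a, b in zip(heights, heights[1:])]
    best = None
    for t_short, t_tall in combinations(candidates, 2):
        errors = sum(predict(h, t_short, t_tall) != label
                     for h, label in training)
        if best is None or errors < best[0]:
            best = (errors, t_short, t_tall)
    return best


# Made-up training sample: (height in meters, known class).
training = [(1.55, "short"), (1.62, "short"), (1.70, "medium"),
            (1.78, "medium"), (1.88, "tall"), (1.95, "tall")]

errors, t_short, t_tall = fit_thresholds(training)
print(errors, t_short, t_tall)
```

On this sample the search finds cut points between 1.62 and 1.70 and between 1.78 and 1.88 with no training errors. An outlier with a wrong or unusual label would shift the cut points, which is why atypical tuples are usually inspected before training.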
Decision Tree Algorithm: the use of a DT
Decision Trees
A Decision Tree Model is a computational model consisting of three parts:
– Decision Tree
– Algorithm to create the tree
– Algorithm that applies the tree to data
Creation of the tree is the most difficult part.
Processing is basically a search similar to that in a binary search tree.
– Most DT techniques differ in how the tree is built.
– Complexity: branching factor, the height of the tree
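To make the complexity remark concrete, the sketch below applies a tree to one tuple and counts the questions asked; that count never exceeds the height of the tree. The tuple-based tree encoding, the gender/height thresholds, and the function names are assumptions made for this example.

```python
# Hypothetical toy tree: an internal node is (question, {answer: subtree});
# a leaf is just a class label (a string). Thresholds are made up.
tree = ("gender", {
    "F": ("height >= 1.8", {
        "yes": "tall",
        "no": ("height >= 1.6", {"yes": "medium", "no": "short"}),
    }),
    "M": ("height >= 1.9", {
        "yes": "tall",
        "no": ("height >= 1.7", {"yes": "medium", "no": "short"}),
    }),
})


def height(node):
    """Number of questions on the longest root-to-leaf path."""
    if isinstance(node, str):
        return 0
    return 1 + max(height(child) for child in node[1].values())


def branching_factor(node):
    """Largest number of arcs leaving any single node."""
    if isinstance(node, str):
        return 0
    _, arcs = node
    return max(len(arcs), *(branching_factor(c) for c in arcs.values()))


def classify(node, answer_fn):
    """Walk from the root to a leaf, counting the questions asked."""
    asked = 0
    while not isinstance(node, str):
        question, arcs = node
        node = arcs[answer_fn(question)]
        asked += 1
    return node, asked


# Answers for one made-up student (female, 1.72 m).
answers = {"gender": "F", "height >= 1.8": "no", "height >= 1.6": "yes"}
print(height(tree), branching_factor(tree))   # 3 2
print(classify(tree, answers.get))            # ('medium', 3)
```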
Decision Tree Advantages
Easy to understand.
Easy to generate rules.
Provide a clear indication of which fields/attributes are most important for prediction or classification.
Perform classification without requiring much computation.
Decision Tree Disadvantages
May suffer from overfitting.
– Classification problems with many classes and small number of training examples.
Does not easily handle nonnumeric data.
Can be quite large and computationally expensive to train; pruning is necessary.
Less appropriate for predicting the value of a continuous attribute.
Neural Networks
Based on observed functioning of human brain.
(Artificial Neural Networks (ANN))
Our view of neural networks is very simplistic.
We view a neural network (NN) from a graphical viewpoint.
Alternatively, an NN may be viewed from the perspective of matrices.
Used in pattern recognition, speech recognition, computer vision, and classification.
Neural Networks
Neural Network (NN) is a directed graph F=<V,A> with vertices V={1,2,…,n} and arcs A={<i,j>|1<=i,j<=n}, with the following restrictions:
– V is partitioned into a set of input nodes, V_I, hidden nodes, V_H, and output nodes, V_O.
– The vertices are also partitioned into layers.
– Any arc <i,j> must have node i in layer h-1 and node j in layer h.
– Arc <i,j> is labeled with a numeric value w_ij.
– Node i is labeled with a function f_i.
Neural Network Example
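The graph definition can be written down almost literally. The sketch below encodes a hypothetical 2-2-1 network (node numbers, layer assignment, and weights are all made up) and checks the layer restriction on arcs; the node functions f_i are left out to keep it short.

```python
# Hypothetical network: nodes 1-2 are inputs, 3-4 hidden, 5 output.
layer_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}

# Arc <i,j> labeled with weight w_ij (values are arbitrary).
weights = {
    (1, 3): 0.5, (1, 4): -0.3,
    (2, 3): 0.8, (2, 4): 0.1,
    (3, 5): 1.2, (4, 5): -0.7,
}


def arcs_respect_layers(layer_of, weights):
    """Every arc <i,j> must go from layer h-1 to layer h."""
    return all(layer_of[j] == layer_of[i] + 1 for (i, j) in weights)


def partition(layer_of):
    """Split V into input, hidden, and output nodes by layer."""
    last = max(layer_of.values())
    v_in = {v for v, h in layer_of.items() if h == 0}
    v_out = {v for v, h in layer_of.items() if h == last}
    v_hidden = set(layer_of) - v_in - v_out
    return v_in, v_hidden, v_out


print(arcs_respect_layers(layer_of, weights))  # True
print(partition(layer_of))                     # ({1, 2}, {3, 4}, {5})
```

An arc such as <1,5>, which skips the hidden layer, would make the check fail.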
NN Node
NN Activation Functions
Functions associated with nodes in graph.
Output may be in range [-1,1] or [0,1]
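Three commonly used activation functions illustrate the two output ranges. The slides do not say which functions the course uses, so this selection is an assumption: the logistic sigmoid and a threshold function give outputs between 0 and 1, and the hyperbolic tangent gives outputs between -1 and 1.

```python
import math


def sigmoid(x):
    """Logistic function; output lies strictly between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)              # avoids overflow for very negative x
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent; output lies between -1 and 1."""
    return math.tanh(x)


def threshold(x, t=0.0):
    """Step function; output is exactly 0 or 1."""
    return 1.0 if x >= t else 0.0


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), threshold(x))
```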
NN Learning
Propagate input values through graph.
Compare output to desired output.
Adjust weights in graph accordingly.
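The three steps can be seen in the smallest possible network: a single sigmoid node with two inputs. The sketch below assumes a simple error-driven update (weight change = rate × error × input) and trains on the logical AND function; the learning rate, epoch count, and function names are illustrative choices, not values from the course.

```python
import math


def propagate(weights, bias, inputs):
    """Step 1: push the input values through the node."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-net))


def train(samples, epochs=2000, rate=0.5):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = propagate(weights, bias, inputs)
            error = target - output                    # Step 2: compare
            weights = [w + rate * error * x            # Step 3: adjust
                       for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Training set for logical AND: ((input1, input2), desired output).
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(AND)
for inputs, target in AND:
    print(inputs, target, round(propagate(weights, bias, inputs), 3))
```

After training, the output for (1, 1) is close to 1 and the other three outputs are close to 0. Networks with hidden layers need a rule for passing the error backward through the graph, which this single-node sketch does not cover.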
Neural Networks
A Neural Network Model is a computational model consisting of three parts:
– Neural Network graph
– Learning algorithm that indicates how learning takes place.
– Recall techniques that determine how information is obtained from the network.
We will look at propagation as the recall technique.
NN Advantages
Learning
Can continue learning even after training set has been applied.
Easy parallelization
Solves many problems
NN Disadvantages
Difficult to understand
May suffer from overfitting
Structure of graph must be determined a priori.
Input values must be numeric.
Verification difficult.
Homework Assignment 1:
Page 19: 6, 7
Page 45: 3
Page 70: 1, 2, 5, 6, 7
(Due in class next Monday; your answers to the first 3 questions must be typed)
Next lecture:
Classification
– Issues in classification
– Regression & Bayesian Classification
Reading Assignments: Chapter 4.1, 4.2