CS 341, Spring 2007
Lecture 5: Data Mining Techniques (II) -- decision trees, neural networks
© Prentice Hall

Review: Data Mining
- Jackknife estimation
- Maximum likelihood estimation
- EM
- Bayes Theorem
- Hypothesis Testing
  - Chi-squared test
- Regression and Correlation
- Similarity measures and distance measures

Data Mining Techniques Outline (II)
Goal: Provide an overview of basic data mining techniques
- Decision Trees
- Neural Networks
  - Activation Functions

Twenty Questions Game
- One person has in mind some object and another person tries to guess it with no more than 20 questions.

Decision Trees
- Decision Tree (DT):
  - Tree where the root and each internal node is labeled with a question.
  - The arcs represent each possible answer to the associated question.
  - Each leaf node represents a prediction of a solution to the problem.
- Popular technique for classification; a leaf node indicates the class to which the corresponding tuple belongs.

Decision Tree Example
- Students in a particular university are to be classified as tall, medium, or short based on their height.
- Assume the database schema is {name, address, gender, height, age, year, major}.
- How to construct a decision tree?
  - Identify important attributes.
  - Obtain training data (a sample of the database with known classification values).
  - Outliers: atypical data, e.g. a student who is 14 years old.

Decision Tree Algorithm: the use of a DT (figure not preserved)

Decision Trees
- A Decision Tree Model is a computational model consisting of three parts:
  - Decision tree
  - Algorithm to create the tree
  - Algorithm that applies the tree to data
- Creation of the tree is the most difficult part.
  - Most DT techniques differ in how the tree is built.
- Processing is basically a search similar to that in a binary search tree.
  - Complexity: depends on the branching factor and the height of the tree.

Decision Tree Advantages
- Easy to understand.
- Easy to generate rules.
- Provide a clear indication of which fields/attributes are most important for prediction or classification.
- Perform classification without requiring much computation.

Decision Tree Disadvantages
- May suffer from overfitting.
  - Classification problems with many classes and a small number of training examples.
- Does not easily handle nonnumeric data.
- Can be quite large and computationally expensive to train; pruning is necessary.
- Less appropriate for predicting the value of a continuous attribute.

Neural Networks
- Based on observed functioning of the human brain.
- Also called Artificial Neural Networks (ANN).
- Our view of neural networks is very simplistic.
- We view a neural network (NN) from a graphical viewpoint.
- Alternatively, a NN may be viewed from the perspective of matrices.
- Used in pattern recognition, speech recognition, computer vision, and classification.

Neural Network Example (figure not preserved)

Neural Networks
- A Neural Network (NN) is a directed graph F = <V, A> with vertices V = {1, 2, ..., n} and arcs A = {<i,j> | 1 <= i, j <= n}, with the following restrictions:
  - V is partitioned into a set of input nodes, V_I, hidden nodes, V_H, and output nodes, V_O.
  - The vertices are also partitioned into layers.
  - Any arc <i,j> must have node i in layer h-1 and node j in layer h.
  - Arc <i,j> is labeled with a numeric value w_ij.
  - Node i is labeled with a function f_i.

NN Node (figure not preserved)

NN Activation Functions
- Functions associated with nodes in the graph.
- Output may be in the range [-1,1] or [0,1].

NN Learning
- Propagate input values through the graph.
- Compare output to desired output.
- Adjust weights in the graph accordingly.
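The three learning steps above (propagate, compare, adjust) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name an activation function or an update rule, so the sigmoid (output in the range (0, 1)) and the delta-rule update of the output-layer weights are choices made here, and the 2-2-1 network with its starting weights is hypothetical.

```python
import math


def sigmoid(s):
    """Activation function whose output lies in the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def propagate(x, layers, f=sigmoid):
    """Propagate input values through the graph, one layer at a time.

    layers[h][j] is the list of weights w_ij on the arcs entering node j
    of the next layer. Returns the outputs of every layer, inputs first.
    """
    outputs = [list(x)]
    for weight_rows in layers:
        prev = outputs[-1]
        outputs.append([f(sum(w * v for w, v in zip(row, prev)))
                        for row in weight_rows])
    return outputs


def adjust_output_weights(x, target, layers, rate=0.5):
    """One learning step on the last layer only (delta rule, an assumption).

    Compares the output to the desired output and nudges each weight in
    the direction that reduces the squared error. Hidden-layer weights
    are left unchanged to keep the sketch short.
    """
    outputs = propagate(x, layers)
    hidden, out = outputs[-2], outputs[-1]
    for j, row in enumerate(layers[-1]):
        delta = (target[j] - out[j]) * out[j] * (1.0 - out[j])
        for i in range(len(row)):
            row[i] += rate * delta * hidden[i]


# Hypothetical 2-2-1 network: two inputs, two hidden nodes, one output.
layers = [
    [[0.5, -0.4], [0.3, 0.8]],   # weights into the two hidden nodes
    [[0.7, -0.2]],               # weights into the single output node
]
x, target = [1.0, 0.0], [1.0]

before = propagate(x, layers)[-1][0]
adjust_output_weights(x, target, layers)
after = propagate(x, layers)[-1][0]
print(before, after)  # the output moves toward the target of 1.0
```

Repeating the adjustment step over many training tuples is what "learning" amounts to in this view. A full method would also adjust the hidden-layer weights, which this sketch leaves out.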
Neural Networks
- A Neural Network Model is a computational model consisting of three parts:
  - Neural Network graph
  - Learning algorithm that indicates how learning takes place.
  - Recall techniques that determine how information is obtained from the network. We will look at propagation as the recall technique.

NN Advantages
- Learning
- Can continue learning even after the training set has been applied.
- Easy parallelization
- Solves many problems

NN Disadvantages
- Difficult to understand
- May suffer from overfitting
- Structure of the graph must be determined a priori.
- Input values must be numeric.
- Verification difficult.

Homework Assignment 1
(Due in class next Monday; your answers to the first 3 questions must be typed.)
- Page 19: 6, 7
- Page 45: 3
- Page 70: 1, 2, 5, 6, 7

Next lecture:
- Classification
  - Issues in classification
  - Regression & Bayesian Classification

Reading Assignments: Chapter 4.1, 4.2