Data Mining
CS 341, Spring 2007

Lecture 6: Classification – Issues, Regression, Bayesian Classification

© Prentice Hall
Data Mining Core Techniques
• Classification
• Clustering
• Association Rules
Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.
• Classification Problem Overview
• Classification Techniques
  – Regression
  – Bayesian classification
  – Distance
  – Decision Trees
  – Rules
  – Neural Networks

Classification Problem
• Given a database D = {t1, t2, …, tn} and a set of classes C = {C1, …, Cm}, the Classification Problem is to define a mapping f: D → C where each ti is assigned to one class.
• The mapping actually divides D into equivalence classes (see the sketch below).
• Prediction is similar, but may be viewed as having an infinite number of classes.
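To make the definition concrete, here is a minimal sketch of such a mapping f: D → C in Python; the thresholds are illustrative assumptions, not values from the lecture:

    # A classifier is just a mapping f: D -> C onto predefined classes.
    classes = ["short", "medium", "tall"]

    def f(height_m: float) -> str:
        # Illustrative thresholds; any rule that assigns exactly one
        # class per tuple defines such a mapping.
        if height_m <= 1.7:
            return "short"
        elif height_m <= 1.9:
            return "medium"
        return "tall"

    print(f(1.95))  # -> tall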
Classification Examples
• Teachers classify students' grades as A, B, C, D, or F.
• Identify mushrooms as poisonous or edible.
• Predict when a river will flood.
• Identify individuals with credit risks.
• Speech recognition
• Pattern recognition
Classification Ex: Grading
• If x >= 90 then grade = A.
• If 80 <= x < 90 then grade = B.
• If 70 <= x < 80 then grade = C.
• If 60 <= x < 70 then grade = D.
• If x < 60 then grade = F.
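These threshold rules map directly to code; a minimal sketch (the function name is illustrative):

    def grade(x: float) -> str:
        # Mirror the slide's rules: one predefined class per score range.
        if x >= 90: return "A"
        if x >= 80: return "B"
        if x >= 70: return "C"
        if x >= 60: return "D"
        return "F"

    print(grade(85))  # -> B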
[Figure: decision tree for the grading example, testing x against 90, 80, 70, and 60 to reach the leaves A, B, C, D, and F.]

Classification Ex: Letter Recognition
• View letters as constructed from 5 components:
[Figure: letters A, B, C, D, E, and F built from the five components.]

Classification Techniques
• Approach:
  1. Create a specific model by evaluating training data (or using domain experts' knowledge).
  2. Apply the model developed to new data.
• Classes must be predefined.
• The most common techniques use decision trees (DTs), neural networks (NNs), or are based on distances or statistical methods.
Defining Classes
[Figure: defining classes by a distance-based approach and by a partitioning-based approach.]

Issues in Classification
• Missing Data
  – Ignore
  – Replace with an assumed value (see the sketch after this list)
• Overfitting
  – Use a large set of training data
  – Filter out erroneous or noisy data
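For the missing-data case, "replace with an assumed value" often means substituting a summary statistic; a minimal sketch, where the mean is one common choice (an assumption, not the lecture's prescription):

    # Impute missing heights with the mean of the observed values.
    heights = [1.6, None, 1.9, 1.88, None, 1.85]
    known = [h for h in heights if h is not None]
    assumed = sum(known) / len(known)            # mean of observed data
    filled = [h if h is not None else assumed for h in heights]
    print(filled)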
Measuring Performance
• Classification accuracy on test data
• Confusion matrix
• OC curve
Classification Accuracy
• True positive (TP): ti predicted to be in Cj and is actually in it.
• False positive (FP): ti predicted to be in Cj but is not actually in it.
• True negative (TN): ti not predicted to be in Cj and is not actually in it.
• False negative (FN): ti not predicted to be in Cj but is actually in it.

Classification Performance
[Figure: 2 x 2 grid of the four outcomes — true positive, false negative, false positive, true negative.]

Confusion Matrix
• An m x m matrix.
• Entry Ci,j indicates the number of tuples assigned to Cj but whose correct class is Ci.
• The best solution will have nonzero values only on the diagonal.

Height Example Data
Name       Gender  Height  Output1  Output2
Kristina   F       1.6m    Short    Medium
Jim        M       2m      Tall     Medium
Maggie     F       1.9m    Medium   Tall
Martha     F       1.88m   Medium   Tall
Stephanie  F       1.7m    Short    Medium
Bob        M       1.85m   Medium   Medium
Kathy      F       1.6m    Short    Medium
Dave       M       1.7m    Short    Medium
Worth      M       2.2m    Tall     Tall
Steven     M       2.1m    Tall     Tall
Debbie     F       1.8m    Medium   Medium
Todd       M       1.95m   Medium   Medium
Kim        F       1.9m    Medium   Tall
Amy        F       1.8m    Medium   Medium
Wynette    F       1.75m   Medium   Medium
Confusion Matrix Example
Using the height data example, with Output1 as the correct assignment and Output2 as the actual assignment:

Actual        Assignment
Membership    Short  Medium  Tall
Short           0      4      0
Medium          0      5      3
Tall            0      1      2

Operating Characteristic Curve
[Figure: operating characteristic (OC) curve.]
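The matrix can be tallied directly from the two label columns; a minimal sketch in Python:

    # Count (actual, assigned) pairs from Output1 (correct) and Output2 (assigned).
    from collections import Counter

    classes = ["Short", "Medium", "Tall"]
    output1 = ["Short", "Tall", "Medium", "Medium", "Short", "Medium", "Short",
               "Short", "Tall", "Tall", "Medium", "Medium", "Medium", "Medium", "Medium"]
    output2 = ["Medium", "Medium", "Tall", "Tall", "Medium", "Medium", "Medium",
               "Medium", "Tall", "Tall", "Medium", "Medium", "Tall", "Medium", "Medium"]

    counts = Counter(zip(output1, output2))
    for actual in classes:
        print(actual, [counts[(actual, assigned)] for assigned in classes])
    # Short  [0, 4, 0]
    # Medium [0, 5, 3]
    # Tall   [0, 1, 2]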
Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.
• Classification Problem Overview
• Classification Techniques
  – Regression
  – Bayesian classification
  – Distance
  – Decision Trees
  – Rules
  – Neural Networks

Regression
• Assume the data fit a predefined function.
• Determine the best values for the parameters in the model.
• Estimate an output value based on input values.
• Can be used for classification and prediction.
Linear Regression
• Assume the relation of the output variable to the input variables is a linear function of some parameters.
• Determine the best values for the regression coefficients c0, c1, …, cn.
• Assume an error term: y = c0 + c1x1 + … + cnxn + ε
• Estimate the error using the mean squared error over the training set of k examples:
  MSE = (1/k) Σi (yi − (c0 + c1xi1 + … + cnxin))²
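A direct transcription of that error measure (a sketch; the names are illustrative):

    # Mean squared error of a linear model with coefficients [c0, c1, ..., cn].
    def mse(xs, ys, coeffs):
        total = 0.0
        for x, y in zip(xs, ys):                     # x is a list [x1, ..., xn]
            y_hat = coeffs[0] + sum(c * xi for c, xi in zip(coeffs[1:], x))
            total += (y - y_hat) ** 2
        return total / len(ys)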
Example 4.3
• Y = c0 + ε
• Find the value of c0 that best partitions the height values into the classes short and medium.
• The training data for yi is
  {1.6, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 1.8, 1.95, 1.9, 1.8, 1.75}
• How?

Example 4.4
• Y = c0 + c1x1 + ε
• Find the values of c0 and c1 that best predict the class.
• Assume 0 for the short class and 1 for the medium class.
• The training data for (xi, yi) is
  {(1.6,0), (1.9,1), (1.88,1), (1.7,0), (1.85,1), (1.6,0), (1.7,0), (1.8,1), (1.95,1), (1.9,1), (1.8,1), (1.75,1)}
• How? (See the sketch below.)

Linear Regression Poor Fit
[Figure: a straight-line fit to the 0/1 class labels, illustrating the poor fit.]
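One way to answer "how": minimize the mean squared error with a least-squares fit. A minimal sketch (numpy is assumed; the 0/1 labels follow Output1 as above):

    import numpy as np

    x = np.array([1.6, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 1.8, 1.95, 1.9, 1.8, 1.75])
    y = np.array([0,   1,   1,    0,   1,    0,   0,   1,   1,    1,   1,   1])

    c1, c0 = np.polyfit(x, y, deg=1)      # least-squares line, highest degree first
    print(f"y = {c0:.2f} + {c1:.2f} x")

    # Classify by thresholding the fitted value at 0.5 (a simple division rule).
    print(((c0 + c1 * x) >= 0.5).astype(int))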
Classification Using Regression
• Division: Use the regression function to divide the area into regions.
• Prediction: Use the regression function to predict a class membership function.

Division
[Figure: regression used to divide the space into class regions.]

Prediction
[Figure: regression used to predict class membership.]
Logistic Regression
• A generalized linear model.
• Extensively used in the medical and social sciences.
• It has the following form:
  loge(p / (1 − p)) = c0 + c1x1 + … + ckxk
  where p is the probability of being in the class and 1 − p is the probability of not being in it.
• The parameters c0, c1, …, ck are usually estimated by maximum likelihood (maximizing the probability of observing the given values).
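Solving the log-odds equation for p gives the logistic (sigmoid) function, which is easy to evaluate in code. A minimal sketch; the coefficients below are hypothetical, not fitted values from the lecture:

    import math

    def prob(x, c0, c1):
        z = c0 + c1 * x                  # linear predictor = log odds
        return 1 / (1 + math.exp(-z))    # p = e^z / (1 + e^z), always in (0, 1)

    c0, c1 = -31.0, 18.0                 # hypothetical coefficients
    for x in (1.6, 1.75, 1.9):
        print(x, round(prob(x, c0, c1), 3))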
Why Logistic Regression?
• p is in the range [0, 1].
  – A good model would like to have p values close to 0 or 1.
• A linear function is not suitable for p.
• Consider the odds p / (1 − p):
  – As p increases, the odds p / (1 − p) increase.
  – The odds lie in the range [0, +∞), asymmetric.
  – The log odds lie in the range (−∞, +∞), symmetric.

Linear Regression vs. Logistic Regression
[Figure: side-by-side comparison of a linear fit and a logistic fit.]
Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.
• Classification Problem Overview
• Classification Techniques
  – Regression
  – Bayesian classification
  – Distance
  – Decision Trees
  – Rules
  – Neural Networks

Bayes Theorem
• Posterior probability: P(h1 | xi)
• Prior probability: P(h1)
• Bayes theorem:
  P(h1 | xi) = P(xi | h1) P(h1) / P(xi)
• Assign probabilities of hypotheses given a data value.
Naïve Bayes Classification
• Assume that the contributions of all attributes are independent and that each contributes equally to the classification problem.
• ti has m independent attributes {xi1, …, xim}.
• P(ti | Cj) = ∏k P(xik | Cj)

Example 4.5
• Using Output1 as the classification results (the height example data shown earlier).
• Step 1: Calculate the prior probability of each class:
  – P(short) = 4/15 = 0.267
  – P(medium) = 8/15 = 0.533
  – P(tall) = 3/15 = 0.2
• Step 2: Calculate the conditional probabilities:
  – P(Genderi | Cj), with Genderi = F or M and Cj = short, medium, or tall
  – P(Heighti | Cj), with Heighti in (0,1.6], (1.6,1.7], (1.7,1.8], (1.8,1.9], (1.9,2.0], or (>2.0)
Example 4.5 (cont'd)

Attribute            Count                  Probability p(xi | Cj)
                     short  medium  tall    short  medium  tall
Gender   M             1      2      3       1/4    2/8    3/3
         F             3      6      0       3/4    6/8    0/3
Height   (0,1.6]       2      0      0       2/4     0      0
         (1.6,1.7]     2      0      0       2/4     0      0
         (1.7,1.8]     0      3      0        0     3/8     0
         (1.8,1.9]     0      4      0        0     4/8     0
         (1.9,2.0]     0      1      1        0     1/8    1/3
         (>2.0)        0      0      2        0      0     2/3

Example 4.5 (cont'd)
• Given a tuple t = {Adam, M, 1.95m}
• Step 3: Calculate P(t | Cj):
  – P(t|short) = 1/4 × 0 = 0
  – P(t|medium) = 2/8 × 1/8 = 0.031
  – P(t|tall) = 3/3 × 1/3 = 0.333
• Step 4: Calculate P(t):
  P(t) = P(t|short)P(short) + P(t|medium)P(medium) + P(t|tall)P(tall) ≈ 0.0833
Example 4.5 (cont'd)
• Step 5: Calculate P(Cj | t) using Bayes rule:
  – P(short|t) = P(t|short)P(short) / P(t) = 0
  – P(medium|t) = P(t|medium)P(medium) / P(t) = 0.2
  – P(tall|t) = P(t|tall)P(tall) / P(t) = 0.799
• Last step: Classify the new tuple as tall.

A Summary
• Step 1: Calculate the prior probability of each class, P(Cj).
• Step 2: Calculate the conditional probability for each attribute value, P(xik | Cj).
• Step 3: Calculate the conditional probability of the tuple, P(t | Cj).
• Step 4: Calculate the prior probability of the tuple, P(t).
• Step 5: Calculate the posterior probability of each class given the tuple, P(Cj | t), using Bayes rule.
• Step 6: Classify the tuple: it belongs to the class with the highest posterior probability. (See the sketch below.)
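The six steps translate into a compact program; a minimal sketch that reproduces Example 4.5 on the height data (the variable names and binning helper are illustrative):

    from collections import Counter, defaultdict

    BINS = [(0.0, 1.6), (1.6, 1.7), (1.7, 1.8), (1.8, 1.9), (1.9, 2.0), (2.0, 9.9)]

    def height_bin(h):
        # Intervals are (lo, hi]; return the index of the bin containing h.
        for i, (lo, hi) in enumerate(BINS):
            if lo < h <= hi:
                return i

    # (gender, height in m, Output1 class)
    train = [("F",1.6,"short"),("M",2.0,"tall"),("F",1.9,"medium"),("F",1.88,"medium"),
             ("F",1.7,"short"),("M",1.85,"medium"),("F",1.6,"short"),("M",1.7,"short"),
             ("M",2.2,"tall"),("M",2.1,"tall"),("F",1.8,"medium"),("M",1.95,"medium"),
             ("F",1.9,"medium"),("F",1.8,"medium"),("F",1.75,"medium")]
    classes = ["short", "medium", "tall"]
    n = len(train)

    class_ct = Counter(c for _, _, c in train)    # Step 1: P(Cj) = class_ct[c] / n
    gender_ct = defaultdict(Counter)              # Step 2: per-class attribute counts
    height_ct = defaultdict(Counter)
    for g, h, c in train:
        gender_ct[c][g] += 1
        height_ct[c][height_bin(h)] += 1

    def classify(gender, height):
        hb = height_bin(height)
        # Step 3: P(t|Cj) = P(gender|Cj) * P(height bin|Cj)
        like = {c: (gender_ct[c][gender] / class_ct[c]) *
                   (height_ct[c][hb] / class_ct[c]) for c in classes}
        # Step 4: P(t) = sum over classes of P(t|Cj) P(Cj)
        pt = sum(like[c] * class_ct[c] / n for c in classes)
        # Step 5: Bayes rule; Step 6: take the largest posterior
        post = {c: like[c] * (class_ct[c] / n) / pt for c in classes}
        return max(post, key=post.get), post

    label, post = classify("M", 1.95)   # Adam from Example 4.5
    print(label, {c: round(p, 3) for c, p in post.items()})
    # -> tall {'short': 0.0, 'medium': 0.2, 'tall': 0.8}

Computed with exact fractions, the posterior for tall is 0.8; the 0.799 on the slide reflects rounding of the intermediate values.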
Next Lecture
• Classification:
  – Distance-based algorithms
  – Decision tree-based algorithms
• HW2 will be announced!