Data Mining CS 341, Spring 2007
Lecture 4: Data Mining Techniques (I)

Review: Information Retrieval
– Similarity measures
– Evaluation metrics: precision and recall
– Question answering
– Web search engines: an application of IR, related to web mining

Data Mining Techniques Outline
Goal: provide an overview of basic data mining techniques.
– Statistical
» Point estimation
» Models based on summarization
» Bayes theorem
» Hypothesis testing
» Regression and correlation
– Similarity measures

Point Estimation
A point estimate estimates a population parameter. It may be made by calculating the parameter for a sample, and may be used to predict a value for missing data. Example:
– R contains 100 employees
– 99 have salary information
– The mean salary of these is $50,000
– Use $50,000 as the value of the remaining employee's salary. Is this a good idea?

Estimation Error
Bias: the difference between the expected value and the actual value, $\mathrm{Bias} = E[\hat\theta] - \theta$.
Mean squared error (MSE): the expected value of the squared difference between the estimate and the actual value, $\mathrm{MSE} = E[(\hat\theta - \theta)^2]$.
Root mean squared error: $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$.

Jackknife Estimate
A jackknife estimate of a parameter is obtained by repeatedly omitting one value (or group of values) from the set of observed values. Named to describe a "handy and useful tool." Used to reduce bias.
Property: the jackknife estimator lowers the bias from order $1/n$ to order $1/n^2$.

Jackknife Estimate: Definition
– Divide the sample of size n into g groups of size m each, so n = mg (often m = 1 and g = n).
– Let $\hat\theta_{(j)}$ be the estimate obtained by ignoring the jth group.
– Let $\bar\theta = \frac{1}{g}\sum_{j=1}^{g}\hat\theta_{(j)}$ be the average of the $\hat\theta_{(j)}$.
– The jackknife estimator is $\hat\theta_Q = g\hat\theta - (g-1)\bar\theta$, where $\hat\theta$ is the estimator for the parameter $\theta$ computed from the full sample.

Jackknife Estimator: Example 1
Estimate of the mean for X = {x1, x2, x3}, with n = 3, g = 3, m = 1:
$\hat\theta = \bar{x} = (x_1 + x_2 + x_3)/3$
$\hat\theta_{(1)} = (x_2 + x_3)/2$, $\hat\theta_{(2)} = (x_1 + x_3)/2$, $\hat\theta_{(3)} = (x_1 + x_2)/2$
$\bar\theta = (\hat\theta_{(1)} + \hat\theta_{(2)} + \hat\theta_{(3)})/3 = (x_1 + x_2 + x_3)/3$
$\hat\theta_Q = g\hat\theta - (g-1)\bar\theta = 3\hat\theta - 2\bar\theta = (x_1 + x_2 + x_3)/3$
In this case the jackknife estimator is the same as the usual estimator.

Jackknife Estimator: Example 2
Estimate of the variance for X = {1, 4, 4}, with n = 3, g = 3, m = 1, using the biased estimator $\hat\sigma^2 = \sum_i (x_i - \bar{x})^2 / n$:
$\hat\theta = \hat\sigma^2 = ((1-3)^2 + (4-3)^2 + (4-3)^2)/3 = 2$
$\hat\theta_{(1)} = ((4-4)^2 + (4-4)^2)/2 = 0$, $\hat\theta_{(2)} = 2.25$, $\hat\theta_{(3)} = 2.25$
$\bar\theta = (0 + 2.25 + 2.25)/3 = 1.5$
$\hat\theta_Q = g\hat\theta - (g-1)\bar\theta = 3(2) - 2(1.5) = 3$
In this case the jackknife estimator differs from the usual estimator (see the jackknife sketch at the end of this part).

Jackknife Estimator: Example 2 (cont'd)
In general, applying the jackknife technique to the biased estimator $\hat\sigma^2 = \sum_i (x_i - \bar{x})^2 / n$ yields $s^2 = \sum_i (x_i - \bar{x})^2 / (n-1)$, which is known to be unbiased for $\sigma^2$.

Maximum Likelihood Estimate (MLE)
Obtain the parameter estimates that maximize the probability that the sample data occurs under the specific model. The joint probability of observing the sample data is formed by multiplying the individual probabilities; this is the likelihood function:
$L(\Theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \Theta)$
Maximize L.

MLE Example
A coin is tossed five times: {H, H, H, H, T}. Assuming a fair coin with H and T equally likely, the likelihood of this sequence is $(1/2)^5 = 0.03125$. However, if the probability of an H is 0.8, then the likelihood is $(0.8)^4(0.2) = 0.08192$.

MLE Example (cont'd)
General likelihood formula, with $x_i = 1$ for H and $x_i = 0$ for T:
$L(p \mid x_1, \ldots, x_n) = p^{\sum x_i}(1-p)^{n - \sum x_i}$
Maximizing over p gives the estimate $\hat{p} = \sum x_i / n = 4/5 = 0.8$ (see the MLE sketch at the end of this part).

Expectation-Maximization (EM)
Solves estimation problems with incomplete data:
– Obtain initial estimates for the parameters.
– Iteratively use the estimates to fill in the missing data, re-estimate, and continue until convergence (see the EM sketch at the end of this part).

EM Example / EM Algorithm
(These two slides were a worked figure and an algorithm listing that did not survive this transcript.)
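To make the jackknife procedure concrete, here is a minimal Python sketch (ours, not the slides'; the function names are our own) that reproduces Example 2 for the m = 1, g = n case:

```python
def jackknife(estimator, xs):
    """theta_Q = g*theta_hat - (g-1)*theta_bar, with m = 1 and g = n."""
    g = len(xs)
    theta_hat = estimator(xs)              # estimate from the full sample
    loo = [estimator(xs[:j] + xs[j + 1:])  # estimate with the jth value omitted
           for j in range(g)]
    theta_bar = sum(loo) / g               # average of the leave-one-out estimates
    return g * theta_hat - (g - 1) * theta_bar

def biased_variance(xs):
    """The biased estimator sum((x - mean)^2) / n from Example 2."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

print(biased_variance([1, 4, 4]))             # 2.0, the usual (biased) estimate
print(jackknife(biased_variance, [1, 4, 4]))  # 3.0, matching Example 2
```

The jackknifed value 3.0 equals the unbiased $s^2 = 6/2 = 3$ from the cont'd slide.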
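Similarly, a small sketch (again ours) checks the coin-toss likelihoods and recovers the MLE $\hat{p} = 0.8$ by a grid search:

```python
tosses = [1, 1, 1, 1, 0]  # {H, H, H, H, T}, encoded as 1 = H, 0 = T

def likelihood(p, xs):
    """L(p) = p^(#heads) * (1 - p)^(#tails)."""
    heads = sum(xs)
    return p ** heads * (1 - p) ** (len(xs) - heads)

print(likelihood(0.5, tosses))  # 0.03125, the fair-coin likelihood
print(likelihood(0.8, tosses))  # 0.08192, higher: p = 0.8 fits better

# A coarse grid search confirms the closed-form MLE, p_hat = 4/5:
p_hat = max((i / 100 for i in range(101)), key=lambda p: likelihood(p, tosses))
print(p_hat)                    # 0.8
```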
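The EM example itself did not survive, so as a stand-in here is a minimal sketch assuming the classic setting of estimating a mean when some observations are missing; the data values are made up and echo the salary example from the point-estimation slide:

```python
def em_mean(observed, n_missing, mu=0.0, tol=1e-9):
    """EM for the mean of a sample with n_missing unobserved values.
    E-step: impute each missing value with the current estimate mu.
    M-step: re-estimate mu from the observed plus imputed values."""
    n = len(observed) + n_missing
    while True:
        new_mu = (sum(observed) + n_missing * mu) / n
        if abs(new_mu - mu) < tol:  # converged
            return new_mu
        mu = new_mu

# One missing salary among four employees; EM converges to the observed mean.
print(em_mean([45000, 50000, 55000], n_missing=1))  # ~50000.0
```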
Models Based on Summarization
Basic concepts provide an abstraction and summarization of the data as a whole:
– Statistical concepts: mean, variance, median, mode, etc.
Visualization displays the structure of the data graphically:
– Line graphs, pie charts, histograms, scatter plots, hierarchical graphs

Scatter Diagram
(This slide was an example scatter plot; the figure did not survive.)

Bayes Theorem
Posterior probability: P(h1 | xi). Prior probability: P(h1).
Bayes theorem assigns probabilities to hypotheses given a data value:
$P(h_j \mid x_i) = \frac{P(x_i \mid h_j)\, P(h_j)}{P(x_i)}$

Bayes Theorem Example
Credit authorizations (hypotheses): h1 = authorize purchase, h2 = authorize after further identification, h3 = do not authorize, h4 = do not authorize but contact police.
Assign twelve data values, one for each combination of credit and income:

Credit \ Income    1     2     3     4
Excellent          x1    x2    x3    x4
Good               x5    x6    x7    x8
Bad                x9    x10   x11   x12

From the training data: P(h1) = 60%, P(h2) = 20%, P(h3) = 10%, P(h4) = 10%.

Bayes Example (cont'd)
Training data:

ID   Income   Credit      Class   xi
1    4        Excellent   h1      x4
2    3        Good        h1      x7
3    2        Excellent   h1      x2
4    3        Good        h1      x7
5    4        Good        h1      x8
6    2        Excellent   h1      x2
7    3        Bad         h2      x11
8    2        Bad         h2      x10
9    3        Bad         h3      x11
10   1        Bad         h4      x9

Bayes Example (cont'd)
Calculate P(xi | hj) and P(xi) from the table. For example: P(x7|h1) = 2/6, P(x4|h1) = 1/6, P(x2|h1) = 2/6, P(x8|h1) = 1/6, and P(xi|h1) = 0 for all other xi.
To predict the class for x4:
– Calculate P(hj | x4) for all hj and place x4 in the class with the largest value.
– For example: P(h1|x4) = P(x4|h1) P(h1) / P(x4) = (1/6)(0.6)/0.1 = 1, so x4 goes in class h1 (see the Bayes sketch at the end of this part).

Hypothesis Testing
Find a model to explain behavior by creating and then testing a hypothesis about the data; this is the exact opposite of the usual DM approach.
– H0: the null hypothesis, the hypothesis to be tested.
– H1: the alternative hypothesis.

Chi-Square Test
One technique for performing hypothesis testing. Used to test the association between two observed variables and to determine whether a set of observed values is statistically different from the expected ones. The chi-squared statistic is defined as
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is an observed value and E is the expected value based on the hypothesis.

Chi-Square Test Example
Given the average scores of five schools, determine whether the differences are statistically significant.
– O = {50, 93, 67, 78, 87}, E = 75
– $\chi^2 = 15.55$, and therefore significant.
Examining a chi-squared significance table: with 4 degrees of freedom and a 5% significance level, the critical value is 9.488. Since 15.55 > 9.488, the variation between the schools' scores and the expected value cannot be attributed to pure chance (see the chi-square sketch at the end of this part).

Regression
Predict future values based on past values by fitting a set of points to a curve. Linear regression assumes that a linear relationship exists:
$y = c_0 + c_1 x_1 + \cdots + c_n x_n$
– n input variables, called regressors or predictors
– one output variable, called the response
– n + 1 constants, chosen during the modeling process to match the input examples

Linear Regression with One Input Value
$y = c_0 + c_1 x_1$ (see the regression sketch at the end of this part).

Correlation
Examine the degree to which the values of two variables behave similarly, using the correlation coefficient r:
– 1 = perfect correlation
– -1 = perfect but opposite correlation
– 0 = no correlation
$r = \frac{\sum_i (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_i (x_i - \bar{X})^2}\sqrt{\sum_i (y_i - \bar{Y})^2}}$
where $\bar{X}$, $\bar{Y}$ are the means of X and Y respectively.
– Suppose X = (1, 3, 5, 7, 9) and Y = (9, 7, 5, 3, 1). r = ?
– Suppose X = (1, 3, 5, 7, 9) and Y = (2, 4, 6, 8, 10). r = ?
(The correlation sketch at the end of this part computes both.)
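A minimal sketch (ours, not the slides') that reproduces the x4 prediction by estimating every probability as a count over the ten training records above:

```python
from collections import Counter

# (class, cell) pairs from the ten training records
training = [("h1", "x4"), ("h1", "x7"), ("h1", "x2"), ("h1", "x7"),
            ("h1", "x8"), ("h1", "x2"), ("h2", "x11"), ("h2", "x10"),
            ("h3", "x11"), ("h4", "x9")]

class_counts = Counter(h for h, _ in training)  # P(hj) numerators
cell_counts = Counter(x for _, x in training)   # P(xi) numerators
joint_counts = Counter(training)                # P(xi and hj) numerators
n = len(training)

def posterior(h, x):
    """P(h|x) = P(x|h) P(h) / P(x), all estimated from the training data."""
    p_x_given_h = joint_counts[(h, x)] / class_counts[h]
    p_h = class_counts[h] / n
    p_x = cell_counts[x] / n
    return p_x_given_h * p_h / p_x

best = max(class_counts, key=lambda h: posterior(h, "x4"))
print(best, posterior("h1", "x4"))  # h1 1.0, matching the slide
```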
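Likewise for the school-scores example; the statistic is computed directly from its definition, and the critical value 9.488 is taken from the slide's chi-squared table rather than computed:

```python
observed = [50, 93, 67, 78, 87]
expected = 75  # hypothesized common average for every school

# chi^2 = sum of (O - E)^2 / E over the observations
chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi2, 2))  # 15.55

# Critical value for 4 degrees of freedom at the 5% significance level,
# read from a chi-squared table as on the slide:
critical = 9.488
print(chi2 > critical)  # True -> reject H0: the differences are significant
```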
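The one-input regression slide survives only as a title, so this sketch fits $y = c_0 + c_1 x$ by ordinary least squares on made-up points (the data are our assumption, not the slides'):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = c0 + c1*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    c1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    c0 = my - c1 * mx
    return c0, c1

# Hypothetical past values; the fit recovers y = 1 + 2x exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
c0, c1 = fit_line(xs, ys)
print(c0, c1)       # 1.0 2.0
print(c0 + c1 * 6)  # predict a future value: 13.0
```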
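Finally, a sketch of the correlation coefficient that answers the two r = ? questions:

```python
def pearson_r(xs, ys):
    """r = sum((x - X̄)(y - Ȳ)) / sqrt(sum((x - X̄)^2) * sum((y - Ȳ)^2))."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

print(pearson_r([1, 3, 5, 7, 9], [9, 7, 5, 3, 1]))   # -1.0: perfect but opposite
print(pearson_r([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]))  #  1.0: perfect correlation
```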
Similarity Measures
Determine the similarity between two objects. Alternatively, a distance measure measures how unlike or dissimilar objects are. (The list of similarity characteristics and the measure formulas on these slides did not survive this transcript.)

Distance Measures
Measure dissimilarity between objects (see the sketch below for two common examples).

Next Lecture: Data Mining Techniques (II)
– Decision trees, neural networks, and genetic algorithms
Reading assignment: Chapter 3
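Since the slides' specific formulas were lost, here is a sketch of two widely used choices, cosine similarity and Euclidean distance; this selection is our assumption, not necessarily what the original slides showed:

```python
import math

def cosine_similarity(a, b):
    """sim(a, b) = (a . b) / (|a| * |b|); 1 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    """dist(a, b) = sqrt(sum((a_i - b_i)^2)); 0 means identical objects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 3.0, 5.0], [2.0, 4.0, 6.0]
print(cosine_similarity(a, b))   # ~0.994: nearly identical direction
print(euclidean_distance(a, b))  # ~1.732: small dissimilarity
```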