Web Mining, Midterm, Spring 2013
75 minutes

Last name _____________ First name _______________ Score ____________

1. (2pts) Given the data sequences produced from the transaction database:

   Customer ID   Data sequence
   C1            <{a,d} {c,d}>
   C2            <{a,b,d} {a,c} {c,e}>
   C3            <{c}>
   C4            <{a} {d} {c}>
   C5            <{d} {c}>
   C6            <{a,c} {e} {d}>

   Compute the support of each of the sequential patterns <{a,c}> and <{c} {d}>.

2. (2pts) Assume we have the following association rules with min_sup = s and min_conf = c:

   A=>C (s1, c1), B=>A (s2, c2), C=>B (s3, c3)

   Show the conditions under which the association rule C=>A holds.

3. (2pts) Give two limitations of the k-means algorithm.

4. (6pts) Consider the following statistics collected from a training dataset:

   Gender                               F    F    M    M
   Married                              N    Y    N    Y
   Number of tuples with good rating    5    0    15   30
   Number of tuples with bad rating     20   20   10   0

   1) Calculate the information gain when splitting on Gender (decision tree algorithm). Note that you can keep "log" in your results.
   2) Given a new test tuple with Gender = F and Married = N, predict the class label based on the naïve Bayes classifier.

5. (3pts) A document space is defined by three terms (t1, t2, t3). There are six documents in the collection:

   d1(2, 1, 1)
   d2(3, 5, 0)
   d3(10, 0, 3)
   d4(5, 1, 1)
   d5(5, 0, 2)
   d6(0, 3, 0)

   Show the final TF-IDF term vector for d2. (Note that you can keep "log" in your results; there is no need to calculate the final values.)

6. (5pts) For the hyperlink graph shown below,

   1) Show the second column of the adjacency matrix A of the original graph (before any modification).
   2) Show the second column of the stochastic matrix.
   3) Use the stochastic matrix from step 2) as A in the formula

      $$P(i) = (1 - d) + d \sum_{j=1}^{n} A_{ji} P(j)$$

      and assume d = 0.85. Further assume that at step k-1 the page rank scores for nodes 1 to 5 are (0.2, 0.1, 0.1, 0.3, 0.2). Please show the page rank score of node 2 at step k. Please write the formula clearly; there is no need to calculate the final value.
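
Note on question 1: the support count can be checked mechanically. The following is a minimal Python sketch, not an official solution. It assumes that support is the fraction of customers whose data sequence contains the pattern, and that a data sequence contains a pattern when each pattern element is a subset of a distinct element of the sequence, in the same order. The names `contains`, `support`, and `DATABASE` are illustrative and do not come from the exam.

```python
from typing import Dict, List, Set

Sequence = List[Set[str]]

# Data sequences copied from the table in question 1.
DATABASE: Dict[str, Sequence] = {
    "C1": [{"a", "d"}, {"c", "d"}],
    "C2": [{"a", "b", "d"}, {"a", "c"}, {"c", "e"}],
    "C3": [{"c"}],
    "C4": [{"a"}, {"d"}, {"c"}],
    "C5": [{"d"}, {"c"}],
    "C6": [{"a", "c"}, {"e"}, {"d"}],
}


def contains(data_seq: Sequence, pattern: Sequence) -> bool:
    """Return True if `pattern` is a subsequence of `data_seq`.

    Greedy left-to-right scan: each pattern element must be a subset
    of some element of the data sequence, in order, and no data
    element is used for more than one pattern element.
    """
    pos = 0  # index of the next pattern element still to be matched
    for element in data_seq:
        if pos < len(pattern) and pattern[pos] <= element:
            pos += 1
    return pos == len(pattern)


def support(database: Dict[str, Sequence], pattern: Sequence) -> float:
    """Fraction of data sequences in `database` that contain `pattern`."""
    matches = sum(1 for seq in database.values() if contains(seq, pattern))
    return matches / len(database)


if __name__ == "__main__":
    for pattern in ([{"a", "c"}], [{"c"}, {"d"}]):
        print(pattern, support(DATABASE, pattern))
```

The greedy scan is sufficient here because matching a pattern element at the earliest possible position never rules out a later match. If the course defines support as an absolute count rather than a fraction, drop the division in `support`.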