Web Mining, Midterm, Spring 2013
75 minutes
Last name _____________
First name _______________
Score ____________
1. (2pts) Given the data sequences produced from the transaction database:

Customer ID    Data sequence
C1             <{a,d} {c,d}>
C2             <{a,b,d} {a,c} {c,e}>
C3             <{c}>
C4             <{a} {d} {c}>
C5             <{d} {c}>
C6             <{a,c} {e} {d}>

Compute the support of each of the following sequential patterns:
<{a,c}>
<{c} {d}>
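
As a rough illustration of how support is counted for question 1, the sketch below checks each customer's data sequence for a pattern. It assumes the usual definition: a sequence contains a pattern if every pattern element is a subset of a distinct sequence element and the order is preserved. The helper names `contains` and `support_count` are made up for this example.

```python
def contains(sequence, pattern):
    """Return True if `pattern` is a subsequence of `sequence`.

    Each pattern element must be a subset of a distinct sequence element,
    and the matched elements must appear in the same order.
    """
    pos = 0  # index of the next pattern element still to be matched
    for element in sequence:
        if pos < len(pattern) and pattern[pos] <= element:
            pos += 1
    return pos == len(pattern)


def support_count(database, pattern):
    """Count the customers whose data sequence contains `pattern`."""
    return sum(contains(seq, pattern) for seq in database.values())


# Data sequences copied from the table in question 1.
database = {
    "C1": [{"a", "d"}, {"c", "d"}],
    "C2": [{"a", "b", "d"}, {"a", "c"}, {"c", "e"}],
    "C3": [{"c"}],
    "C4": [{"a"}, {"d"}, {"c"}],
    "C5": [{"d"}, {"c"}],
    "C6": [{"a", "c"}, {"e"}, {"d"}],
}

count_ac = support_count(database, [{"a", "c"}])
count_c_then_d = support_count(database, [{"c"}, {"d"}])
print("<{a,c}>:", count_ac, "of", len(database))
print("<{c} {d}>:", count_c_then_d, "of", len(database))
```

The greedy left-to-right match is sufficient here because matching a pattern element at the earliest possible position never rules out a later match.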
2. (2pts) Assume we have the following association rules with min_sup = s and min_con = c:
A=>C (s1, c1), B=>A (s2, c2), C=>B (s3, c3)
State the conditions under which the association rule C=>A holds.
3. (2pts) Give two limitations of the k-means algorithm.
4. (6pts) Consider the following statistics collected from a training dataset:

Gender   Married   Tuples with good rating   Tuples with bad rating
F        N         5                         20
F        Y         0                         20
M        N         15                        10
M        Y         30                        0

1) Calculate the information gain when splitting on Gender (decision tree algorithm). Note that you can keep "log" in your results.
2) Given a new test tuple with Gender = F and Married = N, predict the class label using the naïve Bayes classifier.
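
The two parts of question 4 can be sketched numerically as follows. This is only an illustration under stated assumptions: entropy uses base-2 logarithms, and the naïve Bayes estimates are plain relative frequencies with no smoothing. The function names are invented for this example.

```python
from math import log2

# (gender, married) -> (good count, bad count), copied from the table in question 4.
counts = {
    ("F", "N"): (5, 20),
    ("F", "Y"): (0, 20),
    ("M", "N"): (15, 10),
    ("M", "Y"): (30, 0),
}
TOTAL_GOOD = sum(g for g, _ in counts.values())
TOTAL_BAD = sum(b for _, b in counts.values())
N = TOTAL_GOOD + TOTAL_BAD


def entropy(good, bad):
    """Binary entropy in bits; zero counts contribute nothing."""
    total = good + bad
    return -sum(c / total * log2(c / total) for c in (good, bad) if c > 0)


def class_totals(attr_index, value):
    """(good, bad) counts over all rows where the attribute equals `value`."""
    rows = [gb for key, gb in counts.items() if key[attr_index] == value]
    return sum(g for g, _ in rows), sum(b for _, b in rows)


def information_gain(attr_index):
    """Entropy of the whole set minus the weighted entropy after the split."""
    remainder = 0.0
    for value in {key[attr_index] for key in counts}:
        g, b = class_totals(attr_index, value)
        remainder += (g + b) / N * entropy(g, b)
    return entropy(TOTAL_GOOD, TOTAL_BAD) - remainder


def naive_bayes_scores(gender, married):
    """Unnormalized P(class) * P(gender | class) * P(married | class)."""
    scores = {}
    for label, idx, class_total in (("good", 0, TOTAL_GOOD), ("bad", 1, TOTAL_BAD)):
        prior = class_total / N
        p_gender = class_totals(0, gender)[idx] / class_total
        p_married = class_totals(1, married)[idx] / class_total
        scores[label] = prior * p_gender * p_married
    return scores


gain_gender = information_gain(0)  # attribute index 0 is Gender
scores = naive_bayes_scores("F", "N")
predicted = max(scores, key=scores.get)
print("gain(Gender) =", gain_gender)
print("scores =", scores, "->", predicted)
```

With these counts the classes are balanced (50 good, 50 bad), so the entropy before the split is 1 bit and the gain is determined by the two Gender partitions.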
5. (3pts) A document space is defined by three terms (t1, t2, t3). There are six documents in the collection:
d1(2, 1, 1)
d2(3, 5, 0)
d3(10, 0, 3)
d4(5, 1, 1)
d5(5, 0, 2)
d6(0, 3, 0)
Show the final TF-IDF term vector for d2. (Note that you can keep "log" in your results; there is no need to calculate the final values.)
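
A minimal sketch of the TF-IDF computation for question 5 follows. The exam does not state which TF-IDF variant is intended, so this assumes tf = f / max f within the document (with a switch for raw counts) and idf = log(N / df) with the natural logarithm. The function name `tf_idf` and its `normalize_tf` parameter are hypothetical.

```python
from math import log

# Term-frequency vectors (t1, t2, t3) copied from question 5.
docs = {
    "d1": (2, 1, 1),
    "d2": (3, 5, 0),
    "d3": (10, 0, 3),
    "d4": (5, 1, 1),
    "d5": (5, 0, 2),
    "d6": (0, 3, 0),
}


def tf_idf(doc_id, normalize_tf=True):
    """Return the TF-IDF vector of one document.

    Assumed variant: tf = f / max(f) when normalize_tf is True, otherwise
    the raw count; idf = log(N / df), where df is the number of documents
    that contain the term.
    """
    n_docs = len(docs)
    freqs = docs[doc_id]
    max_f = max(freqs)
    weights = []
    for t, f in enumerate(freqs):
        df = sum(1 for vec in docs.values() if vec[t] > 0)
        tf = f / max_f if normalize_tf else f
        weights.append(tf * log(n_docs / df))
    return weights


d2_normalized = tf_idf("d2")
d2_raw = tf_idf("d2", normalize_tf=False)
print(d2_normalized)
print(d2_raw)
```

Under either variant the third component of d2 is zero, because t3 does not occur in d2.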
6. (5pts) For the hyperlink graph shown below,
1) Show the second column of the adjacency matrix A of the original graph (before any modification).
2) Show the second column of the stochastic matrix.
3) Use $P(i) = (1 - d) + d \sum_{j=1}^{n} A_{ji} P(j)$ and assume d = 0.85. Note that A is the stochastic matrix from step 2).
Further assume that at step k-1 the PageRank scores for nodes 1 to 5 are (0.2, 0.1, 0.1, 0.3, 0.2). Show the PageRank score of node 2 at step k.
Write the formula clearly; there is no need to calculate the final value.
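
For reference, the PageRank update for node 2 can be expanded as below. The hyperlink graph is not reproduced in this transcript, so the second-column entries of the stochastic matrix are left symbolic as $A_{j2}$. The subscripted step notation $P_k$ is an assumption introduced here, and n = 5 follows from the five listed scores.

```latex
\begin{align*}
P_k(2) &= (1 - d) + d \sum_{j=1}^{5} A_{j2}\, P_{k-1}(j) \\
       &= 0.15 + 0.85\,\bigl(0.2\,A_{12} + 0.1\,A_{22} + 0.1\,A_{32}
          + 0.3\,A_{42} + 0.2\,A_{52}\bigr)
\end{align*}
```

Substituting the second column obtained in step 2) for $A_{12}, \dots, A_{52}$ gives the requested expression.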