Download 2 - UIC Computer Science

1.
(a) Name three classification techniques. No need to explain how they work.
(b) (3%) How do you describe overfitting in classification?
(c) (3%) Given the following decision tree, generate all the rules from the tree. Note that
we have two classes, Yes and No.
[Decision tree figure: the root splits on Age (>= 40 / < 40); the internal nodes split on Sex (M / F), job (y / n), and income (>= 50k / < 50k); the leaves are labeled Yes or No.]
(d) List three objective interestingness measures of rules, and list two subjective
interestingness measures of rules. No need to explain.
(e) (5%) To build a naïve Bayesian classifier, we can make use of association rule mining. How do you compute P(Ai = aj | C = ck) from association rules, where Ai is an attribute, aj is a value of Ai, and ck is a value of the class attribute C?
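One way to see the connection, offered as an illustration rather than the intended answer: P(Ai = aj | C = ck) is the confidence of the class association rule {C = ck} -> {Ai = aj}, i.e., sup(Ai = aj, C = ck) / sup(C = ck), so both quantities can be read off the mined itemset supports. A minimal Python sketch (the support numbers are made up for illustration):

from fractions import Fraction

# Hypothetical itemset support counts; in practice these come from the rule miner.
supports = {
    frozenset({("C", "y")}): 5,
    frozenset({("A", "m"), ("C", "y")}): 2,
}

def cond_prob(attr, value, class_attr, class_value):
    """P(attr = value | class_attr = class_value) as a ratio of supports,
    i.e., the confidence of the rule {class_attr = class_value} -> {attr = value}."""
    joint = supports[frozenset({(attr, value), (class_attr, class_value)})]
    prior = supports[frozenset({(class_attr, class_value)})]
    return Fraction(joint, prior)

print(cond_prob("A", "m", "C", "y"))  # 2/5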
2. (10%) Given the following table with three attributes, a1, a2, and a3:
a1   a2   a3
C    B    H
B    F    S
A    F    F
C    B    H
B    F    G
B    E    O
We want to mine all the large (or frequent) itemsets in the data. Assume the minimum
support is 30%. Following the Apriori algorithm, give the set of large itemsets in L1, L2, ….,
and candidate itemsets in C2, C3, …. (after the join step and the prune step). What additional
pruning can be done in candidate generation and how?
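A minimal Apriori sketch on this table, treating each attribute=value pair (e.g., a1=C) as an item and each row as a transaction; it is offered to illustrate the join/prune/count loop, not as the graded answer, and the join is written as a plain pairwise union rather than the usual shared-prefix join:

from itertools import combinations

# Rows of the table above; each attribute=value pair is treated as an item.
rows = [
    {"a1=C", "a2=B", "a3=H"},
    {"a1=B", "a2=F", "a3=S"},
    {"a1=A", "a2=F", "a3=F"},
    {"a1=C", "a2=B", "a3=H"},
    {"a1=B", "a2=F", "a3=G"},
    {"a1=B", "a2=E", "a3=O"},
]
minsup = 0.30 * len(rows)  # 30% of 6 rows = 1.8, so an itemset needs support >= 2

def support(itemset):
    return sum(itemset <= row for row in rows)

# L1: frequent 1-itemsets
L = [{frozenset([i]) for row in rows for i in row if support(frozenset([i])) >= minsup}]
print("L1:", L[0])
k = 2
while L[-1]:
    # Join step, simplified: union any two frequent (k-1)-itemsets into a k-itemset
    # (the textbook join on a shared (k-2)-prefix gives the same result after pruning).
    cands = {a | b for a in L[-1] for b in L[-1] if len(a | b) == k}
    # Prune step: drop candidates that have an infrequent (k-1)-subset.
    cands = {c for c in cands
             if all(frozenset(s) in L[-1] for s in combinations(c, k - 1))}
    print(f"C{k}:", cands)
    L.append({c for c in cands if support(c) >= minsup})
    print(f"L{k}:", L[-1])
    k += 1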
3. (10%) In the multiple minimum support association rule mining, we can assign a minimum
support to each item, called minimum item support (MIS). We define that an itemset, {item1,
item2, …}, is large (or frequent) if its support is greater than or equal to
min(MIS(item1), MIS(item2), …)
Given the transaction data:
{Beef, Bread}
{Bread, Cloth}
{Bread, Cloth, Milk}
{Cheese, Boots}
{Beef, Bread, Cheese, Shoes}
{Beef, Bread, Cheese, Milk}
{Bread, Milk, Cloth}
If we have the following minimum item support assignments for the items in the transaction
data,
MIS(Milk) = 50%,
MIS(Bread) = 70%
The MIS values for the rest of the items in the data are all 25%.
Following the MSapriori algorithm, give the set of large (or frequent) itemsets in L1, L2, ….?
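The brute-force sketch below simply checks the definition above (support >= the minimum MIS among the itemset's items); it is not the MSapriori candidate-generation procedure, but on data this small it should produce the same frequent itemsets:

from itertools import combinations

# Transactions from the question.
transactions = [
    {"Beef", "Bread"},
    {"Bread", "Cloth"},
    {"Bread", "Cloth", "Milk"},
    {"Cheese", "Boots"},
    {"Beef", "Bread", "Cheese", "Shoes"},
    {"Beef", "Bread", "Cheese", "Milk"},
    {"Bread", "Milk", "Cloth"},
]
n = len(transactions)
mis = {"Milk": 0.50, "Bread": 0.70}  # given MIS values
default_mis = 0.25                   # MIS for all remaining items

def MIS(item):
    return mis.get(item, default_mis)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

items = sorted(set().union(*transactions))
# An itemset is large if support(itemset) >= min(MIS(item) for item in itemset).
for k in range(1, len(items) + 1):
    Lk = [set(c) for c in combinations(items, k)
          if support(set(c)) >= min(MIS(i) for i in c)]
    if Lk:
        print(f"L{k}:", Lk)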
4.
(10%) Given the following training data, which has two attributes A and B, and a class C, compute all the probability values required to build a naïve Bayesian classifier. Ignore smoothing.

A   B   C
m   t   y
m   s   y
g   q   y
h   s   y
g   q   y
g   q   n
g   s   n
h   t   n
h   q   n
m   t   n

Answer:
P(C = y) =
P(C = n) =
P(A = m | C = y) =
P(A = g | C = y) =
P(A = h | C = y) =
P(A = m | C = n) =
P(A = g | C = n) =
P(A = h | C = n) =
P(B = t | C = y) =
P(B = s | C = y) =
P(B = q | C = y) =
P(B = t | C = n) =
P(B = s | C = n) =
P(B = q | C = n) =
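A minimal sketch that tallies these probabilities from the table above, with no smoothing as the question asks; offered as a cross-check rather than an answer key:

from collections import Counter

# Rows of the table above: (A, B, C).
rows = [
    ("m", "t", "y"), ("m", "s", "y"), ("g", "q", "y"), ("h", "s", "y"), ("g", "q", "y"),
    ("g", "q", "n"), ("g", "s", "n"), ("h", "t", "n"), ("h", "q", "n"), ("m", "t", "n"),
]

class_counts = Counter(c for _, _, c in rows)
for c, cnt in class_counts.items():
    print(f"P(C={c}) = {cnt}/{len(rows)}")

# P(attribute = value | C = c) = count(value and c) / count(c), with no smoothing.
for idx, name in ((0, "A"), (1, "B")):
    values = sorted({r[idx] for r in rows})
    for c in ("y", "n"):
        for v in values:
            joint = sum(1 for r in rows if r[idx] == v and r[2] == c)
            print(f"P({name}={v} | C={c}) = {joint}/{class_counts[c]}")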
5.
Use agglomerative clustering to cluster the following one-dimensional data: 1, 2, 4, 6, 9, 11, 20, 23, 27, 30, 34, 100, 120, 130. You are required to draw the cluster tree and write the value of the cluster center represented by each node next to the node.
[Answer figure: the cluster tree (dendrogram) over the 14 data points, with the cluster-center value written next to each node; the center values recoverable from the figure include 1.5, 3.25, 5, 5.5, 10, 15.2, 21.5, 26.8, 28.5, 30.3, 36.9, 110, 116.7, and 125.]
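For comparison, a minimal scipy sketch of the same clustering; centroid linkage is an assumption here, since the question does not fix a linkage method:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# The one-dimensional data from the question, as an (n, 1) observation matrix.
data = np.array([1, 2, 4, 6, 9, 11, 20, 23, 27, 30, 34, 100, 120, 130], dtype=float)
Z = linkage(data.reshape(-1, 1), method="centroid")  # linkage choice is an assumption

# Each row of Z is one merge: [left cluster id, right cluster id, distance, new cluster size].
print(Z)

# dendrogram() computes the tree layout; no_plot=True avoids needing matplotlib.
tree = dendrogram(Z, labels=[str(int(x)) for x in data], no_plot=True)
print(tree["ivl"])  # leaf order in the drawn tree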
6. Given the following positive and negative data points, draw a possible decision tree partition and a possible SVM decision surface, respectively.
[Two answer panels follow in the original: a scatter plot of the points labeled "Draw a possible decision tree partition" and one labeled "Draw an SVM decision surface"; the plots are not reproduced here.]
7. In a marketing application, a predictive model is built to score a test database to identify
likely customers. After scoring, the following configuration for 10 bins is obtained. Each
number in the second row is the number of positive cases in the test data that fall into the
corresponding bin. Draw the lift chart for the results. Your drawing should be reasonably
accurate.
Bin 1   Bin 2   Bin 3   Bin 4   Bin 5   Bin 6   Bin 7   Bin 8   Bin 9   Bin 10
240     120     40      30      20      20      10      8       6       6
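A minimal sketch of the cumulative coordinates behind the lift chart (percent of the database contacted versus percent of positives captured); the actual drawing is left to the reader:

# Positive cases per bin, from the table above.
positives = [240, 120, 40, 30, 20, 20, 10, 8, 6, 6]
total = sum(positives)  # 500 positives in the test data

cum = 0
for i, p in enumerate(positives, start=1):
    cum += p
    # x: percent of the test database contacted, y: percent of positives captured
    print(f"{10 * i:3d}% of data -> {100 * cum / total:5.1f}% of positives "
          f"(random baseline: {10 * i}%)")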
8.
Given the classification results in the following confusion matrix, compute the classification
accuracy, precision, and recall scores of the positive data.
Correct \ Classified as    Positive    Negative
Positive                      50          10
Negative                       5         200
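For reference, a minimal sketch applying the standard definitions to this matrix; the numbers below are read directly from the table:

# Confusion-matrix counts from the table above (positive class is "Positive").
tp, fn = 50, 10   # correct Positive classified as Positive / Negative
fp, tn = 5, 200   # correct Negative classified as Positive / Negative

accuracy = (tp + tn) / (tp + fn + fp + tn)   # (50 + 200) / 265
precision = tp / (tp + fp)                   # 50 / 55
recall = tp / (tp + fn)                      # 50 / 60
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")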
9. Given the following table with three attributes, a1, a2, and a3:
a1   a2   a3
C    B    H
B    F    S
A    F    F
C    B    H
B    F    G
B    E    O
We want to mine all the large (or frequent) itemsets using the multiple minimum support
technique. If we have the following minimum item support assignments for the items,
MIS(a2=F) = 60%,
The MIS values for the rest of the items in the data are all 30%.
Following the MSapriori algorithm, give the set of large (or frequent) itemsets in L1, L2, ….
and candidate itemsets in C2, C3, … (after the join step and the prune step)?