Example Questions for Final Exam
All questions in all Course Studies (Course Study 1 – Course Study 7)
All questions in the “Example Questions for Midterm Exam” document.
Part 1: Data Preparation before Data Mining
Part 2: Association Rule Mining
Part 3: Sequential Pattern Mining
Part 4: Classification Techniques in Data Mining (Bayes, Decision Tree, K-Nearest Neighbor)
Part 4: Classification Techniques in Data Mining (Neural Network)
1- Write a program in any programming language to calculate the output of the neuron for classification.
$$I_j = \sum_i w_{ij} O_i + \theta_j$$
$$O_j = \frac{1}{1 + e^{-I_j}}$$
[Figure: a neuron with inputs x1 and x2, weights w1 and w2, a bias, and output y]
For example: Input vector X = <1, 0, 1>, weight1 = 0.2, weight2 = 0.4, weight3 = -0.5, wƟ = 0.6, Ɵ = -0.4
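A minimal sketch of one way to compute the neuron output in Python. It assumes the bias Ɵ = -0.4 is added directly to the weighted sum, as in the net-input formula; the role of wƟ = 0.6 is not clear from the question, so it is left out here. The function name `neuron_output` is illustrative only.

```python
import math

def neuron_output(inputs, weights, bias):
    """Sigmoid neuron: net input I = sum(w_i * x_i) + bias, output O = 1 / (1 + e^-I)."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))

x = [1, 0, 1]            # input vector X from the example
w = [0.2, 0.4, -0.5]     # weight1, weight2, weight3
theta = -0.4             # bias (assumption: added directly to the weighted sum)

output = neuron_output(x, w, theta)
print(round(output, 3))
```

With these values the net input is 0.2 + 0 - 0.5 - 0.4 = -0.7, so the output is the sigmoid of -0.7.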
2- Train a perceptron for only one step using the following activation function and parameter values.
T=Desired output
O=Actual output
T=0, O=1, W1=0.5, W2=0.3, Input1=2, Input2=1, WƟ = -1, Ɵ = 1
Activation Function
$$O = \begin{cases} 1, & \sum_i w_i I_i + \theta \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
$$w_i(t+1) = w_i(t) + \Delta w_i(t)$$
$$\Delta w_i(t) = (T - O)\, I_i$$
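The single update step can be sketched as follows. This assumes an implicit learning rate of 1 (none is given) and treats Ɵ as an extra constant input whose weight is updated by the same rule; both are assumptions, and the function name `train_one_step` is illustrative. The given actual output O = 1 is used directly, so the sketch does not depend on the exact form of the activation function.

```python
def train_one_step(weights, inputs, target, actual):
    """Apply w_i(t+1) = w_i(t) + (T - O) * I_i once to every weight."""
    error = target - actual
    return [w + error * x for w, x in zip(weights, inputs)]

weights = [0.5, 0.3, -1.0]   # W1, W2, and the bias weight (assumed to be W_theta)
inputs = [2, 1, 1]           # Input1, Input2, and theta = 1 as a constant input

new_weights = train_one_step(weights, inputs, target=0, actual=1)
print(new_weights)
```

Because T - O = -1, each weight decreases by the value of its input.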
3- Train a perceptron (until no error) to recognize logical AND using Step Function with threshold 0.2 and learning rate 0.1.
error e = Yexpected – Yactual
α is the learning rate
xi is an input
wi = wi + Δwi
Δwi = α * xi * e
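A sketch of the training loop for logical AND using the rule above. The initial weights (0.3 and -0.1) are hypothetical, since the question does not give any. It also assumes the output is step(x1*w1 + x2*w2 - threshold) with step(v) = 1 when v >= 0. Exact fractions are used to avoid floating-point rounding at the threshold boundary.

```python
from fractions import Fraction

def step(v):
    """Step function: 1 if v >= 0, else 0 (assumed convention)."""
    return 1 if v >= 0 else 0

def train_and(w, theta, alpha, max_epochs=100):
    """Repeat epochs over the AND truth table until an epoch has no error."""
    samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, expected in samples:
            actual = step(sum(xi * wi for xi, wi in zip(x, w)) - theta)
            e = expected - actual                      # e = Yexpected - Yactual
            if e != 0:
                errors += 1
                w = [wi + alpha * xi * e for wi, xi in zip(w, x)]  # wi = wi + alpha * xi * e
        if errors == 0:
            return w, epoch
    raise RuntimeError("did not converge within max_epochs")

initial = [Fraction(3, 10), Fraction(-1, 10)]          # hypothetical starting weights
weights, epochs = train_and(initial, Fraction(2, 10), Fraction(1, 10))
print([float(v) for v in weights], epochs)
```

With these assumed starting weights the loop stops in the fifth epoch with both weights equal to 0.1.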
4- In order to train neural network for the following dataset, how many input(s), weight(s) and output(s) are needed?
Department   Age    Status   Sales Count
Sales        30's   Junior   5
Systems      20's   Junior   30
Systems      40+    Senior   56
Marketing    40+    Senior   95
5- Solve the XOR problem using the Unit Step activation function and the following initial parameters.
Part 5: Clustering
6- Given three binary bit strings 010111, 101001, 111100, compute the dissimilarity matrix (distance matrix) for these three points using the Jaccard distance metric.
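A short sketch of the computation, assuming the usual definition for binary vectors: the Jaccard distance is 1 minus the number of positions where both strings have a 1, divided by the number of positions where at least one has a 1.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length bit strings."""
    both = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")
    either = sum(1 for x, y in zip(a, b) if x == "1" or y == "1")
    return 1 - both / either if either else 0.0

strings = ["010111", "101001", "111100"]
matrix = [[jaccard_distance(a, b) for b in strings] for a in strings]

for row in matrix:
    print([round(d, 3) for d in row])
```

The matrix is symmetric with zeros on the diagonal, so only the three off-diagonal pairs need to be worked out by hand.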
7- Can Car A and Car B be grouped together in the same cluster? Why or why not?
        Aspiration   Fuel-type    Num-of-doors   Engine-location   Drive-wheels   Body-style
        (Std ?)      (Diesel ?)   (4 ?)          (front ?)         (fwd ?)        (sedan ?)
Car A   Yes          No           No             No                Yes            Yes
Car B   Yes          Yes          No             Yes               No             No
aspiration: std, turbo.
fuel-type: diesel, gas.
num-of-doors: four, two.
engine-location: front, rear.
drive-wheels: fwd, rwd.
body-style: sedan, hatchback
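One way to quantify how far apart the two cars are is to encode each Yes/No answer as 1/0 and compute a binary dissimilarity. The sketch below shows two common choices: the simple matching distance (treats Yes and No as equally informative) and the Jaccard distance (ignores attributes where both cars are No). Which measure and which cut-off decide cluster membership is a judgment the question leaves open, so this is only an illustration.

```python
# Attribute order: aspiration std?, fuel diesel?, four doors?, engine front?, drive fwd?, body sedan?
car_a = [1, 0, 0, 0, 1, 1]
car_b = [1, 1, 0, 1, 0, 0]

def simple_matching_distance(a, b):
    """Share of attributes on which the two objects disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jaccard_distance(a, b):
    """Mismatches divided by attributes where at least one object is Yes."""
    mismatches = sum(x != y for x, y in zip(a, b))
    both_yes = sum(x == 1 and y == 1 for x, y in zip(a, b))
    return mismatches / (mismatches + both_yes)

smd = simple_matching_distance(car_a, car_b)
jd = jaccard_distance(car_a, car_b)
print(round(smd, 3), round(jd, 3))
```

The cars disagree on four of the six attributes, which gives a fairly large dissimilarity under either measure.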
8- Given the five points below, show all steps of the 2-means clustering algorithm (k=2), where the initial mean estimates are (1.67, 0.67) and (3.25, 1).
<2.1, 3.4>, <1.8, 3.4>, <0.7, 2.0>, <3.4, 2.0>, <3.6, 2.0>
Run the algorithm until the clusters do not change.
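The iterations can be checked with a small k-means sketch. It assumes Euclidean distance and stops when the assignments no longer change; the function name `kmeans` is illustrative.

```python
import math

def kmeans(points, initial_means, max_iter=100):
    """Plain k-means: assign each point to its nearest mean, then recompute the means."""
    means = list(initial_means)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(means)), key=lambda k: math.dist(p, means[k]))
            for p in points
        ]
        if new_labels == labels:          # clusters did not change: stop
            break
        labels = new_labels
        for k in range(len(means)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:                   # keep the old mean if a cluster is empty
                means[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, means

points = [(2.1, 3.4), (1.8, 3.4), (0.7, 2.0), (3.4, 2.0), (3.6, 2.0)]
labels, means = kmeans(points, [(1.67, 0.67), (3.25, 1.0)])
print(labels, [tuple(round(c, 2) for c in m) for m in means])
```

In this run the first point moves from the second cluster to the first after the first mean update, and the assignments are stable from then on.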
9- For the following data, find the cluster of sample <1,1>. (Use Euclidean distance)
Sample   X   Y   Cluster
1        1   3   1
2        2   0   2
3        1   1   ?
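A minimal sketch, assuming the sample is assigned to the cluster of the nearest labeled sample under Euclidean distance (with one sample per cluster, that sample also serves as the cluster center).

```python
import math

labeled = {1: (1, 3), 2: (2, 0)}   # cluster id -> its known sample (X, Y)
query = (1, 1)                     # sample 3

distances = {cluster: math.dist(query, point) for cluster, point in labeled.items()}
cluster = min(distances, key=distances.get)
print(distances, cluster)
```

The distance to sample 1 is 2, while the distance to sample 2 is the square root of 2, so the nearer cluster is cluster 2.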
10- Circle the clusters and draw dendrograms for the following data points. Merge clusters based on three criteria: single link, complete link, and average link.
11- Using k-means with k=2, calculate the first clusters for the data below. The table gives the distances between pairs of points. Circle the two clusters. Assume the initial cluster points are (31,32) and (34,24).
        11,6   11,38   15,18   20,40   25,24   26,8   31,32   34,24   40,41   43,47
11,6    0      32.0    12.6    35.2    22.8    15.1   32.8    29.2    45.4    52.0
11,38   32.0   0       20.4    9.2     19.8    33.5   20.9    26.9    29.2    33.2
15,18   12.6   20.4    0       22.6    11.7    14.9   21.2    19.9    34.0    40.3
20,40   35.2   9.2     22.6    0       16.8    32.6   13.6    21.3    20.0    24.0
25,24   22.8   19.8    11.7    16.8    0       16.0   10.0    9.0     22.7    29.2
26,8    15.1   33.5    14.9    32.6    16.0    0      24.5    17.9    35.8    42.5
31,32   32.8   20.9    21.2    13.6    10.0    24.5   0       8.5     12.7    19.2
34,24   29.2   26.9    19.9    21.3    9.0     17.9   8.5     0       18.0    24.7
40,41   45.4   29.2    34.0    20.0    22.7    35.8   12.7    18.0    0       6.7
43,47   52.0   33.2    40.3    24.0    29.2    42.5   19.2    24.7    6.7     0
Part 6: Outlier Detection
12- Network intrusion detection using outlier detection methods is one of the well-known data mining applications. Give an example dataset that has the following columns:
Start Time     Src IP           Src Port   Dst IP            Dst Port   P    F   Service    Bytes   Attack or Normal
00:00:11.380   164.107.1.2      1026       205.188.254.195   4000       17   0   domain     56      ATTACK
00:00:11.384   216.65.138.227   1055       164.107.1.3       28001      17   0   snmptrap   36      NORMAL
00:00:11.384   164.107.1.3      28001      216.65.138.227    1055       17   0   N/A        68      NORMAL
...            ...              ...        ...               ...        ...  ... ...        ...     ...
Then find the association rule(s) for detecting attacks with a minimum support value, such as
Src IP=206.163.27.95, Dst Port=139, Bytes ∈ [150, 200] → ATTACK (support 55%)
Show the extraction of the association rule(s) step by step.
Part 7: Web Mining
13- Web Usage Mining - Write an example dataset and then extract some association rules related to web usage patterns.
For example:
o 60% of users who placed an online order in /company/product2 were in the 20-25 age group and lived on the West Coast.
14- We have a web log file in ECLF format. Write a program in any programming language for the following web mining operation:
“Which web page was most often requested from which referring page?”
For Example:
IP Address
Time/Date
Method/URI
Referrer
Agent
202.120.224.4
15:30:01/2-Jan-01
GET Index.htm
http://ok.edu/link.htm
Mozilla/4.0(IE5.0W98)
202.120.224.4
15:30:01/2-Jan-01
GET 1.htm
http://ex.edu/index.htm
Mozilla/4.0(IE5.0W98)
202.120.224.4
15:30:01/2-Jan-01
GET A.htm
http://ex.edu/index.htm
Mozilla/4.0(IE5.0W98)
202.120.224.4
15:33:04/2-Jan-01
GET Index.htm
http://ok.edu/res.php
Mozilla/4.0(IE4.0NT)
202.120.224.4
15:33:04/2-Jan-01
GET 1.htm
http://ex.edu/index.htm
Mozilla/4.0(IE4.0NT)
202.120.224.4
15:33:04/2-Jan-01
GET A.htm
http://ox.edu/index.htm
Mozilla/4.0(IE4.0NT)
202.120.224.4
15:35:11/2-Jan-01
GET 1.htm
http://ok.edu/A.htm
Mozilla/4.0(IE5.0W98)
202.120.224.4
15:35:11/2-Jan-01
GET B.htm
http://ex.edu/A.htm
Mozilla/4.0(IE4.0NT)
202.120.224.4
15:37:09/2-Jan-01
GET A.htm
http://ox.edu/C.htm
Mozilla/4.0(IE5.0W98)
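A minimal sketch of the counting step. It assumes the log has already been split into fields, so each record is reduced here to a (Method/URI, Referrer) pair taken from the example rows; parsing a real ECLF file would need an extra step that is not shown. The variable names are illustrative.

```python
from collections import Counter

# (Method/URI, Referrer) pairs from the example log, in order
records = [
    ("GET Index.htm", "http://ok.edu/link.htm"),
    ("GET 1.htm", "http://ex.edu/index.htm"),
    ("GET A.htm", "http://ex.edu/index.htm"),
    ("GET Index.htm", "http://ok.edu/res.php"),
    ("GET 1.htm", "http://ex.edu/index.htm"),
    ("GET A.htm", "http://ox.edu/index.htm"),
    ("GET 1.htm", "http://ok.edu/A.htm"),
    ("GET B.htm", "http://ex.edu/A.htm"),
    ("GET A.htm", "http://ox.edu/C.htm"),
]

# Count how often each page was requested from each referring page
pair_counts = Counter(
    (referrer, request.split(maxsplit=1)[1]) for request, referrer in records
)

(top_referrer, top_page), top_count = pair_counts.most_common(1)[0]
print(f"{top_page} was requested {top_count} times from {top_referrer}")
```

For the nine example rows, the only pair that occurs more than once is 1.htm requested from http://ex.edu/index.htm.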
15- Write an example for each web mining category.
Part 8: Text Mining
16- Write the output text after applying the feature generation step of the text mining process to the following text.
A computer is a machine that manipulates data according to a list of instructions. The first devices that resemble modern computers date to the mid-20th century (around 1940 - 1945), although the computer concept and various machines similar to computers existed earlier. Early electronic computers were the size of a large room, consuming as much power as several hundred modern personal computers.
17- Text Mining: Use association rule mining techniques to find the most related words in the following five text documents.
For example: “98% of the documents that mention apple also mention chromatography”.
X → Y: “apple → chromatography”
Doc1   Sea refers to water of the ocean.
Doc2   Large lakes are sometimes referred to as inland seas.
Doc3   A sea is a large expanse of water.
Doc4   Why is sea water salty, and not lake water?
Doc5   Lake is not large as sea water.
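A sketch of how word-to-word rules X → Y could be mined from the five documents. Each document is treated as a transaction of terms. The stop-word list, the crude plural stripping, and the thresholds (support 0.6, confidence 0.8) are all assumptions chosen for illustration; a full solution would also consider larger itemsets.

```python
import re
from itertools import permutations

STOP_WORDS = {"a", "the", "of", "to", "is", "are", "as", "and", "not", "why"}  # hypothetical list

def terms(text):
    """Lowercase, drop stop words, and strip a trailing 's' as a crude plural rule."""
    result = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOP_WORDS:
            continue
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        result.add(word)
    return result

def word_rules(docs, min_support, min_confidence):
    """Return (x, y, support, confidence) for single-word rules x -> y."""
    doc_terms = [terms(d) for d in docs]
    vocab = sorted(set().union(*doc_terms))
    rules = []
    for x, y in permutations(vocab, 2):
        count_x = sum(1 for t in doc_terms if x in t)
        count_xy = sum(1 for t in doc_terms if x in t and y in t)
        support = count_xy / len(doc_terms)
        confidence = count_xy / count_x
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules

docs = [
    "Sea refers to water of the ocean.",
    "Large lakes are sometimes referred to as inland seas.",
    "A sea is a large expanse of water.",
    "Why is sea water salty, and not lake water?",
    "Lake is not large as sea water.",
]

rules = word_rules(docs, min_support=0.6, min_confidence=0.8)
for x, y, s, c in rules:
    print(f"{x} -> {y}  (support {s:.0%}, confidence {c:.0%})")
```

Under these assumptions, "sea" appears in every document, so any word that occurs in at least three documents yields a rule pointing to "sea" with 100% confidence.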