Example Questions for Final Exam

All questions in all Course Studies (Course Study 1 – Course Study 7)
All questions in the “Example Questions for Midterm Exam” document.

Part 1: Data Preparation before Data Mining
Part 2: Association Rule Mining
Part 3: Sequential Pattern Mining
Part 4: Classification Techniques in Data Mining (Bayes, Decision Tree, K-Nearest Neighbor)

Part 4: Classification Techniques in Data Mining (Neural Network)

1- Write program code in any programming language to calculate the output of a neuron for classification.

$$I_j = \sum_i w_{ij} O_i + \theta_j, \qquad O_j = \frac{1}{1 + e^{-I_j}}$$

[Figure: a neuron with inputs x1 and x2, weights w1 and w2, a bias, and output y]

For example: Input vector X = <1, 0, 1>, weight1 = 0.2, weight2 = 0.4, weight3 = -0.5, wƟ = 0.6, Ɵ = -0.4

2- Train a perceptron for only one step using the following activation function and parameter values.

T = desired output, O = actual output
T = 0, O = 1, W1 = 0.5, W2 = 0.3, Input1 = 2, Input2 = 1, WƟ = -1, Ɵ = 1

Activation function:
$$O = \begin{cases} 1 & : \sum_i w_i I_i > 0 \\ 0 & : \text{otherwise} \end{cases}$$

Weight update:
$$w_i(t+1) = w_i(t) + \Delta w_i(t), \qquad \Delta w_i(t) = (T - O)\, I_i$$

3- Train a perceptron (until there is no error) to recognize logical AND using a step function with threshold 0.2 and learning rate 0.1.

error e = Yexpected - Yactual
α is the learning rate
xi is an input
wi = wi + Δwi
Δwi = α * xi * e

4- To train a neural network on the following dataset, how many inputs, weights, and outputs are needed?

Department   Age    Status   Sales Count
Sales        30's   Junior   5
Systems      20's   Junior   30
Systems      40+    Senior   56
Marketing    40+    Senior   95

5- Solve the XOR problem using the unit step activation function and the following initial parameters.

Part 5: Clustering

6- Given three binary bit strings 010111, 101001, 111100, compute the dissimilarity matrix (distance matrix) for these three points using the Jaccard distance metric.

7- Can Car A and Car B be grouped together in the same cluster? Why?

Attribute                  Car A   Car B
Aspiration (Std?)          Yes     Yes
Fuel-type (Diesel?)        No      Yes
Num-of-doors (4?)          No      No
Engine-location (front?)   No      Yes
Drive-wheels (fwd?)        Yes     No
Body-style (sedan?)        Yes     No

Attribute values: aspiration: std, turbo; fuel-type: diesel, gas; num-of-doors: four, two; engine-location: front, rear; drive-wheels: fwd, rwd; body-style: sedan, hatchback.

8- Given the 5 points below, show all steps of the 2-means clustering algorithm (k=2), where the initial mean estimates are (1.67, 0.67) and (3.25, 1):
<2.1, 3.4>, <1.8, 3.4>, <0.7, 2.0>, <3.4, 2.0>, <3.6, 2.0>
Run the algorithm until the clusters do not change.

9- For the following data, find the cluster of sample <1,1>. (Use Euclidean distance.)

Sample   X   Y   Cluster
1        1   3   1
2        2   0   2
3        1   1   ?

10- Circle the clusters and draw dendrograms for the following data points. Merge clusters based on three criteria: single link, complete link, and average link.

11- Using k-means with k=2, calculate the first clusters for the data below. The table gives the distances between pairs of points. Circle the two clusters. Assume the initial cluster points are (31,32) and (34,24).

        11,6   11,38  15,18  20,40  25,24  26,8   31,32  34,24  40,41  43,47
11,6    0      32.0   12.6   35.2   22.8   15.1   32.8   29.2   45.4   52.0
11,38   32.0   0      20.3   9.2    19.8   33.5   20.9   26.9   29.1   33.2
15,18   12.6   20.4   0      22.6   11.7   14.9   21.2   20.0   34.0   40.3
20,40   35.2   9.2    22.5   0      16.8   32.6   13.6   21.3   20.0   24.0
25,24   22.8   19.8   11.7   16.8   0      16.3   10.0   9.0    22.7   29.2
26,8    15.1   33.5   14.9   32.6   16.0   0      24.5   17.9   35.8   42.5
31,32   32.8   20.9   21.2   13.6   10.0   24.5   0      8.5    12.7   19.2
34,24   29.2   26.9   19.9   21.3   9.0    17.9   8.5    0      18.0   24.7
40,41   45.4   29.2   34.0   20.0   22.7   35.8   12.7   18.0   0      6.7
43,47   52.0   33.2   40.3   24.0   29.2   42.5   19.2   24.7   6.7    0

Part 6: Outlier Detection

12- Network intrusion detection using outlier detection methods is one of the well-known data mining applications. Give an example dataset that has the following columns:

Start Time     Src IP           Src Port   Dst IP            Dst Port   P    F    Service    Bytes   Attack or Normal
00:00:11.380   164.107.1.2      1026       205.188.254.195   4000       17   0    domain     56      ATTACK
00:00:11.384   216.65.138.227   1055       164.107.1.3       28001      17   0    snmptrap   36      NORMAL
00:00:11.384   164.107.1.3      28001      216.65.138.227    1055       17   0    N/A        68      NORMAL
....           ....             ....       ....              ....       ..   ..   ...        ..      ...
Then find the association rule(s) for detecting attacks with a minimum support value, such as:

Src IP=206.163.27.95, Dest Port=139, Bytes ∈ [150, 200] → ATTACK (support 55%)

Show the extraction of the association rule(s) step by step.

Part 7: Web Mining

13- Web Usage Mining: Write an example dataset and then extract some association rules related to web usage patterns. For example:
o 60% of users who placed an online order in /company/product2 were in the 20-25 age group and lived on the West Coast.

14- We have a web log file in ECLF format. Write program code in any programming language for the following web mining operation: “Which web page was called most often from which web page?” For example:

IP Address      Time/Date           Method/URI      Referrer                  Agent
202.120.224.4   15:30:01/2-Jan-01   GET Index.htm   http://ok.edu/link.htm    Mozilla/4.0(IE5.0W98)
202.120.224.4   15:30:01/2-Jan-01   GET 1.htm       http://ex.edu/index.htm   Mozilla/4.0(IE5.0W98)
202.120.224.4   15:30:01/2-Jan-01   GET A.htm       http://ex.edu/index.htm   Mozilla/4.0(IE5.0W98)
202.120.224.4   15:33:04/2-Jan-01   GET Index.htm   http://ok.edu/res.php     Mozilla/4.0(IE4.0NT)
202.120.224.4   15:33:04/2-Jan-01   GET 1.htm       http://ex.edu/index.htm   Mozilla/4.0(IE4.0NT)
202.120.224.4   15:33:04/2-Jan-01   GET A.htm       http://ox.edu/index.htm   Mozilla/4.0(IE4.0NT)
202.120.224.4   15:35:11/2-Jan-01   GET 1.htm       http://ok.edu/A.htm       Mozilla/4.0(IE5.0W98)
202.120.224.4   15:35:11/2-Jan-01   GET B.htm       http://ex.edu/A.htm       Mozilla/4.0(IE4.0NT)
202.120.224.4   15:37:09/2-Jan-01   GET A.htm       http://ox.edu/C.htm       Mozilla/4.0(IE5.0W98)

15- Write an example for each web mining category.

Part 8: Text Mining

16- Write the output text after applying the feature generation step of the text mining process to the following passage:

A computer is a machine that manipulates data according to a list of instructions. The first devices that resemble modern computers date to the mid-20th century (around 1940 - 1945), although the computer concept and various machines similar to computers existed earlier. Early electronic computers were the size of a large room, consuming as much power as several hundred unmodern personal computers.

17- Text Mining: Use association rule mining techniques to find the most related words in the following five text documents. For example: “98% of the documents that mention apple also relate it to chromatography.” X → Y: “apple → chromatography”

Doc1: Sea refers to water of the ocean.
Doc2: Large lakes are sometimes referred to as inland seas.
Doc3: A sea is a large expanse of water.
Doc4: Why is sea water salty, and not lake water?
Doc5: Lake is not large as sea water.
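
One possible way to approach question 17 is sketched below in Python. It is a minimal illustration, not a model answer. It treats each document as a set of terms and reports word pairs X → Y whose support and confidence meet a threshold. The stopword list, the crude plural folding, the thresholds (0.6 and 0.8), and the names DOCS, STOPWORDS, terms, and pair_rules are all assumptions made for this example.

```python
import re
from itertools import permutations

# The five documents from question 17.
DOCS = [
    "Sea refers to water of the ocean.",
    "Large lakes are sometimes referred to as inland seas.",
    "A sea is a large expanse of water.",
    "Why is sea water salty, and not lake water?",
    "Lake is not large as sea water.",
]

# Assumed stopword list, chosen by hand for this small example.
STOPWORDS = {"a", "and", "are", "as", "is", "not", "of", "the", "to", "why"}


def terms(text):
    """Return the set of lowercase terms in a document, without stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    # Crude plural folding (assumption): drop a trailing "s" on longer words,
    # so "seas" and "lakes" match "sea" and "lake".
    words = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]
    return {w for w in words if w not in STOPWORDS}


def pair_rules(documents, min_support=0.6, min_confidence=0.8):
    """Find single-word rules X -> Y with enough support and confidence."""
    term_sets = [terms(d) for d in documents]
    n = len(term_sets)
    vocab = sorted(set().union(*term_sets))
    rules = []
    for x, y in permutations(vocab, 2):
        count_x = sum(1 for s in term_sets if x in s)
        count_xy = sum(1 for s in term_sets if x in s and y in s)
        support = count_xy / n           # share of documents containing both
        confidence = count_xy / count_x  # share of X-documents that contain Y
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


for x, y, support, confidence in pair_rules(DOCS):
    print(f"{x} -> {y}  support={support:.0%}  confidence={confidence:.0%}")
```

Lowering min_support or min_confidence admits weaker pairs, and a real answer would normally use a proper stemmer and a fuller stopword list in place of the hand-picked ones here.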