Extension MSc Students Assignment: Individual
Due date: Friday, March 4, 2022

Question 1
a) The following table consists of training data. Construct the decision tree that would be generated by the ID3 algorithm, using entropy-based information gain. Classify the records by the "Status" attribute. Write down the rules that can be generated from the obtained decision tree. Show the computation steps clearly!

Table 1: Data set for Question 1

  Department   Age          Salary    Status
  Sales        Middle_aged  High      Senior
  Sales        Young        Low       Junior
  Sales        Middle_aged  Low       Junior
  System       Young        High      Junior
  System       Middle_aged  High      Senior
  System       Young        High      Junior
  System       Senior       High      Senior
  Marketing    Middle_aged  High      Senior
  Marketing    Middle_aged  Average   Junior
  Secretary    Senior       Average   Senior
  Secretary    Young        Low       Senior

Question 2
The Apriori algorithm was run in Weka on a student database recording which students attended the given courses (Compiler, Embedding, and so on). The output is shown below:

=== Run information ===

Scheme:       weka.associations.Apriori -N 20 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.5 -S -1.0 -c -1
Relation:     test_student
Instances:    15
Attributes:   10
              Advanced_Database
              Compiler
              Emedding
              Logic
              Algorithm
              Database
              System_Design
              Graphics
              Networking
              Algorithm_DataStructure

=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.55 (8 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 9

Generated sets of large itemsets:
Size of set of large itemsets L(1): 10
Size of set of large itemsets L(2): 14
Size of set of large itemsets L(3): 5

Best rules found:
 1. Networking=FALSE 13 ==> Graphics=TRUE 13    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 2. Algorithm=TRUE 11 ==> Graphics=TRUE 11    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 3. Compiler=TRUE 10 ==> Graphics=TRUE 10    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 4. Compiler=TRUE 10 ==> Networking=FALSE 10    <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
 5. Emedding=TRUE 10 ==> Graphics=TRUE 10    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 6. Compiler=TRUE Networking=FALSE 10 ==> Graphics=TRUE 10    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 7. Compiler=TRUE Graphics=TRUE 10 ==> Networking=FALSE 10    <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
 8. Compiler=TRUE 10 ==> Graphics=TRUE Networking=FALSE 10    <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
 9. Advanced_Database=TRUE 9 ==> Graphics=TRUE 9    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
10. System_Design=FALSE 9 ==> Graphics=TRUE 9    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
11. Emedding=TRUE Networking=FALSE 9 ==> Graphics=TRUE 9    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
12. Algorithm=TRUE Networking=FALSE 9 ==> Graphics=TRUE 9    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
13. Logic=TRUE 8 ==> Graphics=TRUE 8    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
14. Database=TRUE 8 ==> Graphics=TRUE 8    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
15. Algorithm_DataStructure=TRUE 8 ==> Graphics=TRUE 8    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
16. Advanced_Database=TRUE Networking=FALSE 8 ==> Graphics=TRUE 8    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
17. System_Design=FALSE Networking=FALSE 8 ==> Graphics=TRUE 8    <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
18. Emedding=TRUE 10 ==> Networking=FALSE 9    <conf:(0.9)> lift:(1.04) lev:(0.02) [0] conv:(0.67)
19. Emedding=TRUE Graphics=TRUE 10 ==> Networking=FALSE 9    <conf:(0.9)> lift:(1.04) lev:(0.02) [0] conv:(0.67)
20. Emedding=TRUE 10 ==> Graphics=TRUE Networking=FALSE 9    <conf:(0.9)> lift:(1.04) lev:(0.02) [0] conv:(0.67)

a) Interpret the above rules found by the algorithm.

Question 3
Apply/run Apriori on a real-world dataset: the supermarket.arff file. Load the data at the Preprocess tab: click the Open file button to bring up a standard dialog through which you can select a file, and choose the supermarket.arff file.
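When interpreting the rule list in the Question 2 output, it can help to recompute the reported metrics from the raw counts. The following is a minimal sketch using the standard definitions of confidence, lift, and leverage (Weka's conviction value uses a slightly adjusted formula, so it is omitted here); rule 4 from the output above is used as the example:

```python
# Recompute Weka's rule metrics for rule 4:
#   Compiler=TRUE 10 ==> Networking=FALSE 10
n = 15        # total instances in the relation
sup_x = 10    # instances matching the premise   (Compiler=TRUE)
sup_y = 13    # instances matching the consequent (Networking=FALSE)
sup_xy = 10   # instances matching both

confidence = sup_xy / sup_x                        # P(Y|X)
lift = confidence / (sup_y / n)                    # confidence / P(Y)
leverage = sup_xy / n - (sup_x / n) * (sup_y / n)  # P(XY) - P(X)P(Y)

# Matches the reported <conf:(1)> lift:(1.15) lev:(0.09)
print(round(confidence, 2), round(lift, 2), round(leverage, 2))
```

Note that a lift of 1 (as in most rules above) means premise and consequent are statistically independent, so those rules carry little information despite their 100% confidence.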
a) Apply the KDD process and give a brief statement about each step using the given data set.
b) Experiment with Apriori and investigate the effect of the various parameters described (see the figure below; for details, see the Weka documentation or a Weka tutorial on how to save the results). Prepare a brief written report on the main findings of your investigation (show results).

Question 4
Consider the following Table 2:

Table 2: Example of market basket transactions

  CID   TID    Items_bought
  1     100    {1,2,3,4}
  1     200    {1,2,3,4,5}
  2     300    {2,3,4}
  2     400    {2,3,5}
  3     500    {1,2,4}
  3     600    {1,3,4}
  4     700    {2,3,4,5}
  4     800    {1,3,4,5}
  5     900    {3,4,5}
  5     1000   {1,2,3,5}

a) Trace the results of using the Apriori algorithm (manually) on the transactions in Table 2 with minimum support S = 4 and 5, and confidence C = 60%.
(i) Show the candidate and frequent itemsets for each database scan.
(ii) How many candidates are there during scan two? How many of them are frequent itemsets during scan two?
(iii) List all association rules that are generated, highlight the strongest one (with support s and confidence c), and sort them by confidence.
(iv) Give comments about the results.
(v) Assuming each item (1-5) represents a real-world item (orange, mango, and so on), what do you suggest to the client (market manager) about the results? Also give comments about the customers (CID).

Question 5
Table 3: Data for height classification (used in parts b and c)

  Name      Gender  Height  Class 1  Class 2  Height range (m)
  Kristina  F       1.6m    Short    Medium   0 - 1.6
  Jim       M       2m      Tall     Medium   1.9 - 2.0
  Martha    F       1.9m    Medium   Tall     1.8 - 1.9
  Alia      F       1.88m   Medium   Tall     1.8 - 1.9
  Kebedech  F       1.7m    Short    Medium   1.6 - 1.7
  Mussa     M       1.85m   Medium   Medium   1.8 - 1.9
  Almaz     F       1.6m    Short    Medium   0 - 1.6
  Khan      M       1.7m    Short    Medium   1.6 - 1.7
  Kim       M       2.2m    Tall     Tall     2.0 - ∞
  Aziz      M       2.1m    Tall     Tall     2.0 - ∞
  Aynalem   F       1.8m    Medium   Medium   1.7 - 1.8
  Zaki      M       1.95m   Medium   Medium   1.9 - 2.0
  Kati      F       1.9m    Medium   Tall     1.8 - 1.9
  Xem       F       1.8m    Medium   Medium   1.7 - 1.8
  Yem       F       1.75m   Medium   Medium   1.7 - 1.8

a) What is meant by Naïve Bayes classification? Explain it using the above table.
b) Given the training data in Table 3 (height classification), use "Class 1" as the class attribute to predict the class of the following new test sample (Yemer, M, 1.95m) using Naïve Bayes classification.
c) Repeat using "Class 2" as the class attribute to predict the class of the following test sample (Nati, F, 1.89m) using Naïve Bayes classification.

Question 6
Say true or false, and justify your answer:
Data mining was ONLY possible (or rather made economically viable) by the advent of computers.
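The Naïve Bayes hand computation in Question 5(b) can be cross-checked with a short script. This is a sketch under one assumption: the continuous height attribute is discretized into the same right-closed ranges that appear alongside the Class 2 column ((0, 1.6], (1.6, 1.7], ...); a different discretization would change the conditional probabilities. For each class c it computes P(c) · P(gender | c) · P(height bin | c) and picks the maximum:

```python
# Sketch of categorical Naive Bayes for Table 3, "Class 1" as the class.
from collections import Counter

data = [  # (gender, height_m, class1) rows of Table 3
    ("F", 1.60, "Short"),  ("M", 2.00, "Tall"),   ("F", 1.90, "Medium"),
    ("F", 1.88, "Medium"), ("F", 1.70, "Short"),  ("M", 1.85, "Medium"),
    ("F", 1.60, "Short"),  ("M", 1.70, "Short"),  ("M", 2.20, "Tall"),
    ("M", 2.10, "Tall"),   ("F", 1.80, "Medium"), ("M", 1.95, "Medium"),
    ("F", 1.90, "Medium"), ("F", 1.80, "Medium"), ("F", 1.75, "Medium"),
]

EDGES = [1.6, 1.7, 1.8, 1.9, 2.0]  # right-closed bins: (0,1.6], (1.6,1.7], ..., (2.0,inf)

def height_bin(h):
    for i, edge in enumerate(EDGES):
        if h <= edge:
            return i
    return len(EDGES)  # the (2.0, inf) bin

def predict(gender, height):
    class_counts = Counter(c for _, _, c in data)
    n = len(data)
    best, best_score = None, -1.0
    for c, nc in class_counts.items():
        p_gender = sum(1 for g, _, cl in data if cl == c and g == gender) / nc
        p_height = sum(1 for _, h, cl in data
                       if cl == c and height_bin(h) == height_bin(height)) / nc
        score = (nc / n) * p_gender * p_height  # P(c) * P(gender|c) * P(bin|c)
        if score > best_score:
            best, best_score = c, score
    return best

# (Yemer, M, 1.95m): P(Tall)*P(M|Tall)*P(bin|Tall) = 3/15 * 3/3 * 1/3 wins
print(predict("M", 1.95))  # -> Tall
```

The same function answers part (c) after swapping in the "Class 2" labels as the third field of each row, since only the training labels change.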