DM Individual Assignment
Extension MSc Students
Assignment: Individual
Due date: Friday, March 4, 2022
Question 1
a) The following table consists of training data. Construct the decision tree that
would be generated by the ID3 algorithm, using entropy-based information
gain. Classify the records by the "Status" attribute. Write down the rules that can
be generated from the obtained decision tree. Show your computation steps clearly!
Table 1: Data set for Question 1

Department   Age          Salary   Status
Sales        Middle_aged  High     Senior
Sales        Young        Low      Junior
Sales        Middle_aged  Low      Junior
System       Young        High     Junior
System       Middle_aged  High     Senior
System       Young        High     Junior
System       Senior       High     Senior
Marketing    Middle_aged  High     Senior
Marketing    Middle_aged  Average  Junior
Secretary    Senior       Average  Senior
Secretary    Young        Low      Senior
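To support the hand computation, the root-level information gains for Table 1 can be checked with a short Python sketch (the `entropy` and `info_gain` helpers below are illustrative names, not a required library):

```python
from collections import Counter
from math import log2

# Rows of Table 1: (Department, Age, Salary, Status)
data = [
    ("Sales", "Middle_aged", "High", "Senior"),
    ("Sales", "Young", "Low", "Junior"),
    ("Sales", "Middle_aged", "Low", "Junior"),
    ("System", "Young", "High", "Junior"),
    ("System", "Middle_aged", "High", "Senior"),
    ("System", "Young", "High", "Junior"),
    ("System", "Senior", "High", "Senior"),
    ("Marketing", "Middle_aged", "High", "Senior"),
    ("Marketing", "Middle_aged", "Average", "Junior"),
    ("Secretary", "Senior", "Average", "Senior"),
    ("Secretary", "Young", "Low", "Senior"),
]

def entropy(rows):
    """Entropy of the class (last) column."""
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def info_gain(rows, attr):
    """Information gain of splitting rows on column index attr."""
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(rows) - remainder

print(round(entropy(data), 3))  # 0.994
for i, name in enumerate(["Department", "Age", "Salary"]):
    print(name, round(info_gain(data, i), 3))
# Age has the highest gain (~0.258), so ID3 places Age at the root.
```

The same entropy/remainder arithmetic is what the written answer should show step by step.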
Question 2
After running the Apriori algorithm in Weka on a student database (which students attended the given courses: Compiler, Emedding, and so on), the output shown below was produced:
=== Run information ===
Scheme:       weka.associations.Apriori -N 20 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.5 -S -1.0 -c -1
Relation:     test_student
Instances: 15
Attributes: 10
Advanced_Database
Compiler
Emedding
Logic
Algorithm
Database
System_Design
Graphics
Networking
Algorithm_DataStructure
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.55 (8 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 9
Generated sets of large itemsets:
Size of set of large itemsets L(1): 10
Size of set of large itemsets L(2): 14
Size of set of large itemsets L(3): 5
Best rules found:
1. Networking=FALSE 13 ==> Graphics=TRUE 13 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
2. Algorithm=TRUE 11 ==> Graphics=TRUE 11 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
3. Compiler=TRUE 10 ==> Graphics=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
4. Compiler=TRUE 10 ==> Networking=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
5. Emedding=TRUE 10 ==> Graphics=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
6. Compiler=TRUE Networking=FALSE 10 ==> Graphics=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
7. Compiler=TRUE Graphics=TRUE 10 ==> Networking=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
8. Compiler=TRUE 10 ==> Graphics=TRUE Networking=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
9. Advanced_Database=TRUE 9 ==> Graphics=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
10. System_Design=FALSE 9 ==> Graphics=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
11. Emedding=TRUE Networking=FALSE 9 ==> Graphics=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
12. Algorithm=TRUE Networking=FALSE 9 ==> Graphics=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
13. Logic=TRUE 8 ==> Graphics=TRUE 8 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
14. Database=TRUE 8 ==> Graphics=TRUE 8 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
15. Algorithm_DataStructure=TRUE 8 ==> Graphics=TRUE 8 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
16. Advanced_Database=TRUE Networking=FALSE 8 ==> Graphics=TRUE 8 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
17. System_Design=FALSE Networking=FALSE 8 ==> Graphics=TRUE 8 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
18. Emedding=TRUE 10 ==> Networking=FALSE 9 <conf:(0.9)> lift:(1.04) lev:(0.02) [0] conv:(0.67)
19. Emedding=TRUE Graphics=TRUE 10 ==> Networking=FALSE 9 <conf:(0.9)> lift:(1.04) lev:(0.02) [0] conv:(0.67)
20. Emedding=TRUE 10 ==> Graphics=TRUE Networking=FALSE 9 <conf:(0.9)> lift:(1.04) lev:(0.02) [0] conv:(0.67)
a) Interpret the above rules found by the algorithm.
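When interpreting these rules, recall how Weka computes the metrics: confidence = support(X ∪ Y) / support(X), and lift = confidence / P(Y). A quick sanity check of rule 4 in Python, with the counts read off the output above (15 instances in total):

```python
n_total = 15       # instances in the run
n_compiler = 10    # support count of Compiler=TRUE (antecedent)
n_no_net = 13      # support count of Networking=FALSE (consequent)
n_both = 10        # instances matching both sides

confidence = n_both / n_compiler          # P(consequent | antecedent)
lift = confidence / (n_no_net / n_total)  # confidence / P(consequent)

print(round(confidence, 2), round(lift, 2))  # 1.0 1.15
```

A lift above 1 means students who took Compiler skip Networking slightly more often than the base rate. By contrast, the many rules toward Graphics=TRUE with conf:(1) and lift:(1) imply P(Graphics=TRUE) = 1 on the instances they cover, so those rules carry essentially no information.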
Question 3:
Apply/run Apriori on a real-world dataset: the supermarket.arff file.
Load the data in the Preprocess tab: click the Open file button to bring up a standard dialog
through which you can select a file, and choose the supermarket.arff file.
a) Apply the KDD process and give a brief statement about each step, using the given
data set.
b) Experiment with Apriori and investigate the effect of the various parameters
(for details, see the Weka documentation or a Weka tutorial, e.g., on how to save the results).
Prepare a brief written report on the main findings of your investigation (show your results).
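For reproducibility, the same Apriori run can also be launched from the command line instead of the Explorer GUI. This is only a sketch, assuming weka.jar is on the classpath and supermarket.arff is in the current directory; vary the parameters (-C minimum confidence, -N number of rules, -M lower bound on minimum support) as part of the investigation:

```
java -cp weka.jar weka.associations.Apriori -t supermarket.arff -N 20 -C 0.9 -M 0.1
```

Redirecting the output to a file (e.g., with `> results.txt`) is one simple way to save the results for the report.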
Question 4
Consider the following Table 2: Example of market basket transactions.

Table 2: Example of market basket transactions

CID   TID    Items_bought
1     100    {1,2,3,4}
1     200    {1,2,3,4,5}
2     300    {2,3,4}
2     400    {2,3,5}
3     500    {1,2,4}
3     600    {1,3,4}
4     700    {2,3,4,5}
4     800    {1,3,4,5}
5     900    {3,4,5}
5     1000   {1,2,3,5}
a) Trace the results of running the Apriori algorithm (manually) on the transactions in
Table 2, with minimum support S = 4 and S = 5 and confidence C = 60%.
(i)   Show the candidate and frequent itemsets for each database scan.
(ii)  How many candidates are there during scan two? How many of them are frequent
      itemsets during scan two?
(iii) List all association rules that are generated, highlight the strongest one (with
      its support s and confidence c), and sort them by confidence.
(iv)  Give comments about the results.
(v)   Assuming each item (1-5) represents a real-world item (orange, mango, and so on),
      what do you suggest to the client (the market manager) about the results, and what
      comments can you make about the customers (CID)?
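The manual trace can be cross-checked with a short level-wise Apriori sketch in Python (`support` and the variable names are illustrative; support is counted as an absolute number of transactions):

```python
from itertools import combinations

# The ten transactions of Table 2 (TIDs 100-1000)
transactions = [
    {1, 2, 3, 4}, {1, 2, 3, 4, 5}, {2, 3, 4}, {2, 3, 5}, {1, 2, 4},
    {1, 3, 4}, {2, 3, 4, 5}, {1, 3, 4, 5}, {3, 4, 5}, {1, 2, 3, 5},
]
minsup = 4  # absolute support count; rerun with minsup = 5 for the second setting

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

items = sorted({i for t in transactions for i in t})
frequent = {1: {frozenset([i]) for i in items if support(frozenset([i])) >= minsup}}
k = 2
while frequent[k - 1]:
    # Join step: unions of frequent (k-1)-itemsets that have size k
    candidates = {a | b for a in frequent[k - 1] for b in frequent[k - 1]
                  if len(a | b) == k}
    # Prune step: every (k-1)-subset of a candidate must itself be frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent[k - 1]
                         for s in combinations(c, k - 1))}
    frequent[k] = {c for c in candidates if support(c) >= minsup}
    k += 1

for size in sorted(frequent):
    print(size, sorted(sorted(s) for s in frequent[size]))
# With minsup = 4: 5 frequent 1-itemsets, 9 frequent 2-itemsets
# (only {1,5} drops out), and 4 frequent 3-itemsets.
```

The written answer should still show the candidate sets C_k and frequent sets L_k per scan explicitly; the sketch is only a way to verify the counts.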
Question 5
Table 3: Data for height classification

Name      Gender  Height  Class 1  Class 2  Height range (m)
Kristina  F       1.6m    Short    Medium   0 − 1.6
Jim       M       2m      Tall     Medium   1.9 − 2.0
Martha    F       1.9m    Medium   Tall     1.8 − 1.9
Alia      F       1.88m   Medium   Tall     1.8 − 1.9
Kebedech  F       1.7m    Short    Medium   1.6 − 1.7
Mussa     M       1.85m   Medium   Medium   1.8 − 1.9
Almaz     F       1.6m    Short    Medium   0 − 1.6
Khan      M       1.7m    Short    Medium   1.6 − 1.7
Kim       M       2.2m    Tall     Tall     2.0 − ∞
Aziz      M       2.1m    Tall     Tall     2.0 − ∞
Aynalem   F       1.8m    Medium   Medium   1.7 − 1.8
Zaki      M       1.95m   Medium   Medium   1.9 − 2.0
Kati      F       1.9m    Medium   Tall     1.8 − 1.9
Xem       F       1.8m    Medium   Medium   1.7 − 1.8
Yem       F       1.75m   Medium   Medium   1.7 − 1.8
a) What is meant by Naïve Bayes classification? Explain it using the above table.
b) Given the training data in Table 3 (height classification), use "Class 1" as the class
attribute to predict the class of the following new test sample (Yemer, M, 1.95m) using
Naïve Bayes classification.
c) Repeat using "Class 2" as the class attribute to predict the class of the following
test sample (Nati, F, 1.89m) using Naïve Bayes classification.
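One way to set up part (b): with "Class 1" as the target, tabulate the prior and the per-class conditional frequencies of Gender and Height, treating Height as categorical by binning it into the table's own height ranges (a modelling choice; a Gaussian model over the raw heights is an equally valid alternative). The helper names below are illustrative:

```python
from collections import Counter

# Table 3 rows: (name, gender, height in m, Class 1 label)
data = [
    ("Kristina", "F", 1.6, "Short"), ("Jim", "M", 2.0, "Tall"),
    ("Martha", "F", 1.9, "Medium"), ("Alia", "F", 1.88, "Medium"),
    ("Kebedech", "F", 1.7, "Short"), ("Mussa", "M", 1.85, "Medium"),
    ("Almaz", "F", 1.6, "Short"), ("Khan", "M", 1.7, "Short"),
    ("Kim", "M", 2.2, "Tall"), ("Aziz", "M", 2.1, "Tall"),
    ("Aynalem", "F", 1.8, "Medium"), ("Zaki", "M", 1.95, "Medium"),
    ("Kati", "F", 1.9, "Medium"), ("Xem", "F", 1.8, "Medium"),
    ("Yem", "F", 1.75, "Medium"),
]

def height_bin(h):
    # Bins follow the table's "Height range" column (upper bound inclusive)
    for upper, label in [(1.6, "0-1.6"), (1.7, "1.6-1.7"), (1.8, "1.7-1.8"),
                         (1.9, "1.8-1.9"), (2.0, "1.9-2.0")]:
        if h <= upper:
            return label
    return "2.0-inf"

def predict(gender, height):
    n = len(data)
    class_counts = Counter(r[3] for r in data)
    scores = {}
    for c, nc in class_counts.items():
        rows = [r for r in data if r[3] == c]
        p_gender = sum(r[1] == gender for r in rows) / nc
        p_height = sum(height_bin(r[2]) == height_bin(height) for r in rows) / nc
        scores[c] = (nc / n) * p_gender * p_height  # prior * likelihoods
    return max(scores, key=scores.get), scores

label, scores = predict("M", 1.95)
print(label)  # prints "Tall"
```

Here the Short class scores zero (no Short person falls in the 1.9-2.0 bin), and Tall wins because every Tall person in the table is male.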
Question 6
Say true or false; justify your answer:
Data mining was ONLY made possible (or rather, made economically viable) by the
advent of computers.