A. Association Rule Mining: (15 points)
Given the following transaction database for mining association rules:

Database D
TID    Items
100    A C D
200    B C E
300    A B C E
400    B E

Please use the Apriori algorithm to mine association rules with minimum support count = 2.
(Please show the derivation process step by step with candidate itemsets.)
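
For checking a hand derivation, the level-wise candidate generation can be sketched in Python. This is a minimal sketch, not a reference implementation: the names `D`, `support_count`, and `apriori` are illustrative, and the join step simply unions pairs of frequent itemsets before pruning, instead of the prefix-based join usually shown in textbooks. Because the question states no minimum confidence, the sketch stops at the frequent itemsets.

```python
from itertools import combinations

# Database D from the question: TID -> set of items.
D = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
MIN_SUPPORT_COUNT = 2


def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def apriori(db, min_count):
    """Return [L1, L2, ...]; each Lk maps a frequent k-itemset to its count."""
    items = sorted({i for t in db.values() for i in t})
    candidates = [frozenset([i]) for i in items]  # C1
    levels = []
    k = 1
    while candidates:
        counts = {c: support_count(c, db) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent:
            break
        levels.append(frequent)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return levels


for level_no, level in enumerate(apriori(D, MIN_SUPPORT_COUNT), start=1):
    print(f"L{level_no}:", {"".join(sorted(s)): n for s, n in level.items()})
```

Each printed level corresponds to one frequent-itemset table (L1, L2, ...) of the step-by-step derivation the question asks for.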
B. Generating Classification Rules from a Decision Tree (50 points)
Given a database table containing weather data as follows:
Outlook    Temperature  Humidity  Windy  Class: Play
Sunny      Hot          High      False  No
Sunny      Hot          High      True   No
Overcast   Hot          High      False  Yes
Rainy      Mild         High      False  Yes
Rainy      Cool         Normal    False  Yes
Rainy      Cool         Normal    True   No
Overcast   Cool         Normal    True   Yes
Sunny      Mild         High      False  No
Sunny      Cool         Normal    False  Yes
Rainy      Mild         Normal    False  Yes
Sunny      Mild         Normal    True   Yes
Overcast   Mild         High      True   Yes
Overcast   Hot          Normal    False  Yes
Rainy      Mild         High      True   No
1. Please use the basic algorithm (a version of ID3) for inducing a decision tree from the given
training samples in the weather database table.
2. Please extract the classification rules from the decision tree generated in 1.
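
The two tasks above can be sketched in Python as a check on a hand-built tree. This is a minimal sketch assuming plain information gain as the selection measure and no pruning; the names `ROWS`, `entropy`, `info_gain`, `id3`, and `rules` are illustrative and do not come from the question.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]
# Columns copied from the weather table; PLAY is the class column.
OUTLOOK = ("Sunny Sunny Overcast Rainy Rainy Rainy Overcast "
           "Sunny Sunny Rainy Sunny Overcast Overcast Rainy").split()
TEMP = "Hot Hot Hot Mild Cool Cool Cool Mild Cool Mild Mild Mild Hot Mild".split()
HUMIDITY = ("High High High High Normal Normal Normal "
            "High Normal Normal Normal High Normal High").split()
WINDY = ("False True False False False True True "
         "False False False True True False True").split()
PLAY = "No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No".split()
ROWS = list(zip(OUTLOOK, TEMP, HUMIDITY, WINDY, PLAY))


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(rows, col):
    """Entropy of the class minus the expected entropy after splitting on `col`."""
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def id3(rows, cols):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:          # pure node
        return labels[0]
    if not cols:                       # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: info_gain(rows, c))
    rest = [c for c in cols if c != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        branches[value] = id3([r for r in rows if r[best] == value], rest)
    return {ATTRS[best]: branches}


def rules(tree, path=()):
    """Yield (conditions, class) pairs, one per root-to-leaf path."""
    if not isinstance(tree, dict):
        yield path, tree
        return
    (attr, branches), = tree.items()
    for value, subtree in branches.items():
        yield from rules(subtree, path + ((attr, value),))


tree = id3(ROWS, list(range(len(ATTRS))))
for conditions, label in rules(tree):
    body = " AND ".join(f"{a} = {v}" for a, v in conditions)
    print(f"IF {body} THEN Play = {label}")
```

Each printed line is one IF-THEN rule read off a single path from the root to a leaf.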
C. For the weather database table given in B, please predict a class label for the weather data by
using the naïve Bayesian classification approach. (20 points)
The unknown samples to be classified are:
(Outlook = ‘Sunny’, Temperature = ‘Mild’, Humidity = ‘High’, Windy = ‘False’)
(Outlook = ‘Sunny’, Temperature = ‘Hot’, Humidity = ‘High’, Windy = ‘False’)
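
The naïve Bayesian computation for the two samples can be sketched in Python. This is a minimal sketch assuming relative-frequency estimates with no Laplace smoothing; the names `ROWS`, `naive_bayes_scores`, and `predict` are illustrative. The scores are the unnormalized products P(class) × Π P(value | class), which is enough to compare classes.

```python
from collections import Counter

# Columns copied from the weather table in B; PLAY is the class column.
OUTLOOK = ("Sunny Sunny Overcast Rainy Rainy Rainy Overcast "
           "Sunny Sunny Rainy Sunny Overcast Overcast Rainy").split()
TEMP = "Hot Hot Hot Mild Cool Cool Cool Mild Cool Mild Mild Mild Hot Mild".split()
HUMIDITY = ("High High High High Normal Normal Normal "
            "High Normal Normal Normal High Normal High").split()
WINDY = ("False True False False False True True "
         "False False False True True False True").split()
PLAY = "No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No".split()
ROWS = list(zip(OUTLOOK, TEMP, HUMIDITY, WINDY, PLAY))


def naive_bayes_scores(rows, sample):
    """Return {class: P(class) * product of P(attribute value | class)}."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)                      # prior P(class)
        for col, value in enumerate(sample):
            matches = sum(1 for r in rows if r[-1] == label and r[col] == value)
            score *= matches / n                   # P(value | class), unsmoothed
        scores[label] = score
    return scores


def predict(rows, sample):
    """Class with the largest unnormalized posterior score."""
    scores = naive_bayes_scores(rows, sample)
    return max(scores, key=scores.get)


SAMPLES = [
    ("Sunny", "Mild", "High", "False"),
    ("Sunny", "Hot", "High", "False"),
]
for sample in SAMPLES:
    print(sample, naive_bayes_scores(ROWS, sample), "->", predict(ROWS, sample))
```

Without smoothing, any attribute value never seen with a class drives that class's score to zero; that does not happen for these two samples.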
D. Classification and Characteristic Rule Derivation (50 points)
Given the following:
1.) The relation table:
Student Name  Sex     Age  Birth_place  Major  Position         Salary
Carrie        female  33   Florida      DCTE   Instructor       $31,000
Stilwell      male    58   Michigan     DISS   Manager          $65,000
Nana          female  35   Japan        DCTE   Instructor       $35,000
O’Hare        male    35   Canada       DCS    Assistant Prof.  $45,000
Peabody       male    50   New York     DISS   CIO              $70,000
Juliana       female  68   California   DISS   CEO              $90,000
O’Neil        male    42   France       DCS    Lecturer         $50,000
Diana         female  51   Oregon       DCTE   Assist. Prof.    $49,000
Anderson      male    45   India        DCS    Instructor       $49,500
Christopher   male    42   New Mexico   DISS   Manager          $55,000
Cook          female  40   Illinois     DISS   System Analyst   $45,000
Kim Ming      female  38   South Korea  DCTE   Lecturer         $35,500
Donovan       male    32   Netherlands  DCS    Assist. Prof.    $48,000
George        male    42   South Korea  DCS    Instructor       $35,000
Donna         female  35   Texas        DCS    Programmer       $57,000
Mike          male    60   Ohio         DISS   Manager          $67,500
Lynn          female  55   Georgia      DISS   Manager          $60,000
Lisa          female  37   Italy        DCS    Programmer       $47,500
Sherry        female  46   Germany      DCS    Lecturer         $38,000
Robert        male    51   Kansas       DISS   CIO              $72,500
2.) The concept hierarchy table:
Age:
{21 - 30} → Young
{31 - 50} → Mid-Age
{51 - 70} → Old
Birth_Place:
{Canada, France, Germany, India, Italy, Japan, Netherlands, South Korea} → Foreign
{California, Florida, Georgia, Illinois, Kansas, Michigan, New York, Ohio, Oregon, Utah, Texas,
New Mexico} → USA
Major:
{DCS, DISS, DCTE}
Position:
{Instructor, Lecturer, Assistant Professor, Associate Professor, Professor} → Faculty
{CEO, CIO, Manager, Programmer, System Analyst} → Non-Faculty
Salary:
{$20,000 - $30,000} → Low
{$30,001 - $50,000} → Medium
{$50,001 - $100,000} → High
Please do the following:
1. Please derive the quantitative classification and characteristic rules from the given relation table
and the concept hierarchy table. The target class for the quantitative rule derivation is male.
2. Please indicate the sufficient and necessary conditions for these derived rules.
3. Please indicate which attribute should be removed during the derivation process.
4. Please indicate the threshold value T for the number of distinct values of each remaining
attribute.
5. Please give the detailed intermediate tables in the derivation process, mark the tuples which
overlap between the target class and contrast class, and make assumptions whenever necessary.
6. Please make reference to the following materials for the examination:
Jiawei Han, Yandong Cai, and Nick Cercone, “Data-Driven Discovery of Quantitative Rules in
Relational Databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 1,
1993, pp. 29-40.
Yandong Cai, Nick Cercone, and Jiawei Han, “An Attribute-Oriented Approach for Learning
Classification Rules from Relational Databases,” in Proceedings of Sixth International Conference
on Data Engineering, February 1990, pp. 281-288.
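
The first generalization pass over the relation table can be sketched in Python to help build the intermediate tables. This is a minimal sketch under stated assumptions, not the procedure of the cited papers: it climbs the concept hierarchy exactly one level, drops the name column (an assumption made here only because the hierarchy table lists no concept for it), treats "Assistant Prof." and "Assist. Prof." as "Assistant Professor", and does not apply a threshold T or compute rule weights. All function and variable names are illustrative.

```python
from collections import Counter

FOREIGN = {"Canada", "France", "Germany", "India", "Italy", "Japan",
           "Netherlands", "South Korea"}
USA = {"California", "Florida", "Georgia", "Illinois", "Kansas", "Michigan",
       "New York", "Ohio", "Oregon", "Utah", "Texas", "New Mexico"}
FACULTY = {"Instructor", "Lecturer", "Assistant Professor",
           "Associate Professor", "Professor"}
NON_FACULTY = {"CEO", "CIO", "Manager", "Programmer", "System Analyst"}
# Assumption: both abbreviations in the relation table mean Assistant Professor.
ALIASES = {"Assistant Prof.": "Assistant Professor",
           "Assist. Prof.": "Assistant Professor"}

# (name, sex, age, birth_place, major, position, salary in dollars)
ROWS = [
    ("Carrie", "female", 33, "Florida", "DCTE", "Instructor", 31000),
    ("Stilwell", "male", 58, "Michigan", "DISS", "Manager", 65000),
    ("Nana", "female", 35, "Japan", "DCTE", "Instructor", 35000),
    ("O'Hare", "male", 35, "Canada", "DCS", "Assistant Prof.", 45000),
    ("Peabody", "male", 50, "New York", "DISS", "CIO", 70000),
    ("Juliana", "female", 68, "California", "DISS", "CEO", 90000),
    ("O'Neil", "male", 42, "France", "DCS", "Lecturer", 50000),
    ("Diana", "female", 51, "Oregon", "DCTE", "Assist. Prof.", 49000),
    ("Anderson", "male", 45, "India", "DCS", "Instructor", 49500),
    ("Christopher", "male", 42, "New Mexico", "DISS", "Manager", 55000),
    ("Cook", "female", 40, "Illinois", "DISS", "System Analyst", 45000),
    ("Kim Ming", "female", 38, "South Korea", "DCTE", "Lecturer", 35500),
    ("Donovan", "male", 32, "Netherlands", "DCS", "Assist. Prof.", 48000),
    ("George", "male", 42, "South Korea", "DCS", "Instructor", 35000),
    ("Donna", "female", 35, "Texas", "DCS", "Programmer", 57000),
    ("Mike", "male", 60, "Ohio", "DISS", "Manager", 67500),
    ("Lynn", "female", 55, "Georgia", "DISS", "Manager", 60000),
    ("Lisa", "female", 37, "Italy", "DCS", "Programmer", 47500),
    ("Sherry", "female", 46, "Germany", "DCS", "Lecturer", 38000),
    ("Robert", "male", 51, "Kansas", "DISS", "CIO", 72500),
]


def pick(value, groups):
    """Return the concept whose membership test accepts `value`."""
    for concept, accepts in groups:
        if accepts(value):
            return concept
    raise ValueError(f"no concept for {value!r}")


def generalize(row):
    """Map one tuple to (age, birth_place, major, position, salary) concepts."""
    _name, _sex, age, birth, major, position, salary = row
    position = ALIASES.get(position, position)
    return (
        pick(age, [("Young", lambda a: 21 <= a <= 30),
                   ("Mid-Age", lambda a: 31 <= a <= 50),
                   ("Old", lambda a: 51 <= a <= 70)]),
        pick(birth, [("Foreign", FOREIGN.__contains__), ("USA", USA.__contains__)]),
        major,  # the hierarchy table lists no higher-level concept for Major
        pick(position, [("Faculty", FACULTY.__contains__),
                        ("Non-Faculty", NON_FACULTY.__contains__)]),
        pick(salary, [("Low", lambda s: 20000 <= s <= 30000),
                      ("Medium", lambda s: 30001 <= s <= 50000),
                      ("High", lambda s: 50001 <= s <= 100000)]),
    )


# Generalized tuples with vote counts for the target and contrast classes.
target = Counter(generalize(r) for r in ROWS if r[1] == "male")
contrast = Counter(generalize(r) for r in ROWS if r[1] == "female")
overlap = set(target) & set(contrast)  # tuples to mark as overlapping
for tup, votes in target.items():
    print(tup, votes, "(overlaps contrast class)" if tup in overlap else "")
```

The vote counts and overlap marks printed here are only raw material for items 1 to 5; further generalization against a threshold T is left to the derivation itself.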
E. Please answer the following questions: (35 points)
(a) What is the confidence for the rules ∅ → A and A → ∅? (10 points)
(b) Let c1, c2, and c3 be the confidence values of the rules {p} → {q}, {p} → {q, r}, and {p, r} → {q},
respectively. If we assume that c1, c2, and c3 have different values, what are the possible
relationships that may exist among c1, c2, and c3? Which rule has the lowest confidence?
(15 points)
(c) Repeat the analysis in part (b) assuming that the rules have identical support. Which rule has
the highest confidence? (10 points)
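
The definition confidence(X → Y) = support(X ∪ Y) / support(X) behind parts (a) to (c) can be tried out on database D from section A. This is a minimal sketch: the function names are illustrative, and the choice p = C, q = A, r = D is an assumption made purely so that the three confidences come out different. It illustrates the general constraints c2 ≤ c1 and c2 ≤ c3, which follow from support({p, q, r}) ≤ support({p, q}) and support({p, r}) ≤ support({p}); it does not replace the argument the question asks for.

```python
from fractions import Fraction

# Database D from section A, as a list of item sets.
D = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]


def support_count(itemset):
    """Transactions containing every item; the empty set matches all of them."""
    return sum(1 for t in D if set(itemset) <= t)


def confidence(lhs, rhs):
    """support(lhs ∪ rhs) / support(lhs); lhs must occur in at least one transaction."""
    return Fraction(support_count(set(lhs) | set(rhs)), support_count(lhs))


# Part (a): rules with an empty side.
print("conf(∅ → A) =", confidence(set(), {"A"}))   # equals the support of A
print("conf(A → ∅) =", confidence({"A"}, set()))

# Part (b): illustrative choice p = C, q = A, r = D (an assumption).
c1 = confidence({"C"}, {"A"})
c2 = confidence({"C"}, {"A", "D"})
c3 = confidence({"C", "D"}, {"A"})
print("c1, c2, c3 =", c1, c2, c3)
```

For part (c), note that identical support fixes the numerators of all three rules to the same value, so only the antecedent supports in the denominators differ.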
F. Please give the answers to the following questions:
1. What are the lower bound and upper bound on the number of candidate sets generated
by the Apriori association rule mining algorithm? (15 points)
2. What is the total number of possible association rules that can be generated from a given data
set that contains n items? Why? (25 points)
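
A candidate answer to question 2 can be checked by brute force for small n. This is a minimal sketch assuming a rule is X → Y with X and Y non-empty and disjoint, and assuming the closed form 3^n - 2^(n+1) + 1 as the expression to verify; both function names are illustrative.

```python
from itertools import combinations


def count_rules_brute_force(n):
    """Enumerate every rule X -> Y with X, Y non-empty and disjoint over n items."""
    items = list(range(n))
    count = 0
    for k in range(1, n + 1):
        for lhs in combinations(items, k):
            rest = [i for i in items if i not in lhs]
            for m in range(1, len(rest) + 1):
                for _rhs in combinations(rest, m):
                    count += 1
    return count


def count_rules_closed_form(n):
    """Each item goes to X, to Y, or to neither (3^n); subtract empty X or empty Y."""
    return 3**n - 2**(n + 1) + 1


for n in range(1, 8):
    print(n, count_rules_brute_force(n), count_rules_closed_form(n))
```

The comment on the closed form sketches the "why": of the 3^n assignments, 2^n have an empty X and 2^n have an empty Y, and the one assignment with both empty is subtracted twice, hence the +1.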