Data Mining: Basic Data Mining Techniques
2008.4.10
Database Lab 김성원
Contents
3.1 Decision Tree
3.2 Generating Association Rules
3.3 The K-Means Algorithm
3.4 Genetic Learning
3.5 Choosing a Data Mining Technique
3.6 Chapter Summary
3.7 Key Terms
3.8 Exercises
3.1 Decision Tree

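Designing the Decision Tree Algorithm
1. Create the set of training instances to be used for the tree.
2. Choose the attribute that best differentiates the instances in the set.
3. Create a tree node for the chosen attribute; its values divide the instances into subclasses.
4. For each subclass created in step 3:
   • If the instances in the subclass satisfy predefined criteria, or no attributes remain to choose from along this path, specify the classification for new instances that follow this path.
   • If the subclass does not satisfy the criteria and at least one attribute remains to subdivide it further, take the instances of the current subclass as the new set and return to step 2.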
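The four steps above can be sketched in Python. This is a minimal sketch, not the slides' own implementation. It uses only the Sex and Credit Card Insurance columns of Table 3.1 as inputs, treats "all instances share one class" as the predefined criterion, and scores an attribute by how many training instances a majority vote in each branch classifies correctly. Those three choices and all function names are assumptions made for illustration.

```python
from collections import Counter

# (Sex, Credit Card Insurance, Life Insurance Promotion) for the 15 rows of Table 3.1
ROWS = [
    ("Male", "No", "No"), ("Female", "No", "Yes"), ("Male", "No", "No"),
    ("Male", "Yes", "Yes"), ("Female", "No", "Yes"), ("Female", "No", "No"),
    ("Male", "Yes", "Yes"), ("Male", "No", "No"), ("Male", "No", "No"),
    ("Female", "No", "Yes"), ("Female", "No", "Yes"), ("Male", "No", "Yes"),
    ("Female", "No", "Yes"), ("Male", "No", "No"), ("Female", "Yes", "Yes"),
]
COLUMNS = ("Sex", "Credit Card Insurance", "Life Insurance Promotion")
TARGET = "Life Insurance Promotion"

# Step 1: the training instances, one dict per row.
TRAINING = [dict(zip(COLUMNS, row)) for row in ROWS]


def split(instances, attribute):
    """Group instances into subclasses, one per value of the attribute."""
    groups = {}
    for row in instances:
        groups.setdefault(row[attribute], []).append(row)
    return groups


def majority_class(instances):
    """Most frequent value of the output attribute."""
    return Counter(row[TARGET] for row in instances).most_common(1)[0][0]


def score(instances, attribute):
    """Assumed goodness measure: instances a per-branch majority vote gets right."""
    return sum(
        Counter(row[TARGET] for row in group).most_common(1)[0][1]
        for group in split(instances, attribute).values()
    )


def build_tree(instances, attributes):
    # Step 4, first case: stop when the subclass is pure or no attribute remains.
    if len({row[TARGET] for row in instances}) == 1 or not attributes:
        return majority_class(instances)
    # Step 2: choose the attribute that best differentiates the instances.
    best = max(attributes, key=lambda a: score(instances, a))
    remaining = [a for a in attributes if a != best]
    # Step 3, then step 4, second case: one child per value, built recursively.
    return {best: {value: build_tree(group, remaining)
                   for value, group in split(instances, best).items()}}


tree = build_tree(TRAINING, ["Sex", "Credit Card Insurance"])
print(tree)
```

With these two input attributes the sketch places Sex at the root and Credit Card Insurance below it; a different goodness measure or stopping criterion could produce a different tree.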
Table 3.1 • The Credit Card Promotion Database

Income Range   Life Insurance Promotion   Credit Card Insurance   Sex      Age
40–50K         No                         No                      Male     45
30–40K         Yes                        No                      Female   40
40–50K         No                         No                      Male     42
30–40K         Yes                        Yes                     Male     43
50–60K         Yes                        No                      Female   38
20–30K         No                         No                      Female   55
30–40K         Yes                        Yes                     Male     35
20–30K         No                         No                      Male     27
30–40K         No                         No                      Male     43
30–40K         Yes                        No                      Female   41
40–50K         Yes                        No                      Female   43
20–30K         Yes                        No                      Male     29
50–60K         Yes                        No                      Female   39
40–50K         No                         No                      Male     55
20–30K         Yes                        Yes                     Female   19
Figure 3.1 A partial decision tree with root node = income range
  Income Range = 20-30K: 2 Yes, 2 No
  Income Range = 30-40K: 4 Yes, 1 No
  Income Range = 40-50K: 1 Yes, 3 No
  Income Range = 50-60K: 2 Yes

Figure 3.2 A partial decision tree with root node = credit card insurance
  Credit Card Insurance = No: 6 Yes, 6 No
  Credit Card Insurance = Yes: 3 Yes, 0 No

Figure 3.3 A partial decision tree with root node = age
  Age <= 43: 9 Yes, 3 No
  Age > 43: 0 Yes, 3 No

(Yes/No counts are values of Life Insurance Promotion in each branch.)
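The branch counts in Figures 3.1 to 3.3 can be recomputed from Table 3.1. The following is a small sketch under that reading of the figures; the helper name `branch_counts` and the tuple layout are assumptions for illustration.

```python
from collections import Counter

# (Income Range, Credit Card Insurance, Age, Life Insurance Promotion) from Table 3.1
ROWS = [
    ("40-50K", "No", 45, "No"), ("30-40K", "No", 40, "Yes"),
    ("40-50K", "No", 42, "No"), ("30-40K", "Yes", 43, "Yes"),
    ("50-60K", "No", 38, "Yes"), ("20-30K", "No", 55, "No"),
    ("30-40K", "Yes", 35, "Yes"), ("20-30K", "No", 27, "No"),
    ("30-40K", "No", 43, "No"), ("30-40K", "No", 41, "Yes"),
    ("40-50K", "No", 43, "Yes"), ("20-30K", "No", 29, "Yes"),
    ("50-60K", "No", 39, "Yes"), ("40-50K", "No", 55, "No"),
    ("20-30K", "Yes", 19, "Yes"),
]


def branch_counts(branch_of):
    """Tally Life Insurance Promotion values in each branch of a candidate root."""
    counts = {}
    for row in ROWS:
        counts.setdefault(branch_of(row), Counter())[row[3]] += 1
    return counts


by_income = branch_counts(lambda r: r[0])                            # Figure 3.1
by_insurance = branch_counts(lambda r: r[1])                         # Figure 3.2
by_age = branch_counts(lambda r: "<= 43" if r[2] <= 43 else "> 43")  # Figure 3.3

for name, counts in [("Income Range", by_income),
                     ("Credit Card Insurance", by_insurance),
                     ("Age", by_age)]:
    for branch, tally in sorted(counts.items()):
        print(f"{name} = {branch}: {tally['Yes']} Yes, {tally['No']} No")
```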
Exercise : Computational Questions

Decision tree built from Table 3.1 • The Credit Card Promotion Database, with root node = Age. Each leaf shows the predicted Life Insurance Promotion value followed by (instances reaching the leaf / instances misclassified).

Age
  > 43: No (3/0)
  <= 43: Sex
    Female: Yes (6/0)
    Male: Credit Card Insurance
      No: No (4/1)
      Yes: Yes (2/0)

Node = Age: C.E = 1 - max(12/15, 3/15) = 0.2
Node = Sex: C.E = 1 - max(6/12, 6/12) = 0.5
Node = Credit Card Insurance: C.E = 1 - max(2/6, 4/6) = 0.33
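The three C.E values on this slide all follow the same pattern: 1 minus the largest of the fractions listed for the node. A short sketch of that arithmetic, with the function name `c_e` chosen here for illustration:

```python
def c_e(*fractions):
    """C.E as computed on the slide: 1 minus the largest fraction at the node."""
    return 1 - max(fractions)


age_ce = c_e(12 / 15, 3 / 15)   # Node = Age
sex_ce = c_e(6 / 12, 6 / 12)    # Node = Sex
cci_ce = c_e(2 / 6, 4 / 6)      # Node = Credit Card Insurance

print(round(age_ce, 2), round(sex_ce, 2), round(cci_ce, 2))
```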
IF Age <= 43 & Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No

Table 3.2 • Training Data Instances Following the Path in Figure 3.4 to Credit Card Insurance = No

Income Range   Life Insurance Promotion   Credit Card Insurance   Sex    Age
40–50K         No                         No                      Male   42
20–30K         No                         No                      Male   27
30–40K         No                         No                      Male   43
20–30K         Yes                        No                      Male   29

Accuracy = 75%

IF Sex = Male & Credit Card Insurance = No
THEN Life Insurance Promotion = No
Accuracy = 83.3%
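Both accuracy figures can be checked against Table 3.1 by counting the instances each rule covers and how many of them have Life Insurance Promotion = No. A minimal sketch follows; the helper name `rule_accuracy` and the tuple layout are assumptions for illustration.

```python
# (Sex, Credit Card Insurance, Age, Life Insurance Promotion) from Table 3.1
ROWS = [
    ("Male", "No", 45, "No"), ("Female", "No", 40, "Yes"),
    ("Male", "No", 42, "No"), ("Male", "Yes", 43, "Yes"),
    ("Female", "No", 38, "Yes"), ("Female", "No", 55, "No"),
    ("Male", "Yes", 35, "Yes"), ("Male", "No", 27, "No"),
    ("Male", "No", 43, "No"), ("Female", "No", 41, "Yes"),
    ("Female", "No", 43, "Yes"), ("Male", "No", 29, "Yes"),
    ("Female", "No", 39, "Yes"), ("Male", "No", 55, "No"),
    ("Female", "Yes", 19, "Yes"),
]


def rule_accuracy(condition, conclusion="No"):
    """Return (correct, covered) for a rule predicting Life Insurance Promotion."""
    covered = [row for row in ROWS if condition(row)]
    correct = sum(1 for row in covered if row[3] == conclusion)
    return correct, len(covered)


# IF Age <= 43 & Sex = Male & Credit Card Insurance = No THEN ... = No
full_rule = rule_accuracy(lambda r: r[2] <= 43 and r[0] == "Male" and r[1] == "No")
# IF Sex = Male & Credit Card Insurance = No THEN ... = No
short_rule = rule_accuracy(lambda r: r[0] == "Male" and r[1] == "No")

for correct, covered in (full_rule, short_rule):
    print(f"{correct}/{covered} = {100 * correct / covered:.1f}%")
```

Dropping the age test widens the rule's coverage from 4 to 6 instances while adding only correctly classified ones, which is why the simplified rule scores higher.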
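Advantages of Decision Trees
• They are easy to understand, and classification is straightforward.
• They can be applied to real-world problems.
• They require no assumptions about the data, such as linearity or homoscedasticity.
• They handle both numerical and categorical data.

Disadvantages of Decision Trees
• The output attribute must be categorical.
• Decision tree algorithms become unstable as the tree grows deeper: predictive power drops and the tree becomes harder to interpret.
• The computational cost can be high.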
3.2 Generating Association Rules
Confidence and Support

Milk -> Bread
Support(milk, bread)
= Pattern(milk, bread) / total number of transactions
= 5000/10000 = 50%
Confidence(milk, bread)
= Pattern(milk, bread) / Pattern(milk)
= 5000/8000 = 62.5%
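The two definitions above translate directly into code. This sketch takes the counts from the milk and bread example as given; the function and variable names are assumptions for illustration.

```python
def support(pattern_count, total_transactions):
    """Share of all transactions that contain the whole pattern."""
    return pattern_count / total_transactions


def confidence(pattern_count, antecedent_count):
    """Share of transactions with the antecedent that also contain the consequent."""
    return pattern_count / antecedent_count


total = 10000          # all transactions
milk = 8000            # transactions containing milk
milk_and_bread = 5000  # transactions containing both milk and bread

milk_bread_support = support(milk_and_bread, total)
milk_bread_confidence = confidence(milk_and_bread, milk)
print(f"support = {milk_bread_support:.1%}, confidence = {milk_bread_confidence:.1%}")
```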

Mining Association Rules : An Example
Apriori algorithm
1. Generate the item sets.
2. Use the generated item sets to create association rules.
Table 3.3 • A Subset of the Credit Card Promotion Database

Magazine Promotion   Watch Promotion   Life Insurance Promotion   Credit Card Insurance   Sex
Yes                  No                No                         No                      Male
Yes                  Yes               Yes                        No                      Female
No                   No                No                         No                      Male
Yes                  Yes               Yes                        Yes                     Male
Yes                  No                Yes                        No                      Female
No                   No                No                         No                      Female
Yes                  No                Yes                        Yes                     Male
No                   Yes               No                         No                      Male
Yes                  No                No                         No                      Male
Yes                  Yes               Yes                        No                      Female
Table 3.4 • Single-Item Sets

Single-Item Sets                     Number of Items
Magazine Promotion = Yes             7
Watch Promotion = Yes                4
Watch Promotion = No                 6
Life Insurance Promotion = Yes       5
Life Insurance Promotion = No        5
Credit Card Insurance = No           8
Sex = Male                           6
Sex = Female                         4
Table 3.5 • Two-Item Sets

Two-Item Sets                                                  Number of Items
Magazine Promotion = Yes & Watch Promotion = No                4
Magazine Promotion = Yes & Life Insurance Promotion = Yes      5
Magazine Promotion = Yes & Credit Card Insurance = No          5
Magazine Promotion = Yes & Sex = Male                          4
Watch Promotion = No & Life Insurance Promotion = No           4
Watch Promotion = No & Credit Card Insurance = No              5
Watch Promotion = No & Sex = Male                              4
Life Insurance Promotion = No & Credit Card Insurance = No     5
Life Insurance Promotion = No & Sex = Male                     4
Credit Card Insurance = No & Sex = Male                        4
Credit Card Insurance = No & Sex = Female                      4

* Three-item set
Watch Promotion = No & Life Insurance Promotion = No & Credit Card Insurance = No
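The item sets in Tables 3.4 and 3.5 can be regenerated from Table 3.3 with a level-by-level search in the spirit of Apriori. The sketch below is a simplified version, not the slides' implementation: it joins the sets that survived the previous level and recounts each candidate directly. The minimum count of 4 is an assumption, chosen because it is the smallest count the two tables list; the function names are likewise illustrative.

```python
from itertools import combinations

COLUMNS = ("Magazine Promotion", "Watch Promotion", "Life Insurance Promotion",
           "Credit Card Insurance", "Sex")
ROWS = [  # Table 3.3
    ("Yes", "No", "No", "No", "Male"), ("Yes", "Yes", "Yes", "No", "Female"),
    ("No", "No", "No", "No", "Male"), ("Yes", "Yes", "Yes", "Yes", "Male"),
    ("Yes", "No", "Yes", "No", "Female"), ("No", "No", "No", "No", "Female"),
    ("Yes", "No", "Yes", "Yes", "Male"), ("No", "Yes", "No", "No", "Male"),
    ("Yes", "No", "No", "No", "Male"), ("Yes", "Yes", "Yes", "No", "Female"),
]
# Each instance becomes a set of (attribute, value) items.
TRANSACTIONS = [frozenset(zip(COLUMNS, row)) for row in ROWS]


def count_matches(item_set):
    """Number of instances that contain every item in the set."""
    return sum(1 for t in TRANSACTIONS if item_set <= t)


def frequent_item_sets(min_count, max_size):
    """Return {size: {item_set: count}} for all sets meeting min_count."""
    singles = {frozenset([item]) for t in TRANSACTIONS for item in t}
    current = {s: count_matches(s) for s in singles}
    current = {s: n for s, n in current.items() if n >= min_count}
    levels, size = {}, 1
    while current and size <= max_size:
        levels[size] = current
        # Join step: unions of two surviving sets that are exactly one item larger.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == size + 1}
        current = {c: count_matches(c) for c in candidates}
        current = {s: n for s, n in current.items() if n >= min_count}
        size += 1
    return levels


levels = frequent_item_sets(min_count=4, max_size=3)
for size, sets in levels.items():
    print(f"{size}-item sets: {len(sets)}")
```

Candidates that pair two values of the same attribute never match any instance, so the count filter removes them without a separate check.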
Two-item set rules
IF Magazine Promotion = Yes
THEN Life Insurance Promotion = Yes (5/7)
IF Life Insurance Promotion = Yes
THEN Magazine Promotion = Yes (5/5)

Three-item set rules
IF Watch Promotion = No & Life Insurance Promotion = No
THEN Credit Card Insurance = No (4/4)
IF Watch Promotion = No
THEN Life Insurance Promotion = No & Credit Card Insurance = No (4/6)
IF Credit Card Insurance = No
THEN Watch Promotion = No & Life Insurance Promotion = No (4/8)
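The fraction after each rule is its confidence: instances matching the whole rule over instances matching the IF part. The sketch below recomputes those fractions from Table 3.3; the helper names `matches` and `rule_counts` are assumptions for illustration.

```python
COLUMNS = ("Magazine Promotion", "Watch Promotion", "Life Insurance Promotion",
           "Credit Card Insurance", "Sex")
ROWS = [  # Table 3.3
    ("Yes", "No", "No", "No", "Male"), ("Yes", "Yes", "Yes", "No", "Female"),
    ("No", "No", "No", "No", "Male"), ("Yes", "Yes", "Yes", "Yes", "Male"),
    ("Yes", "No", "Yes", "No", "Female"), ("No", "No", "No", "No", "Female"),
    ("Yes", "No", "Yes", "Yes", "Male"), ("No", "Yes", "No", "No", "Male"),
    ("Yes", "No", "No", "No", "Male"), ("Yes", "Yes", "Yes", "No", "Female"),
]
RECORDS = [dict(zip(COLUMNS, row)) for row in ROWS]


def matches(record, conditions):
    """True when the record has every attribute = value pair in conditions."""
    return all(record[attr] == value for attr, value in conditions.items())


def rule_counts(antecedent, consequent):
    """Return (instances matching the whole rule, instances matching the IF part)."""
    covered = [r for r in RECORDS if matches(r, antecedent)]
    correct = [r for r in covered if matches(r, consequent)]
    return len(correct), len(covered)


RULES = [
    ({"Magazine Promotion": "Yes"}, {"Life Insurance Promotion": "Yes"}),
    ({"Life Insurance Promotion": "Yes"}, {"Magazine Promotion": "Yes"}),
    ({"Watch Promotion": "No", "Life Insurance Promotion": "No"},
     {"Credit Card Insurance": "No"}),
    ({"Watch Promotion": "No"},
     {"Life Insurance Promotion": "No", "Credit Card Insurance": "No"}),
    ({"Credit Card Insurance": "No"},
     {"Watch Promotion": "No", "Life Insurance Promotion": "No"}),
]
results = [rule_counts(antecedent, consequent) for antecedent, consequent in RULES]
for correct, covered in results:
    print(f"({correct}/{covered}) confidence = {correct / covered:.1%}")
```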
General Considerations
• Association rules are of interest when they show that a customer's purchase of one product also lifts sales of one or more associated products.
• Rules in which a particular association shows a lower confidence than expected are also of interest.