Basic Data Mining Techniques
Contents
• Query Tools
• Statistical Techniques
• Visualization Techniques
• Case-Based Learning (K-Nearest Neighbor)
Query Tools and Statistical Techniques
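• Customers are a telecom company's greatest asset
• Customer behavior is captured in the call records of the telephone switches
• Understanding customer behavior has become a trend among telecom companies
– Case: selling telephone lines
• Providing additional telephone lines to companies whose current lines are already saturated is a recurring business opportunity
– When will a customer need additional lines?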
Selling Telephone Lines
Switch call records → Convert call durations into categories → Sort by time → Count occupied lines
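The last two steps (sort by time, count occupied lines) can be sketched as a sweep over call start and end times. This is a minimal sketch, assuming each call record has already been reduced to a hypothetical (start_minute, duration_minutes) pair; the categorization step is not modeled, and `peak_lines_in_use` is an illustrative name, not part of any telecom system.

```python
from typing import List, Tuple


def peak_lines_in_use(calls: List[Tuple[int, int]]) -> int:
    """Return the maximum number of lines occupied at the same time.

    Each call is a hypothetical (start_minute, duration_minutes) pair.
    """
    events = []
    for start, duration in calls:
        events.append((start, 1))              # a line becomes busy
        events.append((start + duration, -1))  # the line is released
    # Sort by time; at equal times releases (-1) come before seizures (+1),
    # so a call ending at minute t and another starting at t share one line.
    events.sort()
    busy = peak = 0
    for _, delta in events:
        busy += delta
        peak = max(peak, busy)
    return peak


calls = [(0, 10), (5, 10), (8, 4), (15, 5)]
print(peak_lines_in_use(calls))
```

A company whose peak regularly reaches the number of lines it already has is a candidate for additional lines.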
Naive Predictions
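The slide content for naive predictions is not preserved. One common reading, assumed here, is a baseline that ignores a record's attributes and always predicts the most frequent outcome; its accuracy is simply the share of that outcome, which is the bar a technique such as k-nearest neighbor has to beat. The function name and sample data are illustrative.

```python
from collections import Counter
from typing import Sequence, Tuple


def naive_prediction(outcomes: Sequence[str]) -> Tuple[str, float]:
    """Predict the most frequent outcome for every record.

    Returns the predicted value and the fraction of records it gets right.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    value, count = Counter(outcomes).most_common(1)[0]
    return value, count / len(outcomes)


# Hypothetical example: 7 of 10 customers did not buy a magazine.
print(naive_prediction(["no"] * 7 + ["yes"] * 3))
```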
Visualization Techniques
(Scatter Diagram)
Music Magazine
Distance between Data Points
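As a sketch of measuring distance between data points, assuming each point is a list of numeric attribute values (for example, the two attributes plotted on a scatter diagram), the straight-line (Euclidean) distance is one common choice; the slide does not say which measure it uses.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points with the same attributes."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0, 0], [3, 4]))
```

In practice attributes are usually rescaled first, so that one with a large range (such as income) does not dominate the distance.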
K-Nearest Neighbor
• Records that are close to each other live in
each other’s neighborhood
– Customers of the same type (cluster) will show
the same behavior
– Do as your neighbors do
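– Not really a learning technique (no model is built; predictions come directly from the stored records)
– Disadvantages:
• Inefficiency: every prediction requires comparing the new record with all stored records
• It can be hard to tell whether k-nearest neighbor actually performs better than naïve prediction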
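"Do as your neighbors do" can be sketched as a small k-nearest-neighbor classifier: find the k stored records closest to a new record and predict the outcome most of them share. This is a minimal sketch assuming numeric attributes, Euclidean distance, and a simple majority vote; `knn_predict` and the sample records are illustrative. The full pass over every stored record also shows the inefficiency listed among the disadvantages.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[Sequence[float], str]  # (attribute values, known outcome)


def knn_predict(records: List[Record], query: Sequence[float], k: int = 3) -> str:
    """Predict the outcome of `query` by majority vote of its k nearest records."""
    # No model is built: every stored record is compared with the query.
    by_distance = sorted(records, key=lambda r: math.dist(r[0], query))
    nearest = by_distance[:k]
    votes = Counter(outcome for _, outcome in nearest)
    return votes.most_common(1)[0][0]


records = [
    ([1, 1], "yes"), ([1, 2], "yes"), ([2, 1], "yes"),
    ([8, 8], "no"), ([9, 8], "no"), ([8, 9], "no"),
]
print(knn_predict(records, [1.5, 1.5]))
```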
Result of the K-Nearest Neighbor Process
67.1%, 70.2%, 55.3%, 85.4%, 91.9%
Movie Recommendation
K-Nearest Neighbors for 0*3*6
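• C1: 1 0 0 1 0 0 1
• M1: 0 1 1 1 0 0 1
• Distance = 3 or Similarity = 4
• C1: 1 0 0 1 0 0 1
• M2: 0 1 1 1 0 1 1
• Distance = 4 or Similarity = 3
Distance counts the attributes on which the two records differ; similarity counts those on which they agree.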
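The distance and similarity counts for C1, M1, and M2 can be reproduced by comparing binary attribute values position by position, so the two numbers always add up to the vector length. The seven-position vectors below are an assumed reading of the slide's digits; what each position stands for is not stated, and `distance_and_similarity` is an illustrative name.

```python
from typing import Sequence, Tuple


def distance_and_similarity(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    """Return (distance, similarity) for two equal-length binary vectors.

    distance   = number of positions where the values differ
    similarity = number of positions where the values match
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    distance = sum(1 for x, y in zip(a, b) if x != y)
    return distance, len(a) - distance


# Assumed 7-attribute reading of the slide's vectors.
C1 = [1, 0, 0, 1, 0, 0, 1]
M1 = [0, 1, 1, 1, 0, 0, 1]
M2 = [0, 1, 1, 1, 0, 1, 1]

print(distance_and_similarity(C1, M1))
print(distance_and_similarity(C1, M2))
```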
Similarity to C1
M1 = 4   M2 = 3   M3 = 6   M4 = 5   M5 = 4   M6 = 4   M7 = 5
M8 = 3   M9 = 4   M10 = 4  M11 = 3  M12 = 5  M13 = 7  M14 = 6
M15 = 4  M16 = 6  M17 = 4  M18 = 5  M19 = 6  M20 = 7  M21 = 3
M22 = 2  M23 = 4  M24 = 4  M25 = 6  M26 = 4
If Similarity_Threshold is 6,
then the 7 neighbors with similarity ≥ 6 (M3, M13, M14, M16, M19, M20, M25) are selected.
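The selection step can be sketched as a filter over the similarity scores listed for M1 to M26. The rule "similarity ≥ threshold" is inferred from M3 (similarity 6) being among the selected neighbors; `select_neighbors` is an illustrative name.

```python
from typing import Dict, List

# Similarity of each record M1..M26 to C1 (customer 0*3*6), as listed on the slide.
SIMILARITY: Dict[str, int] = {
    "M1": 4, "M2": 3, "M3": 6, "M4": 5, "M5": 4, "M6": 4, "M7": 5,
    "M8": 3, "M9": 4, "M10": 4, "M11": 3, "M12": 5, "M13": 7, "M14": 6,
    "M15": 4, "M16": 6, "M17": 4, "M18": 5, "M19": 6, "M20": 7, "M21": 3,
    "M22": 2, "M23": 4, "M24": 4, "M25": 6, "M26": 4,
}


def select_neighbors(scores: Dict[str, int], threshold: int) -> List[str]:
    """Return the ids whose similarity is at least `threshold`, in input order."""
    return [name for name, score in scores.items() if score >= threshold]


print(select_neighbors(SIMILARITY, 6))
```

Unlike a fixed k, a similarity threshold lets the number of neighbors vary from customer to customer; here it yields 7.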
Summarize these 7 Neighbors
• Neighbor 1: 111 134 388 262 261 266 268 012 260 184 238 091 104 142 038
• Neighbor 2: 240 256 290 441 442 442 510 518 518 520 522 001 005 016 184
• Neighbor 3: none
• Neighbor 4: 402 193 228 179 227 111 204 364
• Neighbor 5: 280
• Neighbor 6: 193
• Neighbor 7: 186 189 193 214 239 179 227 263 240
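Liked Movies for 0*3*6
• Count = 03  Movie = Crouching Tiger, Hidden Dragon (臥虎藏龍)  (193)
• Count = 02  Movie = Rush Hour (尖峰時刻)  (184)
• Count = 02  Movie = Snake Eyes (蛇眼)  (240)
• Count = 02  Movie = Life Is Beautiful (美麗人生)  (442)
• Count = 02  Movie = The Blair Witch Project (厄夜叢林)  (518)
• Count = 02  Movie = The Truman Show (楚門的世界)  (111)
• Count = 02  Movie = Enemy of the State (全民公敵)  (179)
• Count = 02  Movie = The Mummy (神鬼傳奇)  (227)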
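The counts can be reproduced by tallying every movie ID in the seven neighbors' lists and keeping IDs that occur at least twice. The slide appears to count occurrences rather than distinct neighbors: 442 and 518 each appear twice in Neighbor 2's list and reach a count of 2 that way. The cutoff of 2 is inferred from the slide, and `like_movies` is an illustrative name.

```python
from collections import Counter
from typing import List, Tuple

# Movie IDs from the seven neighbors' lists (Neighbor 3 has none).
NEIGHBOR_MOVIES: List[List[str]] = [
    "111 134 388 262 261 266 268 012 260 184 238 091 104 142 038".split(),
    "240 256 290 441 442 442 510 518 518 520 522 001 005 016 184".split(),
    [],
    "402 193 228 179 227 111 204 364".split(),
    ["280"],
    ["193"],
    "186 189 193 214 239 179 227 263 240".split(),
]


def like_movies(lists: List[List[str]], min_count: int = 2) -> List[Tuple[str, int]]:
    """Tally movie IDs across all lists; keep those seen at least min_count times."""
    counts = Counter(movie for movies in lists for movie in movies)
    # Highest count first; equal counts keep first-encountered order.
    return [(m, c) for m, c in counts.most_common() if c >= min_count]


print(like_movies(NEIGHBOR_MOVIES))
```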
Data Mining Tool & Query Tool
• Suppose a large database contains millions of records that describe customers' purchases:
– Who bought which product on what date?
– What is the average turnover in July?
– What is an optimal segmentation of clients?
– What are the most important trends in customer behavior?
• If you know exactly what you are looking for, use a query tool
• If you know only vaguely what you are looking for, use a data mining tool
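For contrast, a question you can state exactly, such as the average turnover in July, is a single query. A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `purchases` table (customer, product, purchase_date, amount) with made-up rows, and reading "average turnover" as the average purchase amount; a question like an optimal segmentation of clients has no such direct query.

```python
import sqlite3
from typing import Optional


def average_july_turnover(conn: sqlite3.Connection, year: int) -> Optional[float]:
    """Average purchase amount in July of `year` (hypothetical schema).

    Returns None when there are no July purchases.
    """
    row = conn.execute(
        "SELECT AVG(amount) FROM purchases "
        "WHERE purchase_date >= ? AND purchase_date < ?",
        (f"{year}-07-01", f"{year}-08-01"),  # ISO dates compare correctly as text
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT, product TEXT, purchase_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("alice", "magazine", "2024-07-03", 10.0),
        ("bob", "cd", "2024-07-20", 30.0),
        ("alice", "book", "2024-08-01", 99.0),  # outside July, ignored
    ],
)
print(average_july_turnover(conn, 2024))
```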