Basic Data Mining Techniques

Contents
• Query Tools
• Statistical Techniques
• Visualization Techniques
• Case-Based Learning (K-Nearest Neighbor)

Query Tools and Statistical Techniques
• Customers are a telecom company's greatest asset.
• Customer behavior is recorded in the call records of the telephone switches.
• Understanding customer behavior has become a trend among telecom companies.
  – Case: selling telephone lines
    • Offering more telephone lines to companies whose current lines are already saturated is a recurring business opportunity.
  – When will a customer need additional lines?

Selling telephone lines (process): switch call records → call duration → convert to totals → sort by time → count the number of lines in use

Naive Predictions

Visualization Techniques (Scatter Diagram)
Music Magazine
Distance between Data Points

K-Nearest Neighbor
• Records that are close to each other live in each other's neighborhood.
  – Customers of the same type (cluster) will show the same behavior.
  – Do as your neighbors do.
  – Not really a learning technique.
  – Disadvantages:
    • Inefficiency
    • It is difficult to show that k-nearest neighbor performs better than naive prediction.

Result of the K-Nearest Neighbor Process
[Figure; values shown: 67.1%, 70.2%, 55.3%, 85.4%, 91.9%]

Movie Recommendation

K-Nearest Neighbors for 0*3*6
• C1: 1 0 0 1 0 0 1
• M1: 0 1 1 1 0 0 1
• Distance = 3, or Similarity = 4
• C1: 1 0 0 1 0 0 1
• M2: 0 1 1 1 0 1 1
• Distance = 4, or Similarity = 3

K-Nearest Neighbors for 0*3*6: similarity to each member
M1: 4    M2: 3    M3: 6    M4: 5    M5: 4    M6: 4    M7: 5
M8: 3    M9: 4    M10: 4   M11: 3   M12: 5   M13: 7   M14: 6
M15: 4   M16: 6   M17: 4   M18: 5   M19: 6   M20: 7   M21: 3
M22: 2   M23: 4   M24: 4   M25: 6   M26: 4

If Similarity_Threshold is 6, then the 7 neighbors with similarity of at least 6 (M3, M13, M14, M16, M19, M20, M25) are selected.
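The neighbor selection above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each bit is taken to mean "liked this movie", Distance is read as the Hamming distance (number of differing positions), Similarity as the number of agreeing positions (so Distance + Similarity = 7 here), and the threshold is inclusive, since M3 with similarity 6 is selected. The helper names `distance`, `similarity`, and `select_neighbors` are hypothetical, and the seven-bit vectors are as reconstructed from the slide.

```python
# Sketch of k-nearest-neighbor selection on binary preference vectors.
# Assumptions (not from the slides): bit i = 1 means "liked movie i";
# all helper names are hypothetical.
from __future__ import annotations


def distance(a: list[int], b: list[int]) -> int:
    """Hamming distance: number of positions where the vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def similarity(a: list[int], b: list[int]) -> int:
    """Number of positions where the vectors agree (length minus distance)."""
    return len(a) - distance(a, b)


def select_neighbors(scores: dict[str, int], threshold: int) -> list[str]:
    """Members whose similarity is at least the threshold, in member order."""
    chosen = [m for m, s in scores.items() if s >= threshold]
    return sorted(chosen, key=lambda m: int(m[1:]))


C1 = [1, 0, 0, 1, 0, 0, 1]
M1 = [0, 1, 1, 1, 0, 0, 1]
M2 = [0, 1, 1, 1, 0, 1, 1]

# Similarity of customer 0*3*6 to members M1..M26, copied from the slide.
SIMILARITY = {
    "M1": 4, "M2": 3, "M3": 6, "M4": 5, "M5": 4, "M6": 4, "M7": 5,
    "M8": 3, "M9": 4, "M10": 4, "M11": 3, "M12": 5, "M13": 7, "M14": 6,
    "M15": 4, "M16": 6, "M17": 4, "M18": 5, "M19": 6, "M20": 7, "M21": 3,
    "M22": 2, "M23": 4, "M24": 4, "M25": 6, "M26": 4,
}

if __name__ == "__main__":
    print(distance(C1, M1), similarity(C1, M1))  # 3 4
    print(distance(C1, M2), similarity(C1, M2))  # 4 3
    # ['M3', 'M13', 'M14', 'M16', 'M19', 'M20', 'M25']
    print(select_neighbors(SIMILARITY, 6))
```

Raising the threshold to 7 would keep only M13 and M20; lowering it widens the neighborhood, which is the usual trade-off when choosing k or a similarity cutoff.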
Summarize these 7 Neighbors
• Neighbor 1:
  – 111 134 388 262 261 266 268 012 260 184 238 091 104 142 038
• Neighbor 2:
  – 240 256 290 441 442 442 510 518 518 520 522 001 005 016 184
• Neighbor 3:
  – none
• Neighbor 4:
  – 402 193 228 179 227 111 204 364
• Neighbor 5:
  – 280
• Neighbor 6:
  – 193
• Neighbor 7:
  – 186 189 193 214 239 179 227 263 240

Like Movies

Like Movies for 0*3*6
• Count = 03  Movie = Crouching Tiger, Hidden Dragon (臥虎藏龍) (193)
• Count = 02  Movie = Rush Hour (尖峰時刻) (184)
• Count = 02  Movie = Snake Eyes (蛇眼) (240)
• Count = 02  Movie = Life Is Beautiful (美麗人生) (442)
• Count = 02  Movie = The Blair Witch Project (厄夜叢林) (518)
• Count = 02  Movie = The Truman Show (楚門的世界) (111)
• Count = 02  Movie = Enemy of the State (全民公敵) (179)
• Count = 02  Movie = The Mummy (神鬼傳奇) (227)

Data Mining Tool & Query Tool
• Suppose we have a large database containing millions of records that describe customers' purchases:
  – Who bought which product on what date?
  – What is the average turnover in July?
  – What is an optimal segmentation of clients?
  – What are the most important trends in customer behavior?
• If you know exactly what you are looking for, use a query tool.
• If you know only vaguely what you are looking for, use a data mining tool.
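The "Like Movies for 0*3*6" tally can be reproduced by counting how often each movie ID occurs across the seven neighbors' lists. Counting raw occurrences, rather than distinct neighbors, matches the slide's numbers: 442 and 518 each reach a count of 2 only because they appear twice in Neighbor 2's list. The cutoff of 2 and the function name `recommend` are assumptions for illustration, not the slides' stated method.

```python
# Sketch: tally the movies liked by the selected neighbors and keep the
# most frequent ones. Assumptions (not from the slides): a cutoff of 2,
# and the hypothetical function name `recommend`.
from collections import Counter

# Movie IDs per neighbor, copied from "Summarize these 7 Neighbors".
# IDs are kept as strings so leading zeros such as "012" survive.
NEIGHBOR_LIKES = {
    1: "111 134 388 262 261 266 268 012 260 184 238 091 104 142 038".split(),
    2: "240 256 290 441 442 442 510 518 518 520 522 001 005 016 184".split(),
    3: [],
    4: "402 193 228 179 227 111 204 364".split(),
    5: ["280"],
    6: ["193"],
    7: "186 189 193 214 239 179 227 263 240".split(),
}

# English titles for the IDs listed on the "Like Movies" slide.
TITLES = {
    "193": "Crouching Tiger, Hidden Dragon",
    "184": "Rush Hour",
    "240": "Snake Eyes",
    "442": "Life Is Beautiful",
    "518": "The Blair Witch Project",
    "111": "The Truman Show",
    "179": "Enemy of the State",
    "227": "The Mummy",
}


def recommend(likes: dict, min_count: int = 2) -> list:
    """Return (movie_id, count) pairs with count >= min_count, most frequent first."""
    counts = Counter(movie for movies in likes.values() for movie in movies)
    return [(movie, n) for movie, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    for movie, n in recommend(NEIGHBOR_LIKES):
        print(f"Count = {n:02d}  Movie = {TITLES.get(movie, '?')} ({movie})")
```

`Counter.most_common` breaks ties in first-seen order, so the movies with count 2 may print in a different order than on the slide; the counts themselves are the same.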