What is Cluster Analysis? (1/4)
• Cluster: a collection of data objects ("like attracts like")
  – Similar to one another within the same cluster
  – Dissimilar to the objects in other clusters
• Cluster analysis
  – Grouping a set of data objects into clusters
  – Partitioning a diverse group into more homogeneous clusters or subgroups
• Clustering is unsupervised classification: no predefined classes
  – Records group together according to their own self-similarity; what each cluster means only becomes clear through interpretation afterwards.
2017/5/5 Data Mining

What is Cluster Analysis? (2/4)
• Finding hidden phenomena or internal structure

What is Cluster Analysis? (3/4)
• Typical applications
  – As a stand-alone tool to get insight into data distribution
  – As a preprocessing step for other algorithms
    • Clustering might be the first step in a market segmentation effort
    • (x) a one-size-fits-all rule for "what kind of promotion do customers respond to best"
    • (o) what kind of promotion works best for each cluster (with similar buying habits)

What is Cluster Analysis? (4/4)
• User segments and spending power on an online shopping site
  – People with similar profiles usually behave in similar ways

  Member  Age  Average monthly income (thousands)
  1       20   20
  2       21   26
  3       22   25
  4       41   30
  5       43   32
  6       52   40
  7       55   38

[Figure: scatter plot of age (x-axis) against average monthly income in thousands (y-axis), showing three clusters labeled C1, C2, and C3]

What Is Good Clustering? (1/2)
• A good clustering method will produce high-quality clusters with
  – high intra-class similarity and low inter-class similarity
• The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
• The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
  – Example: among a dozen or so clusters of credit card spending behavior, one cluster contains a high proportion of bad-debt cases, while the other clusters show nothing distinctive
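The member table from "What is Cluster Analysis? (4/4)" is small enough to check the "high intra-class similarity, low inter-class similarity" criterion by hand. The following is a minimal sketch, not the method used to produce the slide's figure: it assumes plain k-means with Euclidean distance on the unscaled (age, income) pairs, and the starting centroids (members 1, 4, and 6) are a hypothetical choice made only so that the run is deterministic. Distance stands in for dissimilarity, so a good result shows a small average distance inside clusters and a large one between them.

```python
from itertools import combinations
from math import dist

# (age, average monthly income in thousands) for members 1 to 7, from the slide table.
MEMBERS = [(20, 20), (21, 26), (22, 25), (41, 30), (43, 32), (52, 40), (55, 38)]


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: returns one cluster index per point."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(len(centroids)):
            group = [p for p, label in zip(points, labels) if label == c]
            if group:
                centroids[c] = tuple(sum(axis) / len(group) for axis in zip(*group))
    return labels


def mean_pair_distance(points, labels, same_cluster):
    """Average distance over all pairs that are (or are not) in the same cluster."""
    distances = [
        dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if (labels[i] == labels[j]) == same_cluster
    ]
    return sum(distances) / len(distances)


# Hypothetical starting centroids: members 1, 4, and 6.
labels = kmeans(MEMBERS, [MEMBERS[0], MEMBERS[3], MEMBERS[5]])
intra = mean_pair_distance(MEMBERS, labels, same_cluster=True)
inter = mean_pair_distance(MEMBERS, labels, same_cluster=False)

print("cluster index per member:", labels)
print(f"mean intra-cluster distance: {intra:.2f}")
print(f"mean inter-cluster distance: {inter:.2f}")
```

With these starting points the members fall into three groups (1 to 3, 4 and 5, 6 and 7). In practice the two attributes would usually be rescaled first so that neither dominates the distance.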
Issues in Cluster Analysis
• Which information (features, attributes) should the grouping be based on?
• Deciding the number of clusters in advance is difficult
• Which cluster a data point belongs to should be a matter of degree (fuzzy) rather than yes or no (crisp)
• Unsupervised learning has no single best model
• Visualization tools vs. clustering algorithms (expert experience)

A scatter graph helps to understand and visualize clusters of customers (1/2)
[Figure: scatter graph of customer purchases]

A scatter graph helps to understand and visualize clusters of customers (2/2)
• Each axis: a purchase of an item associated with a given pet
• The box at each intersection: the number of customers who purchased the corresponding items
• Four segments of customers
  1. Only-dog owners
  2. Only-cat owners
  3. Only-fish owners and cat-and-dog owners
  4. The rest can be lumped together as "others"

Cluster Analysis based on RFM (1/2)
• Analyzing RFM values quantifies customer purchasing behavior and measures customer loyalty and contribution, which supports customer segmentation and the targeting of customers
  – R (Recency): the time period since the last purchase
  – F (Frequency): the number of purchases made in a certain time period
  – M (Monetary): the amount of money spent during a certain period of time

Cluster Analysis based on RFM (2/2)
• Obtain the customers' RFM values for a given time window and run a cluster analysis
• The average RFM values of each cluster (Vc) are compared with the total average RFM values of all clusters (Vt)
  – if Vc > Vt then give [symbol missing] else give [symbol missing]
• Target customers and marketing strategy
  – R[?] F[?] M[?]: Promising customers
  – R[?] F[?] M[?]: Loyal customers
  – R[?] F[?] M[?]: Vulnerable customers
• Some combinations of changes are hard to interpret, and the magnitude of each change is not taken into account

Examples of Clustering Applications
• Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
• Land use: Identification of areas of similar land use in an earth observation database
• Insurance: Identifying groups of motor insurance policy holders with a high average claim cost
• City planning: Identifying groups of houses according to their house type, value, and geographical location
• Text mining: document classification, customer complaint handling, patient medical record analysis, military and criminal intelligence management (based on the similarity of keyword structure)

Data Classification vs. Data Clustering
Data Classification
  – Classifies data according to its attributes and a set of pre-established rules
  – Requires some prior understanding of the structure of the data
  – Finds the relationships between many (input) variables and the proposition (output variable)
Data Clustering
  – Groups data without needing to understand the characteristics and structure of the data in the database
  – Maximizes the similarity of data within a group and minimizes the similarity between groups
  – Reveals the structure among variables and leaves more room for interpretation

Description and Visualization (1/2)
• Describes what is actually going on in a complex database, which gives us a better understanding of our customers, products, processes, and so on.
  – A good enough description of a behavior will often suggest an explanation for it as well
    • e.g., parental movie viewing habits are strongly influenced by the taste of children

Description and Visualization (2/2)
• Data visualization is one powerful form of descriptive data mining.
  – It is not always easy to come up with meaningful visualizations, but the right picture really can be worth a thousand association rules
  – Data cube, scatter graph, histogram, …

Data Mining Techniques
• Statistical analysis
• Association analysis
• Classification
• Clustering analysis
• Other techniques
  – Trend analysis, time series analysis, regression analysis, outlier analysis, neural network techniques from the field of artificial intelligence, and so on

All six tasks in one small database
Example: the moviegoers database
• We wondered what movies a person watches and who goes to see a movie
• The moviegoers database contains the responses to an informal survey conducted during August and September of 1996
• The sample populations: the survey was distributed to four different populations in hopes that interesting intergroup differences might be revealed
• The survey asked for age, sex, and last movies seen in a movie theater

The layout of the moviegoers database
[Figure: entity-relationship diagram of the moviegoers database; only the one-to-many (1 to ∞) relationship markers survive]

Moviegoer Survey (the first few rows are shown)

  Name     Sex  Age  Source          Movie
  Amy      F    27   Oberlin         Independence Day
  Andrew   M    25   Oberlin         12 Monkeys
  Andy     M    34   Oberlin         The Birdcage
  Anne     F    30   Oberlin         Trainspotting
  Ansje    F    25   Oberlin         I Shot Andy Warhol
  Beth     F    30   Oberlin         Chain Reaction
  Bob      M    51   Pinewoods       Schindler's List
  Brian    M    23   Oberlin         Supercop
  Candy    F    29   Oberlin         Eddie
  Cara     F    25   Oberlin         Phenomenon
  Cathy    F    39   124 Mt. Auburn  The Birdcage
  Charles  M    25   Oberlin         Kingpin
  Curt     M    30   MRJ             T2: Judgment Day
  David    M    40   MRJ             Independence Day
  Erica    F    23   124 Mt. Auburn  Trainspotting

What can data mining do? (1/3)
Moviegoer classification
• Classify sex from age, source, and movies seen
• Classify source from sex, age, and movies seen
• Classify which movie a person will see (most recent movie) from previously seen movies, age, sex, and source
• Technique: decision trees
Moviegoer estimation
• Age is a continuous variable, so it can serve as the target variable of an estimation task.
• Age = f(source, sex, movies seen)

What can data mining do? (2/3)
Moviegoer prediction
  – When a new movie is released, who will its audience be?
    • Cluster both the moviegoers and the movies
    • For each cluster of moviegoers, mine rules that explain that group's taste in movies
    • For each cluster of movies, mine rules that describe its best target audience
    • When a new movie is released, the cluster it belongs to identifies the target audience
Moviegoer affinity grouping
  – Which movies are always watched by the same kind of people (which movies go together?)
  – Use the generated association rules to analyze the classification of sex (virtual items)

What can data mining do? (3/3)
Moviegoer clustering
  – To find groups of movies that go together because they are seen by the same people
  – To find groups of people that go together because they see the same movies
    • People with young children form a clearly recognizable cluster in the moviegoers database
Moviegoer description
  – Basic statistics: average age, percentage of women
  – Association rules: people who have seen movie X will also see movie Y
  – Rules can also be read as descriptions: males aged 12 to 17 like to watch movie X

Evaluation and Interpretation
• Model validation
  – After building a model, you must evaluate its results and interpret their significance
  – Accuracy by itself is not necessarily the right metric for selecting the best model.
    You need to know more about the types of errors and the costs associated with them
• Confusion matrices
  – For classification problems, a confusion matrix is a very useful tool for understanding results
  – It shows not only how well the model predicts, but also presents the details needed to see exactly where things may have gone wrong

Confusion matrix (1/2)

  Model X                Actual
  Prediction   Class A   Class B   Class C
  Class A         45         2         3
  Class B         10        38         2
  Class C          4         6        40

  – This is much more informative than simply telling us an overall accuracy rate of 82% (123/150)
  – If there are different costs associated with different errors, a model with a lower overall accuracy may be preferable to one with higher accuracy but a greater cost to the organization due to the types of errors it makes

Confusion matrix (2/2)

  Model Y                Actual
  Prediction   Class A   Class B   Class C
  Class A         40        12        10
  Class B          6        38         1
  Class C          2         1        40

  – The accuracy has dropped to 79% (118/150)
  – Suppose each correct answer had a value of $10 and each incorrect answer for class A had a cost of $5, for class B a cost of $10, and for class C a cost of $20
    • The net value of model X = (123 × 10) − (5 × 5) − (12 × 10) − (10 × 20) = 885
    • The net value of model Y = (118 × 10) − (22 × 5) − (7 × 10) − (3 × 20) = 940
    • Model Y is less accurate but has the higher net value

Using the confusion matrix (1/4)
• Data mining: using historical data to find rare events
  – Rare events bring high profit or severe loss, but taking action on every customer is not worthwhile
• A confusion matrix yields three kinds of information, the 3Rs:
  – Response rate: how many rare events do we find within the list we predicted?
  – Recall: what proportion of all rare events do the predicted ones account for?
  – Range reduce: when a data mining model is used to look for rare events, how much does the list shrink?
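The arithmetic on these slides can be reproduced in a few lines. This is a sketch under stated assumptions: it reads every matrix with rows as predicted classes and columns as actual classes (the layout that matches the slides' own calculations), the function names are illustrative, and the 2×2 purchase matrix is the one used on the "Using the confusion matrix" slides that follow.

```python
# Rows are predicted classes, columns are actual classes (assumed layout).
MODEL_X = [[45, 2, 3], [10, 38, 2], [4, 6, 40]]
MODEL_Y = [[40, 12, 10], [6, 38, 1], [2, 1, 40]]

# 0 = will not buy, 1 = will buy; first row is "predicted 0", second row is "predicted 1".
PURCHASE = [[6855, 2171], [2497, 6961]]


def accuracy(matrix):
    """Share of cases on the diagonal, i.e. predicted class equals actual class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def net_value(matrix, value_correct, error_costs):
    """Value of correct answers minus the cost of the errors in each predicted-class row."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    penalty = sum(
        cost * (sum(row) - row[i])  # off-diagonal cases in row i
        for i, (row, cost) in enumerate(zip(matrix, error_costs))
    )
    return correct * value_correct - penalty


def three_r(matrix):
    """Response rate, recall, and range reduce for a 2x2 matrix, plus the lift."""
    (tn, fn), (fp, tp) = matrix
    total = tn + fn + fp + tp
    response_rate = tp / (tp + fp)        # hits within the predicted list
    overall_rate = (tp + fn) / total      # rare events in the whole population
    return {
        "response_rate": response_rate,
        "recall": tp / (tp + fn),         # share of all rare events that were found
        "range_reduce": (tp + fp) / total,  # size of the predicted list vs. everyone
        "lift": response_rate / overall_rate,
    }


costs = [5, 10, 20]  # cost of a wrong answer for class A, B, C
for name, matrix in (("X", MODEL_X), ("Y", MODEL_Y)):
    print(name, f"accuracy={accuracy(matrix):.2%}", "net value =", net_value(matrix, 10, costs))

for measure, value in three_r(PURCHASE).items():
    print(f"{measure}: {value:.3f}")
```

Keeping the cost vector as a parameter makes it easy to see how quickly the ranking of two models can flip when the business changes what each kind of error costs.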

Using the confusion matrix (2/4)
0: will not buy, 1: will buy

                       Actual
  Prediction   Class 0   Class 1
  Class 0        6855      2171
  Class 1        2497      6961

• Response rate: the ability to prefer quality over quantity
  – Response rate = 6961 / (2497 + 6961) = 73.6%
  – Overall response rate = (6961 + 2171) / (6855 + 2171 + 2497 + 6961) = 49.4%
  – The model's response rate is 1.49 times the overall rate

Using the confusion matrix (3/4)
(same confusion matrix as on the previous slide)
• Recall: "better to catch ten thousand by mistake than to let one slip away"
  – Recall = 6961 / (6961 + 2171) = 76.2%
• Range reduce: the cost of running a campaign based on the model
  – Range reduce = (6961 + 2497) / (6855 + 2171 + 2497 + 6961) = 51.2%

Using the confusion matrix (4/4)
• Which model is best depends on the business problem
  – For a marketing response problem, we want to get as many potential responders as possible and we do not care about false positives
  – For a medical diagnostic test for cancer, we might use such a model as an initial screen. We care a lot about false negatives, and we want as few as possible

The Lift (Gain) Chart
• It shows how responses are changed by applying the model.
  This change ratio is called the lift

The ROI (Return on Investment) Chart
• A pattern may be interesting, but acting on it may cost more than the revenue or savings it generates
• Here, ROI is defined as the ratio of profit to cost

The Profit Chart
• Profit = revenue minus cost
• The maximum lift was achieved at the 1st decile (10%), the maximum ROI at the 2nd decile (20%), and the maximum profit at the 3rd and 4th deciles

External Validation
• No matter how good the accuracy of a model is estimated to be, there is no guarantee that it reflects the real world
  – One of the main reasons for this problem is that there are always assumptions implicit in the model
    • e.g., the inflation rate may not have been included as a variable in a model that predicts the propensity of an individual to buy
• It is important to test a model in the real world
  – Do a test mailing to verify the model
  – Try the model on a small set of applicants before full deployment

Deploy the model and results (1/2)
• The first way is for an analyst to recommend actions based on simply viewing the model and its results
  – The analyst may look at the clusters the model has identified, the rules that define the model, or the lift and ROI charts that depict the effect of the model
• The second way is to apply the model to different data sets
  – to flag records based on their classification,
  – to assign a score such as the probability of an action, or
  – to select some records from the database and subject these to further analyses with an OLAP tool, and so on

Deploy the model and results (2/2)
• The amount of time to process each new transaction, and the rate at which new transactions arrive, will determine whether a parallelized algorithm is needed
  – e.g., monitoring credit card transactions or cellular telephone calls for fraud
• When delivering a complex application, data mining is often only a small, albeit
  critical, part of the final product
  – In a fraud detection system, known patterns of fraud may be combined with discovered patterns
• You must measure how well your model has worked after you use it (model monitoring)
  – The model may need to be retested, retrained, and possibly completely rebuilt

Acting on the Results (1/2)
• Sometimes, it is valuable to incorporate a bit of experimental design into the process
  – If we are predicting customer response to a product, we might have three different groups
    1) A group of customers chosen based on the results of the data mining model, who get the marketing message
    2) A group of customers chosen at random, who get the marketing message
    3) A group of customers chosen at random, who do not get the marketing message

Acting on the Results (2/2)
  – What we hope is that
    • the first group will have a high response rate
    • the second group will have a mediocre response rate
    • the third group will have a negligible response rate
  – We can test the strength of the marketing message
    • The difference in response between the second and third groups
  – We can test the strength of the data mining
    • The difference between the first and second groups

Measuring the Model's Effectiveness
• We need to compare the results to what actually happened in the real world
  – Did the predicted behavior actually happen?
  – Did the prospects accept the offer, did the customers purchase the new product, did they churn?
  – The lift charts and confusion matrices can be adapted to compare actual results to predicted results
• The score set is usually more recent than the model set
  – Model performance usually degrades over time
  – The model captures patterns from the past and, over time, the patterns become less relevant

What Makes Predictive Modeling Successful?
A. Modeling Shelf-Life
B. The whole process of predictive modeling is based on some key assumptions

A.
Modeling Shelf-Life
• Looking at time frames brings up two critical questions about models and their predictions:
  – What is the shelf-life of a model?
    • The things being modeled change over time
    • A model created five years ago, or last year, or last month, may no longer be valid
    • You need to train a new model on more recent data
  – What is the shelf-life of a prediction?
    • Predictions are valid during a particular time frame

B. Key Assumption 1 (1/2)
The Past Is a Good Predictor of the Future
  – e.g., how patients reacted to a drug in the past
  – However, external factors will always have an influence on the model being built
    • Retail sales decrease during cold weather and blizzards
    • Mortgage lending increases when interest rates go down
    • Seasonal patterns: the Christmas season and back-to-school season drive many retail sales
    • Models developed during years of relatively stable financial markets were not applicable in the more volatile markets

B. Key Assumption 1 (2/2)
The Past Is a Good Predictor of the Future
  – How do we know when the past is a good predictor of the future?
    • We can never know for sure
    • It is critical to
      – include domain experts (who have insight about important factors) in the modeling process
      – include enough of the right data (seasonal factors) to make good decisions

B. Key Assumption 2
The Data Is Available
  – Data may not be available for any number of different reasons
    • The data may not be collected by the operational systems
    • The database is too busy most of the time to prepare extracts
    • The data is owned by an outside vendor
    • And so on
  – Ensuring that the right data is available is critical to building successful predictive models

B.
Key Assumption 3
The Data Contains What We Want to Predict
  – To apply the lessons of the past to the future, we need to be comparing apples to apples and oranges to oranges
    • Often, the business people phrase their needs very ambiguously
      – "We are interested in people who do not pay their bills"
    • Sometimes business users have unreasonable expectations of their data
    • When building a response model, we must know who responded to the campaign and who received the campaign
      – For advertising campaigns, the second group is not known
      – However, we can compare the responders to a random sample of the general population

Selecting Data Mining Products (1/3)
• There are three main types of data mining products
  1) Tools that are analysis aids for OLAP
    – Help OLAP users identify the most important dimensions and segments on which they should focus attention
    – Business Objects Business Miner, Cognos Scenario
  2) The "pure" data mining products
    – Horizontal tools aimed at data mining analysts concerned with solving a broad range of problems
    – IBM Intelligent Miner, Oracle Darwin, SAS Enterprise Miner, SGI MineSet, and SPSS Clementine
  3) Analytic applications which implement specific business processes for which data mining is an integral part
    – Customized packages with the data mining embedded

Selecting Data Mining Products (2/3)
• Basic capabilities
  – Nothing substitutes for actual hands-on experience
  – Depending on your particular circumstances (system architecture, staff resources, database size, problem complexity), some data mining products will be better suited than others to meet your needs
  – System architecture
    • Work on a stand-alone desktop machine or a client-server architecture
  – Data preparation
  – Data access
    • No single product can support the large variety of database servers
  – Algorithms

Selecting Data Mining Products (3/3)
• Basic capabilities (continued)
  – Interfaces to other products
    • Many tools can help you understand
      your data before you build your model, and help you interpret the results of your model
    • These include traditional query and reporting tools, graphics and visualization tools, and OLAP tools
  – Model evaluation and interpretation
  – Model deployment
    • When you need to apply the model to new cases as they come, it is usually necessary to incorporate the model into a program using an API or code generated by the data mining tool
  – Scalability
  – User interface
    • The people who build, deploy, and use the results of the models may be different groups with varying skills

The Virtuous Cycle of DM (1/2)
• Data mining can be applied to many problems in many industries
  – Most common applications are in marketing, specifically for CRM
    • Applied to prospecting for new customers, retaining existing ones, and increasing customer value
  – Applied to understanding customer behavior and optimizing manufacturing processes
• Although they may have much in common, every application has its own unique characteristics
  – Within a single industry, different companies have different strategic plans and different approaches

The Virtuous Cycle of DM (2/2)
• The virtuous cycle is a high-level process, consisting of four major business processes:
  1. Identifying the business problem
  2. Transforming data into actionable results
  3. Acting on the results
  4. Measuring the results
• There are no shortcuts: success in DM requires all four processes
  – Expertise grows as organizations focus on the right business problems, learn about data and modeling techniques, and improve data mining processes based on the results of previous efforts

Data Description and Data Mining Model Building (1/2)
• Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions
• The first and simplest analytical step in data mining is to describe the data
  – Summarize its statistical attributes (such as means and standard deviations)
  – Visually review it using charts and graphics (visualization)
  – Look for potentially meaningful links among variables, such as values that often occur together (clustering)
• Collecting, exploring, and selecting the right data are critically important

Data Description and Data Mining Model Building (2/2)
• In general, data description alone cannot provide an action plan
  – You must build a predictive model based on patterns determined from known results (model training), then test that model on results outside the original sample (model testing)
    • The accuracy (or error) rate is a good estimate of how the model will perform on future datasets that are similar to the training and test datasets
  – Finally, you must empirically verify the model
    • e.g., send a mailing to a portion of the new list and see what results you get

Predictive Data Mining (1/2)
• A hierarchy of choices
  – Business goal
    • What is the ultimate purpose of mining this data?
    • Retain good customers, identify customers likely to leave, or predict customer profitability
  – Type of prediction
    • Classification or regression
  – Model type
    • Neural networks or decision trees
    • Your choice of model type will influence what data preparation you must do and how you go about it
  – Algorithm
  – Product
    • Products generally have different implementations of a particular algorithm, even when they identify it with the same name

Predictive Data Mining (2/2)
• No tool or technique is perfect for all data
  – Many business goals are best met by building multiple model types using a variety of algorithms
  – You may not be able to determine which model type is best until you've tried several approaches

Summary (1/2)
• Data mining offers great promise in helping organizations uncover patterns hidden in their data that can be used to predict the behavior of customers, products, and processes
• However, data mining tools need to be guided by users who understand the business, the data, and the general nature of the analytical methods involved

Summary (2/2)
• Building models is only one step in knowledge discovery
  – It is vital to properly collect and prepare the data, and to check the models against the real world
  – The "best" model is often found after building models of several different types, or by trying different technologies or algorithms
• Choosing the right data mining products means finding a tool with good basic capabilities, an interface that matches the skill level of the people who'll be using it, and features relevant to your specific business problems
  – After you've narrowed down the list of potential solutions, get a hands-on trial of the likeliest ones

Data Mining: Classification Schemes
• General functionality
  – Descriptive data mining
  – Predictive data mining
• Different views, different classifications
  – Kinds of databases to be mined
  – Kinds of knowledge to be discovered
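  – Kinds of techniques utilized
  – Kinds of applications adapted

A Multi-Dimensional View of Data Mining Classification
• Databases to be mined
  – Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multi-media, heterogeneous, WWW, etc.
• Knowledge to be mined
  – Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
  – Multiple/integrated functions and mining at multiple levels
• Techniques utilized
  – Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
• Applications adapted
  – Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.

Data Mining and Business Intelligence
[Figure: layered diagram; the potential to support business decisions increases toward the top. Layers, from top to bottom:]
  – Making decisions
  – Data presentation: visualization techniques
  – Data mining: information discovery
  – Data exploration: statistical analysis, querying and reporting
  – Data warehouses / data marts: OLAP, MDA
  – Data sources: paper, files, information providers, database systems, OLTP
[Roles shown beside the layers, from top to bottom: end user, business analyst, data analyst, DBA]

Technologies Related to Knowledge Discovery in Databases
[Figure]

Architecture of a Typical Data Mining System
[Figure: components of the system, from the top of the diagram down:]
  – Graphical user interface
  – Pattern evaluation
  – Data mining engine
  – Knowledge base
  – Database or data warehouse server
  – Data cleaning & data integration; filtering
  – Databases; data warehouse

Basic Components and Conceptual Architecture of Data Mining
[Figure]

Data Mining Applications in Customer Relationship Management
• For retailers
  – Understand customers' spending characteristics, discover their purchasing patterns, and strengthen customer relationships in order to retain customers
• For banks
  – Understand the abuses that issuing credit cards may give rise to, and identify the most profitable and most loyal customers
• For insurers
  – Analyze the patterns in policyholders' claims and strengthen auditing to prevent fraud
• Benefits
  – Effectively increases company revenue at different levels and helps achieve business goals

Data Mining Applications in Internet Marketing
• Analyzing customers' behavior patterns on a website
  – When customers visit a website they often provide a great deal of valuable data, such as personal details, the pages they click on, the time they spend on each page, the keywords they use in search engines, and the times of their visits. A company can analyze this information to understand customer behavior and so raise customer satisfaction with its products and services.
• Application example
  – Visitors can be characterized by the following traits
    • Geographic segmentation
      – Includes the visitor's address, income, and purchasing power
    • Personality traits
      – The visitor's purchasing style: impulsive or carefully calculating
    • Information equipment used by the visitor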
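      – Network bandwidth, operating system, browser, or server

Data Mining Applications in Network Intrusion Analysis
• Discovering abnormal network behavior
  – Traditional analysis of sudden network events takes a long time
  – Use high-speed computation to analyze abnormal network behavior and to adjust and update defense mechanisms dynamically
• Application examples
  – Help network administrators carry out advanced network control and adjust and update defense mechanisms dynamically, thereby deterring the potential threat of intrusion attacks
  – Help network administrators build models of normal and abnormal network behavior

Data Mining Applications in E-learning
• Adaptive e-learning
  – Provide suitable learning paths to learners with different backgrounds
  – Build a learning concept map to plan students' learning paths
  – Analyze scores to understand how test items are related, and derive the corresponding concepts
• Application example
  – Use association rule mining
  – Analyze learners' scores and understand the relationships among test items
  – Derive the relationships among the concepts that correspond to the test items
  – Find rules that can help domain experts build learning concept maps
  – Construct a suitable course concept map

Please Do Not Underestimate Data Mining
• Popular application areas of data mining
  1. The biotechnology industry and DNA data analysis
  2. Financial data analysis
  3. Retail data analysis
  4. The telecommunications industry
• Data stream mining
• Privacy-preserving mining
• Distributed data mining
• Mining of sequence data, multimedia, Web data
• Biological and biomedical data analysis

Please Do Not Overestimate Data Mining
• Data mining is not a panacea
• Success in data mining requires domain knowledge and experience
• Applying data mining requires experts of many kinds
• Discussion question
  – Consider a bank's data mining project
  – The goal is to mine which kinds of people are likely to have poor credit
  – Question: which kinds of experts might be needed?

Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the center, drawing on database technology, machine learning, statistics, information science, visualization, and other disciplines]

How to Become a Data Mining Expert
• Data mining concepts and techniques
• Domain knowledge
• Experience gained through continual practice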