Data Mining

Outline
- Definition
- Techniques
- Hypothesis Verification
- Knowledge Discovery
- Online Analytical Processing
- Location Considerations
- Benefits and Challenges

Data Mining
- Analyzing the data in a data warehouse to reveal hidden patterns, relationships, and trends in historical business activity.
- This is one of the main uses of a data warehouse: data mining tries to find latent trends, relationships, or models in the business activity records stored in the database.

Data Mining
1. Selection
2. Transformation
3. Mining (patterns)
4. Interpretation and evaluation

[Figure: Mining for Decision Support]

Data Mining Functions
- Determining classifications for the data
- Forming clusters from the data
- Determining whether certain associations exist in the data
- Determining whether any pattern or sequence exists in the data

Two Approaches to Mining
- Hypothesis Verification
  - Querying
  - Modeling
  - User-driven
- Knowledge Discovery
  - Patterns
  - System-driven

One Example: Harrah's Gambling Casino
- Uses a magnetic-stripe card called "Total Rewards"
- The 30% of customers who spent $100 to $300 accounted for 80% of revenue and 100% of profit
- 90 demographic segments
- Age and distance are two factors in becoming a repeat customer

Data Mining Techniques
- Recency, frequency, monetary (RFM)
  - Thirty-one permutations of sorting four variables (customer number, recency, frequency, monetary)
  - Inexpensive; easy to perform
- Decision trees
  - More complex than RFM
  - Help turn a complex data representation into a much simpler structure
- Cluster analysis
  - Places customers/prospects into groups such that everyone in a group has similar traits
  - Categories include demographic, psychographic, behavioral, and geographic

Other Data Mining Techniques
- Artificial neural networks, business intelligence (BI), data stream mining, fuzzy logic, the nearest neighbor algorithm, pattern recognition, relational data mining, text mining, chi-square, t-test, regression, correlation

[Figure: A Decision Tree]

Genetic Algorithms
- Software that uses Darwinian (survival of the fittest), randomizing, and other mathematical functions to simulate an evolutionary process that can yield increasingly better solutions to a problem
- Especially suited to problems that have thousands of possible solutions, of which one best solution must be produced
- Sets of mathematical process rules specify how process components or steps may be combined. Random recombination joins the good parts of candidate processes, the better candidates are kept and the poorer ones discarded, and repeating this yields the best solution.

Memory Based Reasoning
- Data records for entities that have a known behavior pattern are grouped to form a test case.
- Records for customers with unknown purchase habits can be matched to the profile.
- RFM (recency, frequency, monetary level) is an example used as a predictor of future behavior.

Neural Networks
- Definition: computing systems modeled after the brain's mesh-like network of interconnected processing elements, called neurons
- The network is built from processing elements (PEs) that imitate the neurons of the human brain.
- It can recognize patterns and relationships in the data it processes; the more example data it receives, the better it learns.
- It has one input layer, one output layer, and several hidden processing layers.
- The influence of each PE node on another node depends on the weight of the connection between them.
- Each input represents an attribute of the problem and is multiplied by a weight that expresses how strongly it influences the next PE, so different weights produce different outputs.
- From past cases, the best structure and the weight of each path are determined from the relationship between the outputs and the input characteristics. This is what is meant by machine learning.
- When a new case is entered, the network automatically predicts the likely result.

[Figure: A neural network]

[Figure: Basic concepts and architecture of neural networks]

Online Analytical Processing (OLAP)
- Enables managers and analysts to interactively examine and manipulate large amounts of detailed and consolidated data from many perspectives

Analytical Operations
- Consolidation: aggregation of data. This ranges from simple roll-ups to complex groupings of related data.
- Drill-down: displaying the detail data that make up consolidated data, moving in the opposite direction (from the top down).
- Slice and dice: the ability to look at the database from different viewpoints.

[Figure: OLAP Technology]

Types of Data Mining System Environments: Decision Support Systems (DSS)
- "List current inventory, predict sales of products to be promoted, and list inventory requirements by store"
- "Determine who are responders and nonresponders for the last promotion"
- "Identify nonresponders from the last promotion and send them a second promotional offer using a different advertising copy"

Types of Data Mining System Environments: Executive Information Systems (EIS), Dashboards
- "Provide ROI results for all sales promotions for the last sixty days"
- "Populate a spreadsheet with sales by product category from the Web, catalogue, and retail. Allow for simple data manipulation for the purpose of creating trend reports"

Types of Data Mining System Environments: Enterprise Resource Planning (ERP)
- "Process all online orders within twelve hours and send alert to quality and control when time limit is exceeded"
- "Automatically notify supplier to restock when inventory depletes to certain level"
- "Update customer service ODS with current customer order status information"

Types of Data Mining System Environments: CRM
- "Identify the most profitable customers by household level for the last twenty-four months and create a recognition strategy at different incremental levels based on profitability level"
- "Determine which customers have purchased for their own consumer needs versus on behalf of the company they work for and create a profitability index for each"
- "Examine customer purchase history and build a channel preference profile for each customer including time variations such as 'snowbirds'"

Data Mining Location and Access Considerations
- Operational data store (ODS)
  - Dynamic data repository
  - Tactical and decision report applications
  - Data limited to current operational needs
- Data warehouse (DW)
  - More static than an ODS
  - Large depth and breadth of information
  - Data transformed into knowledge
  - Analysis, strategy, and planning applications
- Data marts (DM)
  - Receive data from a DW or an ODS, but usually the former
  - Limited but concentrated information
  - Data transformed into knowledge
  - Analysis, strategy, and planning applications
  - Usually designed for use as a narrow application
  - Data mining and statistics

Data Mining Benefits
- Better understanding of customers and prospects supports relationship-building efforts
- Measurable
- Fatigue prevention
- Precipitates new opportunities
- Fraud detection and identification of nonfavorable behavior

Data Mining Challenges
- Organizational obstacles to attaining data
- Cost versus benefit
- Ability to capture data
- Giving the customer/prospect a perception of invasiveness
- Privacy issues
- Sustained secondary availability
- Ability to perform data and information transformation
- Technology and analytical expertise
- "Analysis paralysis"
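
Worked sketch: RFM scoring

The RFM technique named under Data Mining Techniques can be illustrated with a small sketch. The slides do not give a scoring rule, so everything here is an assumption made for illustration: the purchase log, the customer IDs, the reference date, and the simple rank-based score (1 = weakest, n = strongest on each dimension).

```python
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount).
PURCHASES = [
    ("C1", date(2024, 1, 5), 120.0),
    ("C1", date(2024, 3, 1), 80.0),
    ("C2", date(2023, 11, 20), 300.0),
    ("C3", date(2024, 2, 14), 40.0),
    ("C3", date(2024, 2, 28), 60.0),
    ("C3", date(2024, 3, 10), 50.0),
]


def rfm_table(purchases, today):
    """Return {customer: (recency_in_days, frequency, monetary_total)}."""
    table = {}
    for cust, day, amount in purchases:
        last, freq, total = table.get(cust, (None, 0, 0.0))
        if last is None or day > last:
            last = day  # keep the most recent purchase date
        table[cust] = (last, freq + 1, total + amount)
    return {c: ((today - last).days, f, m) for c, (last, f, m) in table.items()}


def rank_scores(values, higher_is_better=True):
    """Assumed scoring rule: rank customers so the best one gets the highest score."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {cust: rank for rank, cust in enumerate(ordered, start=1)}


def rfm_scores(purchases, today):
    """Return {customer: (R score, F score, M score)}."""
    table = rfm_table(purchases, today)
    # Fewer days since the last purchase is better, so recency is ranked in reverse.
    r = rank_scores({c: v[0] for c, v in table.items()}, higher_is_better=False)
    f = rank_scores({c: v[1] for c, v in table.items()})
    m = rank_scores({c: v[2] for c, v in table.items()})
    return {c: (r[c], f[c], m[c]) for c in table}


if __name__ == "__main__":
    for cust, score in sorted(rfm_scores(PURCHASES, date(2024, 3, 31)).items()):
        print(cust, score)
```

Sorting customers by any ordering of the three scores gives the kind of ranked list the slide describes as inexpensive and easy to perform. In practice the scores are often quantile buckets rather than raw ranks; ranks are used here only to keep the sketch short.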
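
Worked sketch: weighted inputs in a neural network

The Neural Networks slide says that each input is multiplied by a weight and that different weights produce different outputs. The sketch below shows only that forward calculation, not the learning step. The layer sizes, the weight values, the bias terms, and the sigmoid activation are all assumptions chosen for illustration; the slides do not specify them.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1). Assumed activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Each PE sums its weighted inputs, adds a bias, and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(pe_weights, inputs)) + b)
        for pe_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)


# Hypothetical weights: 2 inputs -> 2 hidden PEs -> 1 output PE.
HIDDEN_W = [[0.5, -0.4], [0.3, 0.8]]
HIDDEN_B = [0.0, 0.1]
OUT_W = [[1.2, -0.7]]
OUT_B = [0.05]

if __name__ == "__main__":
    case = [1.0, 0.5]  # attributes of one new case
    print(forward(case, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B))
    # Same case, different output weights: the prediction changes.
    print(forward(case, HIDDEN_W, HIDDEN_B, [[-1.2, 0.7]], OUT_B))
```

Training would adjust `HIDDEN_W` and `OUT_W` from past cases so that the outputs match the known results, which is the step the slide calls machine learning.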
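
Worked sketch: OLAP analytical operations

The three operations on the Analytical Operations slide (consolidation, drill-down, slice and dice) can be sketched over a tiny in-memory fact table. The sales records, the dimension names, and the helper functions `roll_up`, `slice_rows`, and `drill_down` are hypothetical and exist only for this illustration; a real OLAP server works on precomputed cubes rather than Python lists.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, store, product, quarter, amount).
FIELDS = ("region", "store", "product", "quarter", "amount")
SALES = [
    ("North", "N1", "Tea", "Q1", 100),
    ("North", "N1", "Coffee", "Q1", 150),
    ("North", "N2", "Tea", "Q2", 80),
    ("South", "S1", "Tea", "Q1", 60),
    ("South", "S1", "Coffee", "Q2", 90),
]


def roll_up(rows, by):
    """Consolidation: total the amount for each combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        record = dict(zip(FIELDS, row))
        totals[tuple(record[d] for d in by)] += record["amount"]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Slice and dice: keep only rows whose dimensions match the fixed values."""
    return [
        row for row in rows
        if all(dict(zip(FIELDS, row))[d] == v for d, v in fixed.items())
    ]


def drill_down(rows, parent, parent_value, child):
    """Drill-down: break one consolidated total into its finer-grained parts."""
    return roll_up(slice_rows(rows, **{parent: parent_value}), by=(child,))


if __name__ == "__main__":
    print(roll_up(SALES, by=("region",)))                  # totals per region
    print(drill_down(SALES, "region", "North", "store"))   # North, by store
    print(roll_up(slice_rows(SALES, product="Tea"), by=("quarter",)))
```

Drill-down is written here as a slice followed by a roll-up at a finer level, which shows that the three operations are views over the same underlying data.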