Knowledge Management: Concepts, Models and Applications

Jing Luan, Ph.D. (栾晶)
Chief Planning, Research, & Knowledge Systems Officer, Cabrillo College
Founder, Knowledge Discovery Laboratory
Beijing, China
October 2002

© Jing Luan, 2002

What's Covered:
  1. Why KM?
  2. Key KM Concepts
  3. Tiered Knowledge Management (TKM) Model
  4. KM Applications in the US
  5. Data Mining Background
  6. Data Mining Algorithms and Applications
  7. Demonstration of Data Mining
  8. Q&A

Quotes
  "Knowledge is information in action." (O'Dell and Grayson, APQC)
  "Sharing knowledge is 90% culture, 5% technology and the rest is magic." (Bob Buckman of Buckman Laboratories)
  "We live in an increasingly data rich, knowledge poor society." (Luan)
  "KM is to bring people to people and people to knowledge." (Serban and Luan)
  "Knowledge changes destiny."

What is Knowledge?
  Strong influence from philosophy
    Epistemology (the branch of philosophy dealing with the origin, foundations, and limits of knowledge)
    Ontology (= metaphysics; deals with the essence of being/things)
    Plato (knowledge = personality, wisdom, science)

World of Utter Confusion (the worldly bustle one cannot see through)
  Information overload
  Misinformation (stocks, health)
  So many technologies, so little clue
  Not tapping into existing knowledge
  Solution: use KM principles
    Distinguish job functions and group technologies accordingly: TKM

Technology Confusion
  Networking (IP, TCP/IP, VPN, WAN)
  Security (firewall, DMZ, SSL)
  Website (dynamic, push/poll, asp*, domain)
  Intranet
  OLAP vs. OLTP
  Data warehouse (star schema, cubes)
  Data mining (algorithms)
  ZXT
  Anything else?

Knowledge Confusion
  What are data?
    Meaningless by themselves (philosophy: universe)
  What is information?
    Meaning from data (philosophy: observations)
  What is knowledge?
    What does information mean to me? (philosophy: sense of being)

Why Knowledge Management (KM)?
  Technology advancement
  Professional specialization vs.
    multi-discipline approach
  Competition (price, time to market, knowledge)
  Workforce mobility and turnover
  Capitalize on organizational knowledge
    Knowledge capital is capital.
  In sum, "you snooze, you lose" (opportunity is fleeting).

Current Western View of Knowledge: Divided in Two

  Explicit knowledge (documented)      Tacit knowledge (know-how embedded in people)
  -------------------------------      ---------------------------------------------
  Easily codified                      Personal
  Storable                             Context-specific
  Transferable                         Difficult to formalize
  Easily expressed and shared          Difficult to capture/communicate

  Sources                              Sources
  Databases and reports                Personal experiences
  Manuals                              Informal business processes
  Policies and procedures              Historical understanding/culture
  Website, advertisements              Committees, task forces, groups

Is Knowledge Management Possible?
  YES! ... according to KPMG, Gartner Group
  Fueled by technology and economics
    Storage capacity
    Internet, portals, search engines
    CRM (from "All customers are right" to "Which one is better?")

Knowledge Management Principle
  First Principle (Tiers): Data => Information => Knowledge
    To demonstrate this point...
  Second Principle (Sharing):
    Knowledge sharing (push/pull)
    Information sharing (push/pull)
    Data sharing (push/pull)

TKM: Explicit Knowledge Management
  Many data mining projects fail due to a lack of understanding of these three tiers.
  TIER THREE
    Mining: Clementine, Enterprise Miner, Statistica, Mineset, Darwin, SpotFire
    Classical statistics: SPSS, SAS, BMDP, SysStat
  TIER TWO
    Querying: BrioQuery, Business Objects, PowerPlay, Access, Foxpro, SPSS SmartViewer
    Online data processing: ASP, JSP, iHTML, XML
    Demo (slicing and drill-down)
  TIER ONE
    Data engines: SQL Server, Oracle, Informix, Sybase, UniData, DB2
    Enterprise Resource Planning (ERP): PeopleSoft, Datatel, SAP, Oracle, Banner
  Topography of the Tiered Knowledge Management Model (TKM) for explicit knowledge. Courtesy of Jossey-Bass.

Benefits of TKM (Explicit)
  Informed use of technology
  Planned skill upgrade
  Balancing resource allocation
  Defining
    relationship with IT
  Enhancing role of analyst
  Improving decision-making process
  Purposeful outsourcing

Tiered Knowledge Management Model (TKM)
  [Diagram, courtesy of Jossey-Bass]
  Explicit knowledge tiers:
    Tier three: Data mining
    Tier two: Middleware, OLAP
    Tier one: Data warehouses, Enterprise Resource Planning (ERP)
  Tacit knowledge tiers:
    Tier one: Knowledge base
    Tier two: Collaborative Working Environment (CWE)
    Tier three: Knowledge mapping
  Also shown: knowledge workers, portals, CRM

Knowledge Example: All Facets
  Things related to a cell phone that are knowledge intensive:
    Cell phone design
    Cell phone technology
    Cell phone number
    Cell phone conversation
    Cell phone bill
    Cell phone hazard

TKM Model Illustrated
  Explicit knowledge
    Tier One (data holding medium): student enrollment data, learning outcome data, census data
    Tier Two (information processing): enrollment trends analysis, student GPA report, socio-economic status
    Tier Three (data mining): Which student is likely to persist? Which clusters of students will have GPA > 3.75? What is associated with any course-taking pattern?
  Decisions, insights, knowledge, competencies, accountability
  Portals, CRM
  Tacit knowledge
    Tier One (knowledge base): personal experiences, skills, values, relationships
    Tier Two (collaborative working environment): organization structures, curriculum committees, identifying mission/policies, writing manuals
    Tier Three (knowledge mapping): faculty experts, group leaders, librarians, analysts/institutional researchers

KM Taxonomy of Products
  Business intelligence
  Knowledge base
  Collaboration
  Content and document management
  Portals (e.g., Yahoo)
  Customer Relationship Management (CRM)
  Data mining (e.g., Clementine, LexiQuest)
  Workflow
  E-learning
  Search

Data Mining
  Topics covered:
    Data mining overview: concept & demo
    Data mining, statistics and OLAP
    Skills needed
    Software evaluation
    Data mining plan at your organization

Data Mining Definition
  "Data mining is for capitalizing on the advances of technology and the extreme richness of enterprise data for improving research and decision making through uncovering hidden trends and patterns that lend themselves to predictive modeling, using a combination of an explicit knowledge base, sophisticated analytical skills and domain knowledge." (Jing Luan)

Why Is Data Mining a Must?
  Best way out of the sea of data
  Workbench of major tools
  Tolerant of multicollinearity
  Appetite for large datasets
  Sledgehammer vs. chisel (trading a shotgun for a cannon)
  Analyst's impact/affiliation with databases
  Domain-knowledge-intensive research
  A good addiction

But I Spent Years Learning Statistics! But I Use OLAP For All My Work!
  Statistics knowledge is very useful.
  Data mining cannot replace statistics in a number of areas.
  There are overlapping areas.
  OLAP is the middle tier.
  We must go beyond counting heads!

How Do Data Mining, Statistics and OLAP Compare?

  Data Mining                  Statistics                                 OLAP
  -----------                  ----------                                 ----
  Predictive                   Research                                   Historical
  Neural Net                   Regression, Structural Equation ...
  C5.0, C&RT                   PCA, Discriminant ...
  Kohonen, K-means, TwoStep    Cluster Analysis, Probability Density      Cubes
  Spatial visualization        2-3 dimension charts                       2-3 dimension charts
  Machine learning/AI          Mathematics                                ETL, SQL
  Unsupervised learning        Descriptive statistics, cluster analysis   Temporal/trend reporting

Two Types of Data Mining
  SUPERVISED (directed)
    Purpose: classification and estimation
    Models: C5.0, C&RT, NN, etc.
  UNSUPERVISED (undirected)
    Purpose: clustering and association
    Models: Kohonen, K-means, TwoStep, GRI, etc.
  1.
    Clustering (unsupervised) and predictive modeling (supervised) often go hand in hand, with clustering preceding the other.
  2. "Pre-classified data" means data without target.

Data Mining Tasks
  Classification/Estimation: predicting onto new data by using rules or patterns and behaviors
  Clustering/Association: understanding the groupings, trends, and characteristics of your customers
  Description: visualizing the Euclidean spatial relationships, trends, and patterns of your data

Cross-Industry Use of DM

                Banking                         Telco                                    Medicine                          Higher Ed
  Clustering    fraud, segmentation             net load, link analysis                  genomics, cell differentiation    learning outcomes, student groups
  Predicting    credit risk, additional cards   peak hour, additional services, churn    disease progress, epidemics       GPA, donations
  Visualizing

Artificial Neural Networks (ANN) (artificial intelligence)
  Multi-layer perceptron (MLP): feed-forward, back-propagation
  [Diagram: inputs x1 (# of terms), x2 (GPA), x3 (demographics), x4 (courses), x5 (financial aid), ..., xn, with weights w1, ..., wn, feeding the output "Persistence"]
  $o_j = f\left(\sum_{i=1}^{n} o_i w_{ji}\right)$

Decision Trees: Rule Induction (inductive logic)
  Rule 1: If income ≥ $55,000 and # of children = 3, then multiple policies
  Rule 2: If income < $55,000 and single and age < 30, then single policy
  Information theory (entropy): $H(N) = -\sum_{i=1}^{n} P(n_i) \log_2 P(n_i)$

Clustering
  Fundamental to science and understanding of our world
  No restrictions on number of clusters
  Clusters change continuously
  Grouping, clustering, classifying, typology
  Carnegie and Bloom in education
  Potentially increasing scoring accuracy

Data Mining in the Telecommunications Industry
  Infrastructure (assets)
    Network load analysis
    Employee productivity analysis
    Automation
  Customer (sales, marketing)
    Link analysis
    Usage by time, location
    Customer feedback
    Churn
    Rolling out new service (for either existing or new customers)

Telecommunication Example: Increasing Profitable Calls
  A two-step process:
  1.
    Clustering: Who is a high-cost customer?
  2. Prediction: Who is likely to be a profitable caller?
  Tips:
    May recalculate call lengths and call intervals to reveal what's not in the data warehouse.
    May visualize data first.
    Merge with customer survey data.

Banking Industry Use of DM
  Customer credit risk (pay your bills!)
  Fraud (Jing: your account is frozen)
  Customer value
    1. Depositors and users
    2. Customer typing
       Transactors
       Convenience users
       Revolvers

Evaluating Data Mining Software
  Company stability and customer feedback
  User interface
  Scalability (up and down)
  Server/client (real-time, KDD)
  Modeling capacities
  Learning curve
  Join a listserv, such as CLUG
  Cost

Data Mining Skill Set
  Driving forces of DM:
    Computer storage
    Algorithms
    Knowledge management
  Translate to skill set:
    Data domain expert
    Familiar with models
    Business domain; system-level view of decision making

CRISP-DM
  Business understanding (zero in on the specific goal of the data mining task)
  Data understanding (do you have the data?)
  Data preparation (case to variables, missing values, recalculate fields)
  Modeling (typing, balancing, test/validation datasets, bootstrapping, cross-algorithm validation)
  Evaluation
  Deployment

Data Mining Plan at Your Organization
  1. Determine business needs
  2. Determine technology infrastructure and management support
  3. Determine data source (got milk, got DW?)
  4. Identify mining areas
  5. Invite an expert to jump start
  6. Pilot test mining results
  7. CRISP-DM and real-time data mining, Knowledge Discovery in Databases (KDD)

When Is Data Mining Not Needed?
  The world is increasingly moving toward predictive modeling: we must know what's next so as to better prepare ourselves. But you don't need it if:
    You are a mom-and-pop shop
    You have no data warehouse
    You do not have people using the tools
    You conduct small experimental studies

Luan's One-percent Doctrine
  Average five-year growth of investment: 10% (ROI = 10%)
  What's the ROI of data mining?
    Enrollment: 25,000 ($5,000 each)
    One-percent increase: 250 * $5,000 = $1,250,000
    Data mining total cost: $75,000 + $50,000 = $125,000
    ROI ratio: $1,250,000 / $125,000 = 10
    Or ROI rate = 1,000%
    Or "Give me a buck and I will turn it into 10!"

Lift Chart: Gain Chart
  Hypothetical database marketing campaign
  [Chart: axes "Lift quota" and "Savings ($)"; marked values 25% and 35%; 40th and 70th percentiles]
  If every percentage point = $2,500, savings = (70 points * $2,500) - (40 points * $2,500) = $175,000 - $100,000 = $75,000

Text Mining
  80% of information is in texts.
    Email (not including SMS)
    Surveys (political polls, marketing, CRM online feedback)
    Articles (memos, policies, manuals)
    Books (what have you)
    Web pages (static & dynamic)
  26% on paper and 20% in digital media

China Impression
  Urge for learning is very strong
  Technology understanding is deep
  Belief that data mining only functions to give a slight edge when economic growth levels off
  Funding/systematic approach not complete:
    Lack of funding for explicit knowledge
    Unique issues in tacit knowledge & guanxi (personal relationships)
    Disparate tech advances

From data to...
  [Images: an analyst/data miner; his boss]

to information to power...
  [Images: CEO; vice president]

KPI (Key Process Indicator)
  S (specific)
  M (measurable)
  A (attainable)
  R (realistic)
  T (timeline)

Who's Coming to Dinner?
  Online KM/DM discussion group and future data mining workshops: http://www.kdl1.com/kmdm/index.htm
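
Worked example: the arithmetic on the One-percent Doctrine and lift chart slides can be checked with a short sketch. The function and parameter names below are illustrative assumptions, not part of the deck. The input figures are the slides' own, and the ROI ratio follows the slide's definition (added revenue divided by total data mining cost).

```python
# Sketch of the slide arithmetic. All names here are hypothetical;
# only the numbers come from the slides.

def one_percent_roi(enrollment, revenue_per_student, increase_percent, total_cost):
    """Return (extra_students, extra_revenue, roi_ratio).

    roi_ratio is added revenue divided by total cost, as defined on the slide.
    """
    extra_students = enrollment * increase_percent / 100
    extra_revenue = extra_students * revenue_per_student
    return extra_students, extra_revenue, extra_revenue / total_cost


def lift_savings(high_percentile, low_percentile, dollars_per_point):
    """Savings between two percentiles when each percentage point has a fixed dollar value."""
    return (high_percentile - low_percentile) * dollars_per_point


if __name__ == "__main__":
    # One-percent Doctrine: 25,000 students at $5,000 each, 1% increase,
    # total cost $75,000 + $50,000.
    students, revenue, ratio = one_percent_roi(25_000, 5_000, 1, 75_000 + 50_000)
    print(f"extra students: {students:.0f}")
    print(f"extra revenue:  ${revenue:,.0f}")
    print(f"ROI ratio:      {ratio:.0f} (rate = {ratio * 100:,.0f}%)")

    # Lift chart: 70th vs. 40th percentile at $2,500 per percentage point.
    print(f"lift savings:   ${lift_savings(70, 40, 2_500):,}")
```

Changing `increase_percent` or `total_cost` shows how sensitive the ratio is to the assumed enrollment gain; the slide's 10-to-1 figure holds only for the stated one-percent increase and $125,000 cost.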