Business Intelligence Systems

Chapter Preview
This chapter surveys the most common business intelligence and knowledge-management applications, discusses the need and purpose for data warehouses, and explains how business intelligence applications are delivered to users as business intelligence systems. Along the way, you’ll learn tools and techniques that MRV can use to identify the guides that contribute the most (and least) to its competitive strategy. We’ll wrap up by discussing some of the potential benefits and risks of mining credit card data.

Study Questions
Q1 Why do organizations need business intelligence?
Q2 What business intelligence systems are available?
Q3 What are typical reporting applications?
Q4 What are typical data-mining applications?
Q5 What is the purpose of data warehouses and data marts?
Q6 What are typical knowledge-management applications?
Q7 How are business intelligence applications delivered?
Q8 2020?

Why Do Organizations Need Business Intelligence?
• Information systems generate enormous amounts of operational data that contain patterns, relationships, clusters, and other information that can facilitate management, especially planning and forecasting. Business intelligence systems produce such information from operational data.
• Because data communications and data storage are essentially free, enormous amounts of data are created and stored every day.
  » 12,000 gigabytes per person of data, worldwide in 2009
  » How Big Is an Exabyte? (See video)

Q2 What business intelligence systems are available?

Business Intelligence (BI) Tools
• BI systems provide valuable information for decision making.
(BI video)
• Three primary BI systems:
1. Reporting Tools
  » Integrate data from multiple systems
  » Sorting, grouping, summing, averaging, comparing data
2. Data-mining Tools
  » Use sophisticated statistical techniques, regression analysis, and decision tree analysis
  » Used to discover hidden patterns and relationships
  » Market-basket analysis

Business Intelligence Tools
3. Knowledge-management Tools
  » Create value by collecting and sharing human knowledge about products, product uses, best practices, other critical knowledge
  » Used by employees, managers, customers, suppliers, others who need access to company knowledge

Tools vs. Applications vs. Systems
• A BI tool is one or more computer programs. BI tools implement the logic of a particular procedure or process.
• A BI application is the use of a tool on a particular type of data for a particular purpose.
• A BI system is an information system having all five components that delivers results of a BI application to users who need those results.

Q3 What are typical reporting applications?

Basic Reporting Operations
• Reporting tools produce information from data using five basic operations:
  » Sorting
  » Grouping
  » Calculating
  » Filtering
  » Formatting

List of Sales Data

Data Sorted by Customer Name

Sales Data, Sorted by Customer Name and Grouped by Orders and Purchase Amount

Sales Data Filtered to Show Repeat Customers and Formatted for Easier Understanding

RFM Analysis
• RFM analysis allows you to analyze and rank customers according to purchasing patterns as this figure shows.
R = how recently a customer purchased your products
F = how frequently a customer purchases your products
M = how much money a customer typically spends on your products

RFM Tools Classify Customers
• Divides customers into five groups and assigns a score from 1 to 5
• R score 1 = top 20 percent in most recent orders
• R score 5 = bottom 20 percent (longest since last order)
• F score 1 = top 20 percent in most frequent orders
• F score 5 = bottom 20 percent (least frequent orders)
• M score 1 = top 20 percent in most money spent
• M score 5 = bottom 20 percent in amount of money spent

Example of RFM Score Data
• Figure 9-6

Interpreting RFM Score Results
• Ajax has ordered recently and orders frequently. Its M score of 3 indicates that it does not order the most expensive goods. Ajax is a good and regular customer, but the sales team should attempt to upsell more expensive goods to it.
• Bloominghams has not ordered in some time, but when it did, it ordered frequently, and its orders were of the highest monetary value. It may have taken its business to another vendor. The sales team should contact this customer immediately.

Interpreting RFM Score Results
• Caruthers has not ordered for some time, did not order frequently, and did not spend much. The sales team should not waste any time on this customer.
• Davidson is in the middle. Set this customer up on an automated contact system, or use the Davidson account as a training exercise.

Online Analytical Processing (OLAP)
• OLAP, a second type of reporting tool, is more generic than RFM.
• OLAP provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data.
• The remarkable characteristic of OLAP reports is that they are dynamic. The viewer of the report can change the report’s format, hence the term online.

How Are OLAP Reports Dynamic?
• OLAP reports
  » Simple arithmetic operations on data: sum, average, count, and so on
  » Dynamic: user can change report structure and view online
• Measure
  » Data item to be manipulated: total sales, average cost
• Dimension
  » Characteristic of measure: purchase date, customer type, location, sales region

OLAP Product Family and Store Type

OLAP Reports
• OLAP cube
  » Presentation of measure with associated dimensions
  » a.k.a. OLAP report
• Users can alter format.
• Users can drill down into data.
  » Divide data into more detail
• May require substantial computing power

OLAP Product Family and Store Location by Store Type

OLAP Product Family and Store Location by Store Type, Drilled Down to Show Stores in California

OLAP Servers
• Developed to perform OLAP analysis
• Server reads data from operational database
• Performs calculations
• Stores results in OLAP database
• Third-party vendors provide software for more extensive graphical displays.
• Data Warehousing Review
• OLAP services

Role of OLAP Server and OLAP Database

Q4 What are typical data-mining applications?

Convergence of Disciplines and Information Technology

Unsupervised Data Mining
• Analysts do not create model before running analysis.
• Apply data-mining technique and observe results
• Analysts create hypotheses after analysis to explain patterns found.
• No prior model about the patterns and relationships that might exist
• Common statistical technique used:
  » Cluster analysis to find groups of similar customers from customer order and demographic data

Supervised Data Mining
• Model developed before analysis
• Statistical techniques used to estimate parameters
• Examples:
  » Regression analysis: measures the impact of a set of variables on another variable
  » Used for making predictions

Regression Analysis
CellphoneWeekendMinutes = 12 + (17.5 * CustomerAge) + (23.7 * NumberMonthsOfAccount)
• Using this equation, analysts can predict the number of minutes of weekend cell phone use by summing 12, plus 17.5 times the customer’s age, plus 23.7 times the number of months of the account.
• Considerable skill is required to interpret the quality of such a model.

Neural Networks
• Popular supervised data-mining technique used to predict values and make classifications such as “good prospect” or “poor prospect” customers
• Complicated set of nonlinear equations
• See kdnuggets.com to learn more

Market-Basket Analysis
• Market-basket analysis is a data-mining technique for determining sales patterns.
  » Uses statistical methods to identify sales patterns in large volumes of data
  » Shows which products customers tend to buy together
  » Used to estimate probability of customer purchase
  » Helps identify cross-selling opportunities
• “Customers who bought book X also bought book Y”

Hypothetical Sales Data of 1,000 Items at a Dive Shop

Market-Basket Terminology
• Support
  » Probability that two items will be bought together
  » Fins and masks were purchased together 150 times, thus support for fins and a mask is 150/1,000, or 15 percent
  » Support for fins and weights is 60/1,000, or 6 percent
  » Support for fins along with a second pair of fins is 10/1,000, or 1 percent

Market-Basket Terminology
• Confidence
  » What proportion of the customers who bought a mask also bought fins?
  » Conditional probability estimate
• Example:
  » Probability of buying fins = 28%
  » Probability of buying swim mask = 27%
• After buying a mask:
  » Probability of buying fins = 150/270, or 55.56%
• The likelihood that a customer will buy fins almost doubles, from 28% to 55.56%, when that customer buys a mask. Thus, all sales personnel should try to sell fins to anyone buying a mask.

Market-Basket Terminology
• Lift
  » Ratio of confidence to base probability of buying item
  » Shows how much base probability increases or decreases when other products are purchased
• Example:
  » Lift of fins and a mask is the confidence of fins given a mask, divided by the base probability of fins.
  » Lift of fins and a mask is .5556/.28 = 1.98
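The market-basket figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not a data-mining tool: the function names are made up for illustration, and it assumes that the 1,000 in the denominators counts purchase transactions and that the 28% and 27% base probabilities correspond to 280 and 270 of those transactions.

```python
def support(joint_count, total_transactions):
    """Estimated probability that both items appear in the same transaction."""
    return joint_count / total_transactions


def confidence(joint_count, given_item_count):
    """Estimated probability of the second item, given the first was bought."""
    return joint_count / given_item_count


def lift(confidence_value, base_probability):
    """How much the base probability changes once the other item is bought."""
    return confidence_value / base_probability


# Counts taken from the dive-shop example; 280 and 270 are derived
# from the stated 28% and 27% (an assumption, see the note above).
TOTAL_TRANSACTIONS = 1000
FINS = 280
MASKS = 270
FINS_AND_MASK = 150

support_fins_mask = support(FINS_AND_MASK, TOTAL_TRANSACTIONS)
confidence_fins_given_mask = confidence(FINS_AND_MASK, MASKS)
lift_fins_mask = lift(confidence_fins_given_mask, FINS / TOTAL_TRANSACTIONS)

print(f"support    = {support_fins_mask:.2%}")
print(f"confidence = {confidence_fins_given_mask:.2%}")
print(f"lift       = {lift_fins_mask:.2f}")
```

A lift above 1 means that buying the mask raises the chance of buying fins. A lift near 1 would mean the two purchases are roughly independent.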
Decision Trees
• Decision tree
  » Hierarchical arrangement of criteria that predict a classification or value
  » Unsupervised data-mining technique
• Basic idea of a decision tree
  » Select attributes most useful for classifying something on some criteria that create disparate groups
  » The more different or pure the groups, the better the classification

Decision Tree
(Figure branches: If Senior = Yes; If Junior = Yes)

Decision Tree for Loan Evaluation
• Figure CE16-3
• Common business application
  » Classify loan applications by likelihood of default
  » Rules identify loans for bank approval
  » Identify market segment
  » Structure marketing campaign
  » Predict problems

Decision Tree Analysis of MIS Class Grades
• Student’s characteristics
  » Class (junior or senior), major, employment, age, club affiliations, and other characteristics
• Values used to create groups that were as different as possible on the classification
  » GPA above or below 3.0
• Results
  » Best criterion: Class
  » Next, subdivide Seniors and Juniors into more pure groups
    Seniors: business and non-business majors
    Juniors: restaurant employees and non-restaurant employees
  » Best classifier is whether the junior worked in a restaurant

Create Set of If/Then Decision Rules
• If student is a junior and works in a restaurant, then predict grade > 3.0.
• If student is a senior and is a non-business major, then predict grade < 3.0.
• If student is a junior and does not work in a restaurant, then predict grade < 3.0.
• If student is a senior and is a business major, then make no prediction.

A Decision Tree for a Loan Evaluation
• Classifying likelihood of default
• Examined 3,485 loans
• 28 percent of those defaulted
• Evaluation criteria
  A. Percentage of loan past due less than 50 percent = .94, no default
  B. Percentage of loan past due greater than 50 percent = .89, default
• Subdivide groups A and B each into three classifications: CreditScore, MonthsPastDue, and CurrentLTV

A Decision Tree for a Loan Evaluation
Resulting rules
• If the loan is more than half paid, then accept the loan.
• If the loan is less than half paid and CreditScore is greater than 572.6 and CurrentLTV is less than .94, then accept the loan.
• Otherwise, reject the loan.
• Use this analysis to structure a marketing campaign to appeal to a particular market segment.
• Decision trees are easy to understand and easy to implement using decision rules.
• Some organizations use decision trees to select variables to be used by other types of data-mining tools.

Credit Score Decision Tree
Figure CE14-4

Q5 What is the purpose of data warehouses and data marts?

What Is the Purpose of Data Warehouses and Data Marts?
• Purpose: (video)
  » To extract and clean data from various operational systems and other sources
  » To store and catalog data for BI processing
  » Extract, clean, prepare data
  » Stored in data-warehouse DBMS

Components of a Data Warehouse

Data Warehouse Data Sources
• Internal operations systems
• External data purchased from outside sources
• Data from social networking, user-generated content applications
• Metadata concerning data stored in data-warehouse metadata database
• Clickstream data of customers’ clicking behavior on a Web site

Example of Typical Customer Credit Data

Problems with Operational Data
• Dirty data: mistakes in spelling or punctuation, incorrect data associated with a field, incomplete or outdated data, or even data that is duplicated in the database.
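Dirty data of the kinds just described can often be flagged with simple validation rules while data is being cleaned for the warehouse. The sketch below is only an illustration: the field names, the allowed gender codes, the age range, the placeholder phone number, and the color list are all assumptions chosen for the example, and a real cleaning step would use rules agreed with the owners of each source system.

```python
import re

# Hypothetical reference values; a real system would load these from metadata.
VALID_GENDERS = {"M", "F"}
KNOWN_COLORS = {"red", "green", "blue", "black", "white"}
PHONE_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{4}$")
PLACEHOLDER_PHONES = {"999-999-9999"}


def find_problems(record):
    """Return the names of fields in one record that look dirty."""
    problems = []
    if record.get("gender") not in VALID_GENDERS:
        problems.append("gender")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age")
    phone = record.get("phone", "")
    if not PHONE_PATTERN.match(phone) or phone in PLACEHOLDER_PHONES:
        problems.append("phone")
    if record.get("part_color") not in KNOWN_COLORS:
        problems.append("part_color")
    return problems


def find_duplicates(records, key):
    """Return key values that appear more than once, in order of detection."""
    seen = set()
    duplicates = []
    for record in records:
        value = record[key]
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return duplicates


dirty = {"id": 1, "gender": "B", "age": 213,
         "phone": "999-999-9999", "part_color": "gren"}
clean = {"id": 2, "gender": "F", "age": 41,
         "phone": "206-555-0100", "part_color": "green"}

print(find_problems(dirty))
print(find_problems(clean))
print(find_duplicates([dirty, clean, dict(clean)], key="id"))
```

Rules like these only detect suspicious values. Deciding whether to correct, drop, or keep a flagged record remains a judgment call for the analyst.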
Examples of Dirty Data
• A value of “B” for customer gender
• 213 for customer age
• Value of 999–999–9999 for a U.S. phone number
• Part color of “gren”
• Email address of [email protected].

Problems with Operational Data
Too much data causes:
• Curse of dimensionality
  1. Problem caused by the exponential increase in volume associated with adding extra dimensions to a (mathematical) space.
  2. Too many rows or data points
  3. The more attributes there are, the easier it is to build a model that fits the sample data but is worthless as a predictor.
• Major activities in data mining concern efficient and effective ways of selecting attributes.

Data Warehouses vs. Data Marts
• A data mart is a collection of data (video)
  » Created to address particular needs: a business function, a problem, or an opportunity
  » Smaller than a data warehouse
  » Users may not have data management expertise; knowledgeable analysts are needed for the specific function
  » Data extracted from the data warehouse for a functional area

Components of a Data Mart

Q6 What are typical knowledge-management applications?

Knowledge Management (KM)
• The process of creating value from intellectual capital and sharing that knowledge with employees, managers, suppliers, customers, and others who need it.
• Reporting and data mining are used to create new information from data, whereas knowledge-management systems concern the sharing of knowledge that is known to exist.

Primary Benefits of KM
1. KM fosters innovation by encouraging the free flow of ideas.
2. KM improves customer service by streamlining response time.
3. KM boosts revenues by getting products and services to market faster.
4. KM enhances employee retention rates by recognizing the value of employees’ knowledge and rewarding them for it.
5. KM streamlines operations and reduces costs by eliminating redundant or unnecessary processes.
6. KM preserves organizational memory by capturing and storing the lessons learned and best practices of key employees.

Sharing of Document Content and Employee Knowledge
• Sharing Document Content
  » Collaboration systems are concerned with document creation and change management, whereas KM applications are concerned with maximizing content use.

Two Typical Knowledge-Management Applications
Two key technologies for sharing content in KM systems:
1. Indexing: the most important content function in KM applications. It provides an easily accessible and robust means of determining whether content exists, along with a link to obtain the content. Used in conjunction with search functions.

Two Typical Knowledge-Management Applications
2. RSS (Really Simple Syndication): a standard for subscribing to content sources on Web sites.
• An RSS Reader program helps users to:
  » Subscribe to content sources.
  » Periodically check sources for new or updated content through RSS feeds.
  » Place content summaries in an RSS inbox with a link to the full content.
• Think of RSS as an email system for content.
• The data source must provide what is termed an RSS feed, which simply means that the site posts changes according to one of the RSS standards.

Interface of a Typical RSS Reader

Blog Posts of SharePoint Team Member

Expert Systems
• Expert systems attempt to capture human expertise and put it into a format that can be used by nonexperts.
• Expert systems are rule-based systems that use If/Then rules similar to those created by decision-tree analysis, except they are created from human experts instead of data-mining systems.

Problems of Expert Systems
1. Difficult and expensive to develop. They require many labor hours from both experts in the domain under study and designers of expert systems.
   High opportunity cost of tying up domain experts.
2. Difficult to maintain. The nature of rule-based systems creates unexpected consequences when a new rule is added in the middle of hundreds of others. A small change can cause very different outcomes.
3. No expert system has the same diagnostic ability as knowledgeable, skilled, and experienced doctors. Rules/actions change frequently.

Expert Systems for Pharmacies
• Used as a safety net to screen decisions of doctors and other medical professionals. These systems help to achieve the hospital’s goal of state-of-the-art, error-free care.
• DoseChecker verifies appropriate dosages on prescriptions issued in the hospital.
• PharmADE ensures that patients are not prescribed drugs that have harmful interactions.
• The pharmacy order-entry system invokes these applications as a prescription is entered. If either system detects a problem with the prescription, it generates an alert.

Pharmacy Alert

Q7 How are business intelligence applications delivered?

How Are Business Intelligence Applications Delivered?

What Are the Management Functions of a BI Server?
• Maintains metadata about authorized allocation of BI results to users
  » Tracks what results are available, what users are authorized to view those results, and the schedule to provide results to authorized users.
  » Adjusts allocations as available results change and users come and go.

BI Servers Vary in Complexity and Functionality
• Some BI servers are simply Web sites from which users can download, or pull, BI application results.
• For example, a BI Web server might post results of an RFM analysis for salespeople to query to obtain RFM scores for their customers.
• The management function for such a site would simply be to track authorized users and restrict access.

BI Servers Vary in Complexity and Functionality
• A BI server could operate as a portal server.

BI Portals
• Portals might provide common data such as local weather, links to company news, and links to BI application results such as reports on daily sales, operations, new employees, and results of data-mining applications.
• Authorized users are allowed to place reports, data-mining results, or other BI application results on their customized pages.
• The BI application server pushes the subscribed results to the user.

Report Server
• A special case of a BI application server that serves only reports
• BI application servers track results, users, authorizations, page customizations, subscriptions, alerts, and data for any other functionality provided.

What Are the Delivery Functions of a BI Server?
• Track authorized users
• Track the schedule for providing results to users
• Issue exception alerts that notify users of an exceptional event
• Procedures used depend on the nature of the BI system
• Procedures tend to be more flexible than those in an operational system because users of a BI system tend to be engaged in work that is neither structured nor routine
• Procedures are determined by unique requirements of users
• BI results can be delivered to “any” device, such as computers, PDAs, phones, other applications such as Microsoft Office, and as a SOA service

Q8 2020?
• Through data mining, companies known as “data aggregators” will know more about your purchasing psyche than you, your mother, or your analyst.
• If you use your card to purchase “secondhand clothing, retread tires, bail bond services, massages, casino gambling or betting,” you alert the credit card company of potential financial problems and, as a result, it may cancel your card or reduce your credit limit.
• Absent laws to the contrary, by 2020 your credit card data will be fully integrated with personal and family data maintained by the data aggregators (like Acxiom and ChoicePoint).
• By 2020, some online retailers and data aggregators will know a lot more about most consumers’ purchases than we’ll know ourselves.

Ethics Guide: The Ethics of Classification
• Serious problems can arise when classifying people.
• What about classifying applicants for college where there are more applicants than positions?
• Admissions committee uses a decision-tree data-mining program to derive statistically valid measures. No human judgment was involved.
• Decision tree analysis might not include important data, and results may reinforce social stereotypes.
• Results might not be organizationally, legally, or socially feasible.

Guide: Semantic Security
• Security is a difficult problem
  » Unintended release of protected information
• Physical security
  » Protect through passwords and permissions
  » Delivery system must be secure
• Semantic security
  » Unintended release of protected information through release of unprotected reports
  » Equally serious and more problematic

Guide: Semantic Security
• Megan is able to combine data in various reports to infer protected information about company employees.
• She was not supposed to see this information, but only use reports she was authorized to see.
• What, if anything, can be done to prevent what Megan did?
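To make the semantic security problem concrete, the sketch below shows how two individually harmless summary reports can be combined to expose a protected value. Everything in it is hypothetical: the report layout, the department figures, and the function name are invented for illustration and are not the reports from Megan's scenario.

```python
def infer_new_hire_salary(report_before, report_after):
    """Infer one person's salary from two department summary reports.

    Works only when exactly one person was added between the reports;
    otherwise the difference cannot be attributed to an individual.
    """
    added = report_after["headcount"] - report_before["headcount"]
    if added != 1:
        return None
    return report_after["total_salary"] - report_before["total_salary"]


# Two hypothetical summary reports that a user is authorized to view.
march_report = {"department": "Accounting", "headcount": 4, "total_salary": 260_000}
april_report = {"department": "Accounting", "headcount": 5, "total_salary": 335_000}

inferred = infer_new_hire_salary(march_report, april_report)
print(inferred)
```

Neither report names an employee or lists a salary, yet together they reveal one. That is why restricting access report by report does not by itself guarantee semantic security.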
Guide: Data Mining in the Real World
• Real-world data mining is different from the way it is shown in textbooks because:
  » Data is dirty
  » Values are missing or outside of ranges
  » Time values make no sense
  » You add parameters as you gain knowledge, forcing reprocessing
  » Overfitting data to a model
  » Results based on probabilities, not certainty
  » Seasonality problems
• Should you let people think the resulting model makes accurate predictions?

Active Review
Q1 Why do organizations need business intelligence?
Q2 What business intelligence systems are available?
Q3 What are typical reporting applications?
Q4 What are typical data-mining applications?
Q5 What is the purpose of data warehouses and data marts?
Q6 What are typical knowledge-management applications?
Q7 How are business intelligence applications delivered?
Q8 2020?

Case Study 9: Business Intelligence for Decision Making at Home Depot
• Home Depot is a major retail chain specializing in construction and home repair and maintenance products.
• Company has 2,200 retail stores worldwide
• Generated $71 billion in sales in 2008
• Carries more than 40,000 products in its stores and employs more than 300,000 people
• Its stores are visited by more than 22 million people each week.

Case Study 9: Business Intelligence for Decision Making at Home Depot
• Suppose you are a buyer for the clothes washer and dryer product line at Home Depot. You work with seven different brands and numerous models within each brand.
• One of your goals is to turn your inventory as many times a year as you can. In order to do so, you want to identify poorly selling models (and even brands) as quickly as you can.
• Risks
  » A new model can quickly capture a substantial portion of another model’s market share. Thus, a big seller this year can be a “dog” (a poor seller) next year.
  » Geography: Some brands are unavailable in some countries. Within a country, some sales trends are national, others are regional.
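One way to approach the buyer's problem is an OLAP-style rollup: treat sales as the measure and brand, model, and state as dimensions, then total the measure along whichever dimensions are of interest. The sketch below is a minimal illustration under stated assumptions. The row layout, the sample figures, and the threshold of 20 are all hypothetical, and real data would come from a reporting or OLAP tool instead of an in-memory list.

```python
from collections import defaultdict


def total_by(rows, *dimensions):
    """Sum the 'sales' measure for each combination of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["sales"]
    return dict(totals)


def poor_sellers(rows, threshold):
    """Return (brand, model) pairs whose total sales fall below the threshold."""
    by_model = total_by(rows, "brand", "model")
    return sorted(key for key, total in by_model.items() if total < threshold)


# Hypothetical monthly sales rows, one per model, store state, and month.
sample_rows = [
    {"brand": "A", "model": "A-100", "state": "CA", "month": "2008-01", "sales": 40},
    {"brand": "A", "model": "A-100", "state": "WA", "month": "2008-01", "sales": 35},
    {"brand": "A", "model": "A-200", "state": "CA", "month": "2008-01", "sales": 3},
    {"brand": "A", "model": "A-200", "state": "WA", "month": "2008-01", "sales": 12},
    {"brand": "B", "model": "B-10", "state": "CA", "month": "2008-01", "sales": 22},
]

model_totals = total_by(sample_rows, "brand", "model")    # national view
regional_totals = total_by(sample_rows, "model", "state")  # drill down by state
flagged = poor_sellers(sample_rows, threshold=20)

print(model_totals)
print(regional_totals)
print(flagged)
```

Comparing the national totals with the per-state totals is a simple way to see whether a weak model is weak everywhere or only in some regions, which is the national-versus-regional risk noted above.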
Case Study 9: Business Intelligence for Decision Making at Home Depot
• Assume you have total sales data for each brand and model, for each store, for each month. Assume also that you know the store’s city and state.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall