Knowledge-Driven Business Intelligence Systems: Part I
Week 10
Dr. Jocelyn San Pedro
School of Information Management & Systems, Monash University
IMS3001 – BUSINESS INTELLIGENCE SYSTEMS – SEM 1, 2004

Lecture Outline
- Knowledge-Driven BIS
- Knowledge-Driven BIS Technologies
- Data mining
- Data Mining Techniques

Learning Objectives
At the end of this lecture, students will:
- Have a better understanding of knowledge-driven business intelligence systems
- Have an understanding of some data mining techniques used in knowledge-driven business intelligence systems
- Have an understanding of some data mining applications

Knowledge-driven BIS
Information systems that provide BI through access to and manipulation of predictive/descriptive models and/or knowledge bases (containing experts’ domain knowledge)
- Predictive models – used to forecast explicit values based on patterns determined from known results
- Descriptive models – describe patterns in existing data and are generally used to create meaningful subgroups such as demographic clusters
- Knowledge base – a collection of organised facts, rules and procedures

Predictive models
Can provide answers to questions like:
- Which products should be promoted to a particular customer?
- What is the probability that a certain customer will respond to a planned promotion?
- Which securities will be most profitable to buy or sell during the next trading session?
- What is the likelihood that a certain customer will default or pay back on schedule?
- What is the appropriate medical diagnosis for this patient?

Descriptive models
Sample demographic clusters/subgroups:
- Men who buy diapers also buy beer
- People who buy scuba gear take Australian vacations
- People who purchase skim milk also tend to buy whole wheat bread
- Customers who responded to a particular offer are likely to respond to similar offer

Knowledge-Driven BIS Technologies
- Data Mining
- Data Visualisation

Data mining
Positioning - http://www.redbooks.ibm.com/redbooks/pdfs/sg245252.pdf

Data Mining
- Set of activities used to find new, hidden, or unexpected patterns in the data
- Process of using raw data to infer business relationships
- Collection of powerful data analysis techniques intended to assist in analysing extremely large datasets
(Marakas, 2002)
- Process of extracting knowledge hidden from large volumes of raw data
http://www.megaputer.com/dm/dm101.php3

Data Mining Techniques
Classification – discover rules that define whether an item or event belongs to a particular subset or class of data
- Involves building model; then predicting classifications
- e.g. matching buyer attributes with product attributes
- predict customers likely to buy a particular product next month
- targeted promotional contact or mailing list

Predict Classifications - ALICE d'ISoft
A Credit Officer wishes to identify customers who had trouble paying back their loans.
[Figure: Parent Node. Annotations: # of customers in the database; N: # and % of customers who had trouble paying back loan; Y: # and % of customers who had no trouble paying back loan; graphical chart representing success rate Y and failure rate N]
http://www.alice-soft.com/html/tech_dt.htm

Predict Classifications - ALICE d'ISoft
Split the records according to the most discriminating attribute: housing type
http://www.alice-soft.com/html/tech_dt.htm

Example Classification Rule:
People who rent their home and earn more than 7853 Francs have an 86% success rate.
http://www.alice-soft.com/html/tech_dt.htm

Data Mining Techniques
Association – or link analysis – search all details or transactions from operational systems for patterns with a high probability of repetition
- Results in the development of an associative algorithm that correlates one set of events or items with another set of events or items
- e.g. of association rules or patterns: 83% of all records that contain items A, B, C also contain items D and E
- 83% is the confidence factor

Data Mining Techniques
Another example of link analysis: Market basket analysis – analysing the products contained in a purchaser’s basket and then using an associative rule to compare hundreds of thousands of baskets
- 29% of the time that the brand X blender is sold, the customer also buys a set of kitchen tumblers
- 68% of the time that a customer buys beverages, the customer also buys pretzels
> Determine the location and content of promotional or end-of-aisle displays

Market Basket Analysis
This is the most widely used and, in many ways, most successful data mining algorithm. It essentially determines what products people purchase together.
- Stores can use this information to place these products in the same area.
- Direct marketers can use this information to determine which new products to offer to their current customers.
- Inventory policies can be improved if reorder points reflect the demand for the complementary products.
(Marakas, 2002)

Association Rules for Market Basket Analysis
Rules are written in the form “left-hand side implies right-hand side” and an example is:
Yellow Peppers IMPLIES Red Peppers, Bananas, Bakery
To make effective use of a rule, three numeric measures about that rule must be considered: (1) support, (2) confidence and (3) lift
(Marakas, 2002)

Measures of Predictive Ability
- Support refers to the percentage of baskets where the rule was true (both left and right side products were present).
- Confidence measures what percentage of baskets that contained the left-hand product also contained the right.
- Lift measures how much more frequently the left-hand item is found with the right than without the right.
(Marakas, 2002)

An Example
Rule                              Lift    Support    Confidence
Green Peppers IMPLIES Bananas     1.37     3.77        85.96
Red Peppers IMPLIES Bananas       1.43     8.58        89.47
Yellow Peppers IMPLIES Bananas    1.17    22.12        73.09
The confidence suggests people buying any kind of pepper also buy bananas. Green peppers sell in about the same quantities as red or yellow, but are not as predictive.
(Marakas, 2002)
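
The three measures defined on the slides above can be computed directly from a list of baskets. The following is a minimal sketch, not the method used by any particular tool. The baskets are made up for illustration, the function name `rule_measures` is hypothetical, and lift is assumed to take its usual form (confidence divided by the overall share of baskets that contain the right-hand side), which is more specific than the wording on the slide.

```python
from typing import FrozenSet, Iterable, List, Tuple

Basket = FrozenSet[str]


def rule_measures(
    baskets: List[Basket], left: Iterable[str], right: Iterable[str]
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule `left IMPLIES right`."""
    left_set, right_set = frozenset(left), frozenset(right)
    n = len(baskets)
    n_left = sum(1 for b in baskets if left_set <= b)
    n_right = sum(1 for b in baskets if right_set <= b)
    n_both = sum(1 for b in baskets if (left_set | right_set) <= b)
    if n == 0 or n_left == 0 or n_right == 0:
        raise ValueError("both sides of the rule must occur in at least one basket")

    support = n_both / n             # share of all baskets where the rule held
    confidence = n_both / n_left     # share of left-hand baskets that also held the right
    lift = confidence / (n_right / n)  # assumed definition: confidence vs. baseline rate
    return support, confidence, lift


# Hypothetical baskets, invented for this example only.
baskets = [
    frozenset({"yellow peppers", "bananas"}),
    frozenset({"yellow peppers", "bananas", "bakery"}),
    frozenset({"yellow peppers", "red peppers"}),
    frozenset({"bananas", "milk"}),
    frozenset({"milk", "bakery"}),
]

support, confidence, lift = rule_measures(baskets, ["yellow peppers"], ["bananas"])
print(f"support={support:.1%} confidence={confidence:.1%} lift={lift:.2f}")
```

Under this definition, a lift above 1 means the right-hand item turns up more often in baskets with the left-hand item than in baskets overall; a lift near 1 means the rule adds little beyond the item's general popularity.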

Market Basket Analysis Methodology
- We first need a list of transactions and what was purchased. This is pretty easily obtained these days from scanning cash registers.
- Next, we choose a list of products to analyse, and tabulate how many times each was purchased with the others.
- The diagonals of the table show how often a product is purchased in any combination, and the off-diagonals show which combinations were bought.
(Marakas, 2002)

A Convenience Store Example
Consider the following simple example about five transactions at a convenience store:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels
These need to be cross tabulated and displayed in a table.
(Marakas, 2002)

A Convenience Store Example
Product Bought    Pizza also    Milk also    Cola also    Chips also    Pretzel also
Pizza                 2             1            2             0              0
Milk                  1             3            1             1              1
Cola                  2             1            3             0              1
Chips                 0             1            0             1              0
Pretzel               0             1            1             0              2
- Pizza and Cola sell together more often than any other combo; a cross-marketing opportunity?
- Milk sells well with everything – people probably come here specifically to buy it.
(Marakas, 2002)

Limitations of Market Basket Analysis
- A large number of real transactions are needed to do an effective basket analysis, but the data’s accuracy is compromised if all the products do not occur with similar frequency.
- The analysis can sometimes capture results that were due to the success of previous marketing campaigns (and not natural tendencies of customers).
(Marakas, 2002)
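
The cross-tabulation step from the methodology and convenience store slides can be sketched in a few lines. This is only an illustration of the counting, using the five transactions from the slides. The function name `cross_tabulate` is hypothetical, and product names are shortened to the labels used in the table (pizza, milk, cola, chips, pretzel).

```python
from typing import Dict, List

# The five transactions from the convenience store example.
transactions: List[List[str]] = [
    ["pizza", "cola", "milk"],
    ["milk", "chips"],
    ["cola", "pizza"],
    ["milk", "pretzel"],
    ["cola", "pretzel"],
]
products = ["pizza", "milk", "cola", "chips", "pretzel"]


def cross_tabulate(
    transactions: List[List[str]], products: List[str]
) -> Dict[str, Dict[str, int]]:
    """Count, for every pair of products, the baskets that contain both.

    The diagonal (row == col) counts the baskets that contain the product at all.
    """
    counts = {row: {col: 0 for col in products} for row in products}
    for basket in transactions:
        items = set(basket)
        for row in products:
            if row not in items:
                continue
            for col in products:
                if col in items:
                    counts[row][col] += 1
    return counts


table = cross_tabulate(transactions, products)

# Print the table in the same layout as the slide.
print(f"{'':10}" + "".join(f"{col:>9}" for col in products))
for row in products:
    print(f"{row:10}" + "".join(f"{table[row][col]:>9}" for col in products))
```

The table is symmetric because "pizza with cola" and "cola with pizza" describe the same baskets, so only the diagonal and one triangle carry distinct information.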

Market Basket Analysis in PolyAnalyst
[Screenshot labels: Groups of products sold together well; Association Rules]
http://www.megaputer.com/products/pa/algorithms/ba.php3

HealthCare Fraud Example
Market Basket Analysis + Summary Statistics reveal providers sharing a large number of patients
>>> Potential Provider Fraud
http://www.megaputer.com

Data Mining Techniques
Sequencing or time-series analysis – techniques that relate events in time
- Prediction of interest rate fluctuations or stock performance based on a series of preceding events
- E.g. buying sequence: parents buy promotional toys associated with a particular movie within 2 weeks after renting the movie
> flyer campaign for promotional toys should be linked to customer lists created as a result of movie rentals
- sequence of customer purchases
> catalogue of specific product types can be target-mailed to the customer
(Marakas, 2002)

Association and Sequencing
Association and sequencing tools analyse data to discover rules that identify patterns of behaviour.
- An association tool will find rules such as: When people buy diapers they also buy beer 50 percent of the time.
- A sequencing technique is very similar to an association technique, but it adds time to the analysis and produces rules such as: People who have purchased a VCR are three times more likely to purchase a camcorder in the time period two to four months after the VCR was purchased.
http://www.dbmsmag.com/9807m03.html

Association and Sequencing
Example in care management, procedure interactions and pharmaceutical interactions:
- Patients who are taking drugs A, B, and C are two and a half times more likely to also be taking drug D.
- Patients receiving procedure X from Doctor Y are three times less likely to get infection Z.
http://www.dbmsmag.com/9807m03.html

Association and Sequencing
Example in financial industry:
- The prices of stocks in industry Q are 1.8 times more likely to close up one day after stocks in industry R closed down.
http://www.dbmsmag.com/9807m03.html

Association and Sequencing
Example in fraud detection in telecommunications and insurance:
- International credit card calls longer than three minutes originating in area code 555 between 1:00 AM and 3:00 AM are three times more likely to go uncollected.
- Accident claims involving soft tissue trauma where attorney P represents the claimant are twice as likely to be fraudulent.
http://www.dbmsmag.com/9807m03.html

Data Mining Techniques
Clustering – technique for creating partitions so that all members of each set are similar according to some metric or set of metrics
e.g., credit card purchase data
- Cluster 1: business-issued gold card, meals charged on weekdays, mean values greater than $250
- Cluster 2: personal platinum card, meals charged on weekends, mean value $175, bottle of wine charged more than 65% of the time
(Marakas, 2002)
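
To make the idea of partitioning by similarity concrete, here is a minimal sketch of one common clustering method, k-means. The slide does not name a specific algorithm, so k-means is an assumption made for illustration. The six card records are invented (mean meal charge in dollars, percentage of meals charged on weekends), the function name `k_means` is hypothetical, and the starting centroids are passed in explicitly so the run is repeatable.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(
    points: Sequence[Point], centroids: Sequence[Point], max_iter: int = 100
) -> Tuple[List[int], List[Point]]:
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its members, until assignments stop changing."""
    centres: List[Point] = [tuple(c) for c in centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centres)), key=lambda j: dist(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for j in range(len(centres)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave an empty cluster's centroid where it is
                centres[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centres


# Hypothetical records: (mean meal charge in $, % of meals charged on weekends).
points: List[Point] = [
    (260.0, 10.0), (275.0, 5.0), (290.0, 15.0),   # weekday, higher charges
    (170.0, 90.0), (180.0, 85.0), (175.0, 95.0),  # weekend, lower charges
]

labels, centroids = k_means(points, centroids=[points[0], points[-1]])
print(labels)
print(centroids)
```

In practice the features would usually be rescaled first, since a distance measure treats a dollar and a percentage point as equal units, and the choice of the number of clusters is itself a modelling decision.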

Clustering - Example
Identifying natural clusters of patient populations
http://www.enee.umd.edu/medlab/papers/dcsThShort/thpaper1.html

Current Limitations and Challenges to Data Mining
Despite the potential power and value, data mining is still a new field. Some things that thus far have limited advancement are:
- Identification of missing information – not all knowledge gets stored in a database
- Data noise and missing values – future systems need better ways to handle this
- Large databases and high dimensionality – future applications need ways to partition data into more manageable chunks
(Marakas, 2002)

Summary
- Business intelligence systems with data mining tools can find hidden patterns in large datasets, and use these patterns to turn data into actionable information
- BIS using data mining tools need data visualisation tools to present such hidden patterns to the end-user
- Hidden patterns, when placed into the hands of decision makers, become actionable information or business intelligence

References
- Marakas, G.M. (2002) Decision support systems in the 21st Century. 2nd Ed, Prentice Hall (or other editions)
- Power, D. (2002) Decision Support Systems: Concepts and Resources for Managers, Quorum Books.
- FREE online resource: Data Mining booklet http://www.twocrows.com/intro-dm.pdf

Questions?
[email protected]
School of Information Management and Systems, Monash University
T1.28, T Block, Caulfield Campus
9903 2735