Business Intelligence
Systems
Chapter Preview
This chapter surveys the most common business
intelligence and knowledge-management applications,
discusses the need and purpose for data warehouses,
and explains how business intelligence applications are
delivered to users as business intelligence systems.
Along the way, you’ll learn tools and techniques that
MRV can use to identify the guides that contribute the
most (and least) to its competitive strategy.
We’ll wrap up by discussing some of the potential
benefits and risks of mining credit card data.
Study Questions
Q1 Why do organizations need business intelligence?
Q2 What business intelligence systems are available?
Q3 What are typical reporting applications?
Q4 What are typical data-mining applications?
Q5 What is the purpose of data warehouses and data marts?
Q6 What are typical knowledge-management applications?
Q7 How are business intelligence applications delivered?
Q8 2020?
Why Do Organizations Need Business Intelligence?
• Information systems generate enormous amounts of operational data that contain patterns, relationships, clusters, and other information that can facilitate management, especially planning and forecasting. Business intelligence systems produce such information from operational data.
• Because data communications and data storage are essentially free, enormous amounts of data are created and stored every day.
– 12,000 gigabytes of data per person, worldwide, in 2009
How Big Is an Exabyte?
(See video)
Q2 What business intelligence systems are available?
Business Intelligence (BI) Tools
• BI systems provide valuable information for decision making. (BI video)
• Three primary kinds of BI tools:
1. Reporting tools
• Integrate data from multiple systems
• Sort, group, sum, average, and compare data
2. Data-mining tools
• Use sophisticated statistical techniques, such as regression analysis and decision-tree analysis
• Discover hidden patterns and relationships
• Example: market-basket analysis
3. Knowledge-management tools
• Create value by collecting and sharing human knowledge about products, product uses, best practices, and other critical knowledge
• Used by employees, managers, customers, suppliers, and others who need access to company knowledge
Tools vs. Applications vs. Systems
• A BI tool is one or more computer programs. BI tools implement the logic of a particular procedure or process.
• A BI application is the use of a tool on a particular type of data for a particular purpose.
• A BI system is an information system having all five components (hardware, software, data, procedures, and people) that delivers the results of a BI application to the users who need those results.
Q3 What are typical reporting applications?
Basic Reporting Operations
• Reporting tools produce information from data using five basic operations:
– Sorting
– Grouping
– Calculating
– Filtering
– Formatting
List of Sales Data
Data Sorted by Customer Name
Sales Data, Sorted by Customer Name and Grouped by Orders and Purchase Amount
Sales Data Filtered to Show Repeat Customers and Formatted for Easier Understanding
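The five operations can be sketched in plain Python. This is a minimal illustration under stated assumptions: the sales records below are hypothetical (not the data in the figures), and `summary`, `repeat`, and `format_report` are illustrative names, not part of any reporting product.

```python
from collections import defaultdict

# Hypothetical sales records: (customer name, order number, purchase amount).
sales = [
    ("Carter", 1001, 120.00),
    ("Adams", 1002, 45.50),
    ("Carter", 1003, 80.00),
    ("Baker", 1004, 200.00),
    ("Adams", 1005, 60.25),
]

# Sorting: order the rows by customer name.
sorted_sales = sorted(sales, key=lambda row: row[0])

# Grouping and calculating: count orders and total the purchase amount per customer.
summary = defaultdict(lambda: {"orders": 0, "total": 0.0})
for customer, _order_number, amount in sorted_sales:
    summary[customer]["orders"] += 1
    summary[customer]["total"] += amount

# Filtering: keep only repeat customers (more than one order).
repeat = {name: stats for name, stats in summary.items() if stats["orders"] > 1}


def format_report(rows):
    """Formatting: render aligned columns with currency-style totals."""
    lines = [f"{'Customer':<10}{'Orders':>8}{'Total':>12}"]
    for name in sorted(rows):
        stats = rows[name]
        lines.append(f"{name:<10}{stats['orders']:>8}{stats['total']:>12,.2f}")
    return "\n".join(lines)


print(format_report(repeat))
```

Each step feeds the next, which mirrors the progression of the sales-data figures: raw list, sorted, grouped, then filtered and formatted.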
RFM Analysis
• RFM analysis allows you to analyze and rank customers according to their purchasing patterns, as this figure shows.
– R = how recently a customer purchased your products
– F = how frequently a customer purchases your products
– M = how much money a customer typically spends on your products
How Do RFM Tools Classify Customers?
For each of R, F, and M, the tool divides customers into five groups and assigns a score from 1 to 5:
• R score 1 = top 20 percent (most recent orders)
• R score 5 = bottom 20 percent (longest since last order)
• F score 1 = top 20 percent (most frequent orders)
• F score 5 = bottom 20 percent (least frequent orders)
• M score 1 = top 20 percent (most money spent)
• M score 5 = bottom 20 percent (least money spent)
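A rough sketch of the quintile scoring described above, assuming hypothetical customers A through E and using average order value as the stand-in for M. Real RFM tools may compute each measure and handle ties differently; `quintile_scores` is an illustrative helper, not a library function.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each customer 1-5 on one measure: 1 = top 20 percent, 5 = bottom 20 percent.

    Ties are broken by sort order, a simplification a real RFM tool may handle differently.
    """
    ranked = sorted(values, key=values.get, reverse=higher_is_better)
    n = len(ranked)
    # Position 0..n-1 in the ranking maps to quintile 1..5.
    return {customer: position * 5 // n + 1 for position, customer in enumerate(ranked)}


# Hypothetical customer data, not taken from the chapter.
days_since_last_order = {"A": 3, "B": 10, "C": 30, "D": 90, "E": 200}
orders_per_year = {"A": 50, "B": 12, "C": 30, "D": 2, "E": 6}
average_order_value = {"A": 150.0, "B": 900.0, "C": 400.0, "D": 50.0, "E": 75.0}

r = quintile_scores(days_since_last_order, higher_is_better=False)  # fewer days = more recent
f = quintile_scores(orders_per_year)
m = quintile_scores(average_order_value)

rfm = {customer: (r[customer], f[customer], m[customer]) for customer in days_since_last_order}
print(rfm)
```

With this made-up data, customer A scores (1, 1, 3): recent and frequent, but not buying the most expensive goods.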
Example of RFM Score Data
• Figure 9-6
Interpreting RFM Score Results
• Ajax has ordered recently and orders frequently. Its M score of 3 indicates that it does not order the most expensive goods.
– A good, regular customer, but the sales team should try to upsell more expensive goods to Ajax.
• Bloominghams has not ordered in some time, but when it did order, it ordered frequently, and its orders were of the highest monetary value.
– It may have taken its business to another vendor. The sales team should contact this customer immediately.
• Caruthers has not ordered for some time, did not order frequently, and did not spend much.
– The sales team should not waste any time on this customer.
• Davidson is in the middle.
– Set up Davidson on an automated contact system, or use the Davidson account as a training exercise.
Online Analytical Processing (OLAP)
• OLAP, a second type of reporting tool, is more generic than RFM.
• OLAP provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data.
• A remarkable characteristic of OLAP reports is that they are dynamic: the viewer of the report can change the report's format while viewing it, hence the term online.
How Are OLAP Reports Dynamic?
• OLAP reports
– Perform simple arithmetic operations on data: sum, average, count, and so on
– Are dynamic: the user can change the report structure and view it online
– Measure: the data item to be manipulated, such as total sales or average cost
– Dimension: a characteristic of a measure, such as purchase date, customer type, location, or sales region
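The sketch below imitates an OLAP report with one measure (sales) and several dimensions, assuming hypothetical fact rows. Choosing different dimensions changes the report's format, and adding a dimension plus a filter mimics drilling down. A real OLAP server precomputes and stores such aggregates; this in-memory version does not, and `olap` is an illustrative function name.

```python
from collections import defaultdict

# Hypothetical fact rows: (product family, store type, state, city, sales).
facts = [
    ("Food", "Supermarket", "CA", "Los Angeles", 500.0),
    ("Food", "Supermarket", "CA", "San Diego", 300.0),
    ("Food", "Deluxe", "WA", "Seattle", 200.0),
    ("Drink", "Supermarket", "CA", "Los Angeles", 150.0),
    ("Drink", "Deluxe", "WA", "Seattle", 100.0),
]

DIMENSIONS = {"family": 0, "store_type": 1, "state": 2, "city": 3}
MEASURE = 4  # column index of the sales measure


def olap(rows, dims, where=None):
    """Sum the measure grouped by the chosen dimensions, optionally filtering first."""
    where = where or {}
    cube = defaultdict(float)
    for row in rows:
        if all(row[DIMENSIONS[d]] == value for d, value in where.items()):
            key = tuple(row[DIMENSIONS[d]] for d in dims)
            cube[key] += row[MEASURE]
    return dict(cube)


# Report: total sales by product family and store type.
report = olap(facts, ["family", "store_type"])
# Drill down: the same measure, now split by city for California stores only.
drill = olap(facts, ["family", "store_type", "city"], where={"state": "CA"})
print(report)
print(drill)
```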
OLAP Product Family and Store Type
OLAP Reports
• OLAP cube
– Presentation of a measure with its associated dimensions
– a.k.a. OLAP report
• Users can alter the format.
• Users can drill down into the data.
– Divide the data into more detail
• May require substantial computing power
OLAP Product Family and Store Location by Store Type
OLAP Product Family and Store Location by Store Type, Drilled Down to Show Stores in California
OLAP Servers
• Developed to perform OLAP analysis
• Server reads data from an operational database
• Performs calculations
• Stores results in an OLAP database
• Third-party vendors provide software for
more extensive graphical displays.
• Data Warehousing Review
• OLAP services
Role of OLAP Server and OLAP Database
Q4 What are typical data-mining applications?
Convergence of Disciplines and Information Technology
Unsupervised Data Mining
• Analysts do not create a model before running the analysis.
• They apply a data-mining technique and observe the results.
• Analysts create hypotheses after the analysis to explain the patterns found.
– There is no prior model about the patterns and relationships that might exist.
• Common statistical technique used:
– Cluster analysis, to find groups of similar customers from customer order and demographic data
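Cluster analysis is often done with k-means; the slides do not name a specific algorithm, so treat this as one possible technique. The sketch assumes hypothetical (age, annual spend in $ thousands) pairs and k = 2. No model of the groups is supplied up front: the clusters emerge from the data, and analysts would interpret them afterward.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Basic k-means: alternate assigning points to centroids and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (age, annual spend in $ thousands).
customers = [(22, 0.20), (25, 0.30), (23, 0.25), (55, 2.00), (60, 2.20), (58, 1.90)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)
```

In practice, attributes on very different scales are usually standardized first so that one attribute does not dominate the distance.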
Supervised Data Mining
• Analysts develop a model before the analysis.
• Statistical techniques are used to estimate the model's parameters.
• Example:
– Regression analysis: measures the impact of a set of variables on another variable
– Used for making predictions
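To illustrate "model first, then estimate parameters," the sketch below fixes the model form y = a + b * x and estimates a and b by ordinary least squares. The observations are hypothetical and noise-free, constructed so the true line is 12 + 17.5 * age; `fit_line` is an illustrative helper, and real regression tools handle several variables and report fit statistics.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates of a and b for the chosen model y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical observations: customer age and weekend minutes.
ages = [20, 30, 40, 50]
minutes = [362.0, 537.0, 712.0, 887.0]
intercept, slope = fit_line(ages, minutes)
print(intercept, slope)
```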
Regression Analysis
CellphoneWeekendMinutes = 12 + (17.5 * CustomerAge) + (23.7 * NumberMonthsOfAccount)
• Using this equation, analysts can predict the number of minutes of weekend cell phone use by summing 12, plus 17.5 times the customer's age, plus 23.7 times the number of months of the account.
• Considerable skill is required to interpret the quality of such a model.
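The regression equation translates directly into a function. The coefficients come from the slide; the customer values in the example are hypothetical.

```python
def predict_weekend_minutes(customer_age, number_months_of_account):
    """CellphoneWeekendMinutes = 12 + (17.5 * CustomerAge) + (23.7 * NumberMonthsOfAccount)."""
    return 12 + 17.5 * customer_age + 23.7 * number_months_of_account


# A hypothetical 21-year-old customer with a 6-month-old account:
# 12 + 367.5 + 142.2 = 521.7 minutes
estimate = predict_weekend_minutes(21, 6)
print(round(estimate, 1))
```

Computing a prediction is the easy part; judging whether the coefficients are trustworthy is where the interpretive skill comes in.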
Neural Networks
• Popular supervised data-mining technique used to predict values and make classifications, such as "good prospect" or "poor prospect" customers
• A complicated set of nonlinear equations
• See kdnuggets.com to learn more
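To make "a complicated set of nonlinear equations" concrete, here is a forward pass through a tiny network with two inputs, two hidden units, and one output. The weights are hypothetical and untrained (training, for example by backpropagation, is not shown), and the two inputs stand for whatever normalized customer attributes an analyst might choose.

```python
import math


def sigmoid(z):
    """Nonlinear squashing function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, untrained weights: one row of input weights per hidden unit.
HIDDEN_WEIGHTS = [[0.8, -0.4], [-0.6, 0.9]]
HIDDEN_BIAS = [0.1, -0.2]
OUTPUT_WEIGHTS = [1.5, -1.1]
OUTPUT_BIAS = 0.05


def classify_prospect(features):
    """Each layer computes a weighted sum and passes it through the sigmoid."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIAS)
    ]
    score = sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)
    return score, ("good prospect" if score >= 0.5 else "poor prospect")


print(classify_prospect([1.0, 0.0]))
print(classify_prospect([0.0, 1.0]))
```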
Market-Basket Analysis
• Market-basket analysis is a data-mining technique for determining sales patterns.
– Uses statistical methods to identify sales patterns in large volumes of data
– Shows which products customers tend to buy together
– Used to estimate the probability of a customer purchase
– Helps identify cross-selling opportunities
• "Customers who bought book X also bought book Y"
Hypothetical Sales Data of 1,000 Items at a Dive Shop
Market-Basket Terminology
• Support
– Probability that two items will be bought together
– Fins and masks were purchased together 150 times, so the support for fins and a mask is 150/1,000, or 15 percent.
– Support for fins and weights is 60/1,000, or 6 percent.
– Support for fins along with a second pair of fins is 10/1,000, or 1 percent.
• Lift
– Ratio of confidence to the base probability of buying an item
– Shows how much the base probability increases or decreases when other products are purchased
• Example:
– Lift of fins and a mask is the confidence of fins given a mask, divided by the base probability of fins.
– Lift of fins and a mask is .5556/.28 = 1.98.
• Confidence
– What proportion of the customers who bought a mask also bought fins?
– A conditional probability estimate
• Example:
» Probability of buying fins = 28%
» Probability of buying a swim mask = 27%
• After buying a mask,
» Probability of buying fins = 150/270, or 55.56%
– The likelihood that a customer will also buy fins almost doubles, from 28% to 55.56%. Thus, all sales personnel should try to sell fins to anyone buying a mask.
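The support, confidence, and lift figures for fins and masks can be reproduced from the dive-shop counts. The counts of 280 fins and 270 masks are derived from the slides' 28 percent and 27 percent of 1,000; `count` is a hypothetical helper showing how such counts could be tallied from raw baskets.

```python
def count(baskets, *items):
    """Number of baskets (sets of item names) containing all of the given items."""
    return sum(1 for basket in baskets if set(items) <= basket)


def support(count_together, total):
    """Probability that the items are bought together."""
    return count_together / total


def confidence(count_together, count_antecedent):
    """P(consequent | antecedent), e.g. P(fins | mask)."""
    return count_together / count_antecedent


def lift(conf, base_probability_consequent):
    """How much the antecedent raises (>1) or lowers (<1) the base probability."""
    return conf / base_probability_consequent


TOTAL = 1000
FINS, MASKS, FINS_AND_MASKS = 280, 270, 150

sup = support(FINS_AND_MASKS, TOTAL)       # 0.15
conf = confidence(FINS_AND_MASKS, MASKS)   # fins given a mask, about 0.5556
lft = lift(conf, FINS / TOTAL)             # about 1.98
print(f"support={sup:.2f} confidence={conf:.4f} lift={lft:.2f}")
```

A lift above 1, as here, means that buying a mask raises the chance of also buying fins; a lift below 1 would mean it lowers that chance.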
Decision Trees
• A decision tree is a hierarchical arrangement of criteria that predict a classification or value.
• Supervised data-mining technique: the tree is built from records whose classification is already known
• Basic idea of a decision tree
– Select the attributes most useful for classifying something, that is, the criteria that create the most disparate groups
• The more different (pure) the groups, the better the classification
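One common way a decision-tree tool picks the attributes most useful for classifying is to measure how pure the resulting groups are. The sketch uses Gini impurity; the slides do not say which measure their tool uses, and entropy is another common choice. The student records and the helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 means every member of the group has the same label."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(rows, attribute, target):
    """Size-weighted impurity of the groups produced by splitting on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_attribute(rows, attributes, target):
    """Choose the attribute whose split yields the purest groups."""
    return min(attributes, key=lambda a: split_impurity(rows, a, target))


# Hypothetical student records; above_3 is the known classification (GPA above 3.0).
students = [
    {"class": "Junior", "major": "Business", "restaurant": True, "above_3": True},
    {"class": "Junior", "major": "NonBusiness", "restaurant": True, "above_3": True},
    {"class": "Junior", "major": "Business", "restaurant": False, "above_3": True},
    {"class": "Junior", "major": "NonBusiness", "restaurant": False, "above_3": False},
    {"class": "Senior", "major": "NonBusiness", "restaurant": False, "above_3": False},
    {"class": "Senior", "major": "NonBusiness", "restaurant": False, "above_3": False},
    {"class": "Senior", "major": "Business", "restaurant": True, "above_3": False},
    {"class": "Senior", "major": "NonBusiness", "restaurant": False, "above_3": True},
]

chosen = best_attribute(students, ["class", "major", "restaurant"], "above_3")
print(chosen)
```

On this made-up data, splitting on class gives a weighted impurity of 0.375, lower than major or restaurant employment (about 0.467 each), so class becomes the first split. A full tool would then repeat the search inside each resulting group.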
Decision Tree (Figure CE16-3; branches: If Senior = Yes, If Junior = Yes)
Decision Tree for Loan Evaluation
• Common business application
– Classify loan applications by likelihood of default
– Rules identify loans for bank approval
– Identify market segment
– Structure marketing campaign
– Predict problems
Decision Tree Analysis of MIS Class Grades
• Students' characteristics
– Class (junior or senior), major, employment, age, club affiliations, and other characteristics
• Values used to create groups that were as different as possible on the classification GPA above or below 3.0
• Results
– Best first criterion: class
– Next, subdivide seniors and juniors into purer groups
» Seniors: business and non-business majors
» Juniors: restaurant employees and non-restaurant employees
– For juniors, the best classifier is whether the student worked in a restaurant
Create a Set of If/Then Decision Rules
• If student is a junior and works in a restaurant, then
predict grade > 3.0.
• If student is a senior and is a non-business major,
then predict grade < 3.0.
• If student is a junior and does not work in a
restaurant, then predict grade < 3.0.
• If student is a senior and is a business major, then
make no prediction.
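The four rules translate directly into code. This is a hypothetical encoding; the function name and parameters are illustrative, and `None` stands for "make no prediction."

```python
from typing import Optional


def predict_gpa(student_class: str, works_in_restaurant: bool, business_major: bool) -> Optional[str]:
    """Apply the four If/Then rules; None means the rules make no prediction."""
    if student_class == "Junior":
        return "> 3.0" if works_in_restaurant else "< 3.0"
    if student_class == "Senior":
        return None if business_major else "< 3.0"
    return None  # the rules cover only juniors and seniors


print(predict_gpa("Junior", True, False))
print(predict_gpa("Senior", False, True))
```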
A Decision Tree for a Loan Evaluation
• Classifying likelihood of default
• Examined 3,485 loans
• 28 percent of those defaulted
• Evaluation criteria
A. Percentage of loan past due less than 50 percent: 94 percent (.94) did not default
B. Percentage of loan past due greater than 50 percent: 89 percent (.89) defaulted
• Subdivide groups A and B each using three further criteria: CreditScore, MonthsPastDue, and CurrentLTV
Resulting rules
• If less than half of the loan is past due, then accept the loan.
• If more than half of the loan is past due and
– If CreditScore is greater than 572.6 and
• If CurrentLTV is less than .94, then accept the loan.
• Otherwise, reject the loan.
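A minimal encoding of the loan rules, reading the first split as the percentage-past-due criterion listed under the evaluation criteria (an interpretation) and copying the 572.6 and .94 thresholds from the slides. The function and parameter names are hypothetical.

```python
def evaluate_loan(fraction_past_due: float, credit_score: float, current_ltv: float) -> str:
    """Apply the decision-tree rules; returns "accept" or "reject".

    A loan exactly 50 percent past due is not covered by the slides; here it falls
    through to the second rule (an assumption).
    """
    if fraction_past_due < 0.5:
        return "accept"
    if credit_score > 572.6 and current_ltv < 0.94:
        return "accept"
    return "reject"


print(evaluate_loan(0.2, 500, 1.2))
print(evaluate_loan(0.7, 600, 0.95))
```

Rules in this form are what make decision trees easy to implement: each path from the root to a leaf becomes one `if` condition.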
• Use this analysis to structure a marketing campaign to appeal
to a particular market segment
• Decision trees are easy to understand and easy to implement
using decision rules.
• Some organizations use decision trees to select variables to
be used by other types of data-mining tools.
Credit Score Decision Tree
Figure CE14-4
Q5 What is the purpose of data warehouses and data marts?
What Is the Purpose of Data Warehouses and Data Marts?
• Purpose: (video)
– To extract and clean data from various operational systems and other sources
– To store and catalog data for BI processing
– Extract, clean, and prepare data
– Store data in a data-warehouse DBMS
Components of a Data Warehouse
Data Warehouse Data Sources
• Internal operational systems
• External data purchased from outside sources
• Data from social networking and user-generated content applications
• Metadata concerning the data, stored in the data-warehouse metadata database
• Clickstream data on customers' clicking behavior on a Web site
Example of Typical Customer Credit Data
Problems with Operational Data
• Dirty data: mistakes in spelling or punctuation, incorrect data associated with a field, incomplete or outdated data, or even data that is duplicated in the database
Examples of Dirty Data
• A value of "B" for customer gender
• 213 for customer age
• A value of 999-999-9999 for a U.S. phone number
• Part color of "gren"
• An implausible e-mail address
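Validation checks like the following can catch these kinds of dirty data before they reach a warehouse. The accepted gender codes, age range, color list, and placeholder-phone heuristic are all assumptions for illustration; the e-mail case is omitted because a syntactically valid but bogus address cannot be detected by a pattern check alone.

```python
import re

VALID_GENDERS = {"F", "M"}                          # assumption: codes this hypothetical system accepts
KNOWN_COLORS = {"green", "red", "blue", "black"}    # hypothetical list of valid part colors
PHONE_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{4}$")  # U.S.-style NNN-NNN-NNNN


def find_problems(record):
    """Return the names of fields that look dirty in one customer/part record."""
    problems = []
    if record.get("gender") not in VALID_GENDERS:
        problems.append("gender")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age")
    phone = record.get("phone", "")
    digits = phone.replace("-", "")
    # Reject malformed numbers and placeholders made of one repeated digit.
    if not PHONE_PATTERN.match(phone) or len(set(digits)) == 1:
        problems.append("phone")
    if record.get("color") not in KNOWN_COLORS:
        problems.append("color")
    return problems


dirty = {"gender": "B", "age": 213, "phone": "999-999-9999", "color": "gren"}
clean = {"gender": "F", "age": 34, "phone": "555-201-7788", "color": "green"}
print(find_problems(dirty))
print(find_problems(clean))
```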
Too much data causes problems:
• Too many attributes: the curse of dimensionality
1. The problem caused by the exponential increase in volume associated with adding extra dimensions to a (mathematical) space.
2. The more attributes there are, the easier it is to build a model that fits the sample data but is worthless as a predictor.
• Too many rows or data points
• A major activity in data mining concerns efficient and effective ways of selecting attributes.
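A quick calculation shows why adding attributes hurts. Assuming each attribute is split into 10 value ranges and the sample has 1,000 rows (both hypothetical), the number of cells grows as 10 to the power of the number of attributes, so the rows available per cell shrink exponentially.

```python
ROWS = 1_000             # hypothetical number of rows in the sample
BINS_PER_ATTRIBUTE = 10  # hypothetical number of value ranges per attribute


def rows_per_cell(rows, bins, attributes):
    """Average rows per cell when each of `attributes` dimensions is split into `bins` ranges."""
    return rows / bins ** attributes


for d in range(1, 6):
    cells = BINS_PER_ATTRIBUTE ** d
    print(f"{d} attribute(s): {cells:>7,} cells, "
          f"{rows_per_cell(ROWS, BINS_PER_ATTRIBUTE, d):8.3f} rows per cell")
```

With five attributes the average cell holds only 0.01 rows, so most cells are empty and a flexible model can fit the sample closely without learning anything that generalizes.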
Data Warehouses vs. Data Marts
• A data mart is a collection of data (video)
– Created to address particular needs: a business function, problem, or opportunity
– Smaller than a data warehouse
– Users may not have data-management expertise, so knowledgeable analysts are needed for the specific function
– Data are extracted from the data warehouse for a functional area
Components of a Data Mart
Q6 What are typical knowledge-management applications?
Knowledge Management (KM)
• The process of creating value from intellectual capital and sharing that knowledge with employees, managers, suppliers, customers, and others who need it.
• Whereas reporting and data mining are used to create new information from data, knowledge-management systems concern the sharing of knowledge that is known to exist.
Primary Benefits of KM
1. KM fosters innovation by encouraging the free flow of ideas.
2. KM improves customer service by streamlining response time.
3. KM boosts revenues by getting products and services to
market faster.
4. KM enhances employee retention rates by recognizing the
value of employees’ knowledge and rewarding them for it.
5. KM streamlines operations and reduces costs by eliminating
redundant or unnecessary processes.
6. KM preserves organizational memory by capturing and storing
the lessons learned and best practices of key employees.
Sharing of Document Content and Employee Knowledge
• Sharing document content
– Collaboration systems are concerned with document creation and change management, whereas KM applications are concerned with maximizing content use.
Two Typical Knowledge-Management Applications
Two key technologies for sharing content in KM systems:
1. Indexing: the most important content function in KM applications. Indexing provides an easily accessible and robust means of determining whether content exists, along with a link to obtain the content. It is used in conjunction with search functions.
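Indexing is typically implemented with an inverted index that maps each word to the documents containing it. This sketch assumes hypothetical knowledge-base articles keyed by link; a production KM system would add stemming, ranking, and access control, and `build_index` and `search` are illustrative names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document links that contain it."""
    index = defaultdict(set)
    for link, text in documents.items():
        for word in tokenize(text):
            index[word].add(link)
    return index


def search(index, query):
    """Return links containing every query word; empty set if any word is unknown."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical knowledge-base articles.
documents = {
    "kb/101": "Resetting a customer password",
    "kb/102": "Password policy for suppliers",
    "kb/103": "Returning damaged products",
}
index = build_index(documents)
print(search(index, "customer password"))
```

The index answers the question "does content exist?" and the stored link answers "where do I get it?", the two jobs the slide assigns to indexing.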
2. RSS (Really Simple Syndication): a standard for subscribing to content sources on Web sites. An RSS reader program helps users to:
– Subscribe to content sources
– Periodically check sources for new or updated content through RSS feeds
– Place content summaries in an RSS inbox with a link to the full content
– Think of RSS as an e-mail system for content.
– The data source must provide what is termed an RSS feed, which simply means that the site posts changes according to one of the RSS standards.
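The core of an RSS reader's "check for new content" step can be sketched with Python's standard XML parser. The feed below is a hypothetical RSS 2.0 document held in a string; a real reader would fetch it over HTTP on a schedule, which is not shown, and `new_items` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed content.
FEED = """<rss version="2.0">
  <channel>
    <title>Example team blog</title>
    <link>https://example.com/blog</link>
    <description>Hypothetical feed</description>
    <item>
      <title>Q3 sales report posted</title>
      <link>https://example.com/blog/q3-sales</link>
      <description>Summary of third-quarter results.</description>
    </item>
    <item>
      <title>New returns policy</title>
      <link>https://example.com/blog/returns</link>
      <description>What changes for customers.</description>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs not seen before, and remember them as seen."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.findall("./channel/item"):
        link = item.findtext("link")
        if link not in seen_links:
            seen_links.add(link)
            fresh.append((item.findtext("title"), link))
    return fresh


seen = set()
inbox = new_items(FEED, seen)  # first check: both items are new
again = new_items(FEED, seen)  # second check: nothing new to add to the inbox
print(inbox)
print(again)
```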
Interface of a Typical RSS Reader
Blog Posts of SharePoint Team Member
Expert Systems
• Expert systems attempt to capture human expertise and put it into a format that can be used by nonexperts.
• Expert systems are rule-based systems that use If/Then rules similar to those created by decision-tree analysis, except that the rules are created from human experts instead of data-mining systems.
Problems of Expert Systems
1. Difficult and expensive to develop. They require many labor hours from both experts in the domain under study and designers of expert systems, and tying up domain experts carries a high opportunity cost.
2. Difficult to maintain. Because of the nature of rule-based systems, adding a new rule in the middle of hundreds of others can create unexpected consequences; a small change can cause very different outcomes.
3. No expert system has the same diagnostic ability as knowledgeable, skilled, and experienced doctors. Rules and actions change frequently.
Expert Systems for Pharmacies
• Used as a safety net to screen the decisions of doctors and other medical professionals. These systems help achieve the hospital's goal of state-of-the-art, error-free care.
• DoseChecker verifies appropriate dosages on prescriptions issued in the hospital.
• PharmADE ensures that patients are not prescribed drugs that have harmful interactions.
• The pharmacy order-entry system invokes these applications as a prescription is entered. If either system detects a problem with the prescription, it generates an alert.
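The two screening checks can be sketched as simple rules. Everything here is hypothetical: the drug names, dose limits, and interaction pair are placeholders, and the code is not how DoseChecker or PharmADE actually work.

```python
# Hypothetical reference data; real systems rely on curated clinical databases.
MAX_DAILY_DOSE_MG = {"DrugA": 4000, "DrugB": 40}
INTERACTING_PAIRS = {frozenset({"DrugA", "DrugC"})}


def check_prescription(drug, daily_dose_mg, current_drugs):
    """Return alert messages from a dose-range check and a drug-interaction check."""
    alerts = []
    limit = MAX_DAILY_DOSE_MG.get(drug)
    if limit is not None and daily_dose_mg > limit:
        alerts.append(f"Dose alert: {drug} at {daily_dose_mg} mg/day exceeds {limit} mg/day")
    for other in current_drugs:
        if frozenset({drug, other}) in INTERACTING_PAIRS:
            alerts.append(f"Interaction alert: {drug} with {other}")
    return alerts


# Called as the prescription is entered; a non-empty result would be shown as an alert.
alerts = check_prescription("DrugA", 5000, ["DrugC"])
print(alerts)
```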
Pharmacy Alert
Q7 How are business intelligence applications delivered?
How Are Business Intelligence
Applications Delivered?
What Are the Management Functions of a BI Server?
• Maintains metadata about the authorized allocation of BI results to users
– Tracks what results are available, which users are authorized to view those results, and the schedule for providing results to authorized users. Adjusts allocations as available results change and users come and go.
BI Servers Vary in Complexity and Functionality
• Some BI servers are simply Web sites from which users can download, or pull, BI application results.
• For example, a BI Web server might post the results of an RFM analysis that salespeople can query to obtain RFM scores for their customers. The management function for such a site would simply be to track authorized users and restrict access.
• A BI server could also operate as a portal server.
BI Portals
• Portals might provide common data, such as local weather, along with links to company news and to BI application results, such as reports on daily sales, operations, and new employees, and the results of data-mining applications.
• Authorized users are allowed to place reports, data-mining results, or other BI application results on their customized pages.
• The BI application server pushes the subscribed results to the user.
Report Server
• A special case of a BI application server
that serves only reports
• BI application servers track results, users,
authorizations, page customizations,
subscriptions, alerts, and data for any other
functionality provided.
What Are the Delivery Functions of a BI Server?
• Track authorized users
• Track the schedule for providing results to users
• Issue exception alerts that notify users of an exceptional event
• The procedures used depend on the nature of the BI system.
• Procedures tend to be more flexible than those in an operational system because users of a BI system tend to be engaged in work that is neither structured nor routine.
• Procedures are determined by the unique requirements of users.
• BI results can be delivered to "any" device, such as computers, PDAs, and phones; to other applications, such as Microsoft Office; and as an SOA service.
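A toy model of these functions: tracking which users may see which results, pushing subscribed results, allowing authorized pulls, and issuing exception alerts. The `BIServer` class and its methods are hypothetical and leave out scheduling, persistence, and real authentication.

```python
from dataclasses import dataclass, field


@dataclass
class BIServer:
    """Hypothetical BI server that keeps metadata about who may receive which results."""

    # result name -> user ids authorized to view that result
    authorizations: dict = field(default_factory=dict)
    # user id -> result names the user has subscribed to
    subscriptions: dict = field(default_factory=dict)

    def authorize(self, user, result):
        self.authorizations.setdefault(result, set()).add(user)

    def _check(self, user, result):
        if user not in self.authorizations.get(result, set()):
            raise PermissionError(f"{user} is not authorized to view {result}")

    def subscribe(self, user, result):
        self._check(user, result)
        self.subscriptions.setdefault(user, set()).add(result)

    def pull(self, user, result, results):
        """Pull delivery: an authorized user requests a result directly."""
        self._check(user, result)
        return results[result]

    def push(self, results):
        """Push delivery: send each subscriber the available results they subscribed to."""
        return {
            user: {name: results[name] for name in names if name in results}
            for user, names in self.subscriptions.items()
        }

    def exception_alert(self, result, value, threshold):
        """Return the authorized users to notify when a monitored value drops below a threshold."""
        if value >= threshold:
            return []
        return sorted(self.authorizations.get(result, set()))


server = BIServer()
server.authorize("ana", "rfm_scores")
server.subscribe("ana", "rfm_scores")
results = {"rfm_scores": {"C-001": (1, 1, 3)}}
delivered = server.push(results)
print(delivered)
```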
Q8 2020?
• Through data mining, companies known as "data aggregators" will know more about your purchasing psyche than you, your mother, or your analyst.
• If you use your card to purchase "secondhand clothing, retread tires, bail bond services, massages, casino gambling or betting," you alert the credit card company to potential financial problems; as a result, it may cancel your card or reduce your credit limit.
• Absent laws to the contrary, by 2020 your credit card data will be fully integrated with personal and family data maintained by data aggregators (like Acxiom and ChoicePoint).
• By 2020, some online retailers and data aggregators will know a lot more about most consumers' purchases than consumers know themselves.
Ethics Guide: The Ethics of Classification
• Serious problems can arise when classifying people.
• What about classifying applicants for college when there are more applicants than positions?
• Suppose an admissions committee uses a decision-tree data-mining program to derive statistically valid measures, with no human judgment involved.
• Decision-tree analysis might not include important data, and its results may reinforce social stereotypes.
• Results might not be organizationally, legally, or socially feasible.
Guide: Semantic Security
• Security is a difficult problem: preventing the unintended release of protected information
– Physical security
• Protect through passwords and permissions
• Delivery system must be secure
– Semantic security
• Unintended release of protected information through the release of unprotected reports
• Equally serious and more problematic
• Megan is able to combine data in various reports to infer protected information about company employees.
• She was not supposed to see this information, yet she used only reports she was authorized to see.
• What, if anything, can be done to prevent what Megan did?
Guide: Data Mining in the Real World
• Real-world data mining differs from the way it is shown in textbooks because:
– Data is dirty
– Values are missing or outside of ranges
– Time values make no sense
– You add parameters as you gain knowledge, forcing reprocessing
– Overfitting data to a model
– Results are based on probabilities, not certainty
– Seasonality problems
• Should you let people think the resulting model makes accurate predictions?
Active Review
Q1 Why do organizations need business intelligence?
Q2 What business intelligence systems are available?
Q3 What are typical reporting applications?
Q4 What are typical data-mining applications?
Q5 What is the purpose of data warehouses and data marts?
Q6 What are typical knowledge-management applications?
Q7 How are business intelligence applications delivered?
Q8 2020?
Case Study 9: Business Intelligence for Decision Making at Home Depot
• Home Depot is a major retail chain specializing in construction and home repair and maintenance products.
• The company has 2,200 retail stores worldwide.
• It generated $71 billion in sales in 2008.
• It carries more than 40,000 products in its stores and employs more than 300,000 people.
• Its stores are visited by more than 22 million people each week.
• Suppose you are a buyer for the clothes washer and dryer product line at Home Depot. You work with seven different brands and numerous models within each brand.
• One of your goals is to turn your inventory as many times a year as you can. To do so, you want to identify poorly selling models (and even brands) as quickly as you can.
• Risks
– A new model can quickly capture a substantial portion of another model's market share. Thus, a big seller this year can be a "dog" (a poor seller) next year.
– Geography: some brands are unavailable in some countries. Within a country, some sales trends are national and others are regional.
• Assume you have total sales data for each brand
and model, for each store, for each month. Assume
also that you know the store’s city and state.
All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system, or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise, without the prior written
permission of the publisher. Printed in the United States of America.
Copyright © 2011 Pearson Education, Inc.
Publishing as Prentice Hall