Knowledge-Driven
Business Intelligence
Systems: Part I
Week 10
Dr. Jocelyn San Pedro
School of Information Management &
Systems
Monash University
IMS3001 – BUSINESS INTELLIGENCE SYSTEMS – SEM 1 , 2004
Lecture Outline
• Knowledge-Driven BIS
• Knowledge-Driven BIS Technologies
• Data mining
• Data Mining Techniques

Learning Objectives
At the end of this lecture, students will:
• Have a better understanding of knowledge-driven business intelligence systems
• Have an understanding of some data mining techniques used in knowledge-driven business intelligence systems
• Have an understanding of some data mining applications

Knowledge-driven BIS
Information systems that provide BI through access to and manipulation of predictive/descriptive models and/or knowledge bases (containing experts' domain knowledge)
• Predictive models – used to forecast explicit values based on patterns determined from known results
• Descriptive models – describe patterns in existing data and are generally used to create meaningful subgroups such as demographic clusters
• Knowledge base – a collection of organised facts, rules and procedures

Predictive models
Can provide answers to questions like:
• Which products should be promoted to a particular customer?
• What is the probability that a certain customer will respond to a planned promotion?
• Which securities will be most profitable to buy or sell during the next trading session?
• What is the likelihood that a certain customer will default or pay back on schedule?
• What is the appropriate medical diagnosis for this patient?

Descriptive models
Sample demographic clusters/subgroups
• Men who buy diapers also buy beer
• People who buy scuba gear take Australian vacations
• People who purchase skim milk also tend to buy whole wheat bread
• Customers who responded to a particular offer are likely to respond to a similar offer

Knowledge-Driven BIS Technologies
• Data Mining
• Data Visualisation
Data mining positioning - http://www.redbooks.ibm.com/redbooks/pdfs/sg245252.pdf

Data Mining
• Set of activities used to find new, hidden, or unexpected patterns in the data
• Process of using raw data to infer business relationships
• Collection of powerful data analysis techniques intended to assist in analysing extremely large datasets
Marakas, 2002
• Process of extracting hidden knowledge from large volumes of raw data
http://www.megaputer.com/dm/dm101.php3

Data Mining Techniques
Classification – discover rules that define whether an item or event belongs to a particular subset or class of data
• Involves building a model, then predicting classifications
• e.g. matching buyer attributes with product attributes > predict customers likely to buy a particular product next month > targeted promotional contact or mailing list

Predict Classifications - ALICE d'ISoft
A Credit Officer wishes to identify customers who had trouble paying back their loans.
[Figure: parent node of a decision tree, labelled with the number of customers in the database; N: number and % of customers who had trouble paying back their loan; Y: number and % of customers who had no trouble paying back their loan; and a graphical chart representing success rate Y and failure rate N]
http://www.alice-soft.com/html/tech_dt.htm

Predict Classifications - ALICE d'ISoft
Split the records according to the most discriminating attribute: housing type
http://www.alice-soft.com/html/tech_dt.htm

Example Classification Rule: People who rent their home and earn more than 7853 Francs have an 86% success rate.
http://www.alice-soft.com/html/tech_dt.htm

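The ALICE d'ISoft example splits the records on the most discriminating attribute. The slides do not say which criterion ALICE uses, so the sketch below assumes weighted Gini impurity, one common choice for decision trees. The records, attribute names and values are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_impurity(records, attribute, target):
    """Weighted Gini impurity of the child nodes after splitting on `attribute`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[attribute]].append(rec[target])
    total = len(records)
    return sum(len(g) / total * gini(g) for g in groups.values())

def most_discriminating(records, attributes, target):
    """Attribute whose split leaves the purest child nodes (lowest impurity)."""
    return min(attributes, key=lambda a: split_impurity(records, a, target))

# Hypothetical loan records: "repaid" is Y (no trouble) or N (trouble).
records = [
    {"housing": "rent", "job": "salaried", "repaid": "Y"},
    {"housing": "rent", "job": "self-employed", "repaid": "Y"},
    {"housing": "rent", "job": "salaried", "repaid": "Y"},
    {"housing": "own", "job": "salaried", "repaid": "N"},
    {"housing": "own", "job": "self-employed", "repaid": "N"},
    {"housing": "own", "job": "self-employed", "repaid": "Y"},
]

best = most_discriminating(records, ["housing", "job"], "repaid")
print(best)  # housing
```

In this toy data every renter repaid, so splitting on housing type gives one pure child node and the lowest weighted impurity. A full tree would repeat the same selection inside each child node.
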
Data Mining Techniques
Association (or link analysis) – search all details or transactions from operational systems for patterns with a high probability of repetition
• Results in the development of an associative algorithm that correlates one set of events or items with another set of events or items
• Example association rule or pattern:
  - 83% of all records that contain items A, B and C also contain items D and E
  - 83% is the confidence factor

Data Mining Techniques
Another example of link analysis:
• Market basket analysis – analysing the products contained in a purchaser's basket and then using an associative rule to compare hundreds of thousands of baskets
  - 29% of the time that the brand X blender is sold, the customer also buys a set of kitchen tumblers
  - 68% of the time that a customer buys beverages, the customer also buys pretzels
  > Determine the location and content of promotional or end-of-aisle displays

Market Basket Analysis
• This is the most widely used and, in many ways, most successful data mining algorithm.
• It essentially determines what products people purchase together.
• Stores can use this information to place these products in the same area.
• Direct marketers can use this information to determine which new products to offer to their current customers.
• Inventory policies can be improved if reorder points reflect the demand for the complementary products.
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

Association Rules for Market Basket Analysis
Rules are written in the form "left-hand side implies right-hand side", for example:
Yellow Peppers IMPLIES Red Peppers, Bananas, Bakery
To make effective use of a rule, three numeric measures about that rule must be considered: (1) support, (2) confidence and (3) lift
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

Measures of Predictive Ability
Support refers to the percentage of baskets where the rule was true (both left-hand and right-hand side products were present).
Confidence measures what percentage of baskets that contained the left-hand product also contained the right.
Lift measures how much more frequently the left-hand item is found with the right than without the right.
[Diagram for each measure: LEFT and RIGHT sets]
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

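The three measures can be sketched in a few lines of Python. The baskets are the five convenience store transactions used later in this lecture. The slide gives only a verbal definition of lift, so the formula here is an assumption: the widely used one, confidence divided by the overall share of baskets that contain the right-hand side. The function names are illustrative.

```python
def support(baskets, left, right):
    """Share of all baskets that contain every item on both sides of the rule."""
    both = left | right
    return sum(both <= basket for basket in baskets) / len(baskets)

def confidence(baskets, left, right):
    """Share of the baskets containing the left side that also contain the right.

    Assumes at least one basket contains the left side.
    """
    with_left = [basket for basket in baskets if left <= basket]
    return sum(right <= basket for basket in with_left) / len(with_left)

def lift(baskets, left, right):
    """Confidence relative to how often the right side occurs in all baskets
    (assumed definition; a value above 1 means the left side raises the odds)."""
    base_rate = sum(right <= basket for basket in baskets) / len(baskets)
    return confidence(baskets, left, right) / base_rate

# The five convenience store transactions, as sets of items.
baskets = [
    {"pizza", "cola", "milk"},
    {"milk", "chips"},
    {"cola", "pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]

left, right = {"pizza"}, {"cola"}
print(support(baskets, left, right))     # 0.4
print(confidence(baskets, left, right))  # 1.0
print(lift(baskets, left, right))        # about 1.67
```

For the rule "Pizza IMPLIES Cola", two of five baskets contain both items (support 40%), every pizza basket also contains cola (confidence 100%), and cola appears in 60% of all baskets, giving a lift of about 1.67 under the assumed definition.
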
An Example
[Table: Lift, Support and Confidence for the rules "Red Peppers IMPLIES Bananas", "Green Peppers IMPLIES Bananas" and "Yellow Peppers IMPLIES Bananas". Values as extracted, in slide order: Lift/Support 1.37/3.77 and 1.43/8.58 (first two rules), 1.17/22.12 (Yellow Peppers); Confidence 85.96, 89.47, 73.09]
• The confidence suggests people buying any kind of pepper also buy bananas.
• Green peppers sell in about the same quantities as red or yellow, but are not as predictive.
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

Market Basket Analysis Methodology
• We first need a list of transactions and what was purchased. This is easily obtained these days from scanning cash registers.
• Next, we choose a list of products to analyse, and tabulate how many times each was purchased with the others.
• The diagonal of the table shows how often a product is purchased in any combination, and the off-diagonal cells show which combinations were bought.
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

A Convenience Store Example
Consider the following simple example about five transactions at a convenience store:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels
These need to be cross-tabulated and displayed in a table.
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

A Convenience Store Example
Product bought   Pizza also   Milk also   Cola also   Chips also   Pretzels also
Pizza                 2           1           2            0             0
Milk                  1           3           1            1             1
Cola                  2           1           3            0             1
Chips                 0           1           0            1             0
Pretzels              0           1           1            0             2
• Pizza and Cola sell together more often than any other combo; a cross-marketing opportunity?
• Milk sells well with everything – people probably come here specifically to buy it.
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

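The cross-tabulation step can be sketched as follows, using the five transactions from the example. The function name and the shortened product names are illustrative choices, not part of the source.

```python
from collections import Counter

# The five convenience store transactions from the example.
transactions = [
    ["pizza", "cola", "milk"],
    ["milk", "chips"],
    ["cola", "pizza"],
    ["milk", "pretzels"],
    ["cola", "pretzels"],
]
products = ["pizza", "milk", "cola", "chips", "pretzels"]

def cross_tabulate(transactions):
    """Count, for every ordered pair of products, the baskets containing both.

    The pair (a, a) counts the baskets containing a, which is the diagonal.
    """
    counts = Counter()
    for basket in transactions:
        items = set(basket)
        for a in items:
            for b in items:
                counts[(a, b)] += 1
    return counts

counts = cross_tabulate(transactions)

# Print one row per product bought, one column per product also bought.
for row in products:
    print(f"{row:<9}" + "".join(f"{counts[(row, col)]:>4}" for col in products))
```

The diagonal entries give how often each product was bought at all, and the off-diagonal entries give how often two products were bought together, as described in the methodology.
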
Limitations of Market Basket Analysis
• A large number of real transactions are needed to do an effective basket analysis, but the data's accuracy is compromised if all the products do not occur with similar frequency.
• The analysis can sometimes capture results that were due to the success of previous marketing campaigns (and not natural tendencies of customers).
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

Market Basket Analysis in PolyAnalyst
[Screenshot: PolyAnalyst Market Basket Analysis output, showing groups of products that sell well together and association rules]
http://www.megaputer.com/products/pa/algorithms/ba.php3

HealthCare Fraud Example
Market Basket Analysis + Summary Statistics reveal providers sharing a large number of patients
> Potential provider fraud
http://www.megaputer.com

Data Mining Techniques
Sequencing or time-series analysis – techniques that relate events in time
• Prediction of interest rate fluctuations or stock performance based on a series of preceding events
• E.g. buying sequence: parents buy promotional toys associated with a particular movie within 2 weeks after renting the movie
  > Flyer campaign for promotional toys should be linked to customer lists created as a result of movie rentals
• Sequence of customer purchases > catalogue of specific product types can be target-mailed to the customer
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

Association and Sequencing
Association and sequencing tools analyse data to discover rules that identify patterns of behaviour. An association tool will find rules such as:
• When people buy diapers they also buy beer 50 percent of the time.
A sequencing technique is very similar to an association technique, but it adds time to the analysis and produces rules such as:
• People who have purchased a VCR are three times more likely to purchase a camcorder in the period two to four months after the VCR was purchased.
http://www.dbmsmag.com/9807m03.html

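To show how a sequencing rule adds time to an association rule, the sketch below measures how often a second purchase follows a first within a given window. The customer histories, the function name and the 60 to 120 day window (roughly two to four months) are hypothetical; real sequencing tools use more elaborate algorithms.

```python
from datetime import date

def sequence_confidence(purchases, first, then, min_days, max_days):
    """Share of customers who bought `first` and then bought `then`
    between `min_days` and `max_days` after their earliest `first` purchase."""
    bought_first = 0
    followed = 0
    for events in purchases.values():
        first_dates = [day for item, day in events if item == first]
        if not first_dates:
            continue
        bought_first += 1
        start = min(first_dates)
        if any(item == then and min_days <= (day - start).days <= max_days
               for item, day in events):
            followed += 1
    return followed / bought_first if bought_first else 0.0

# Hypothetical purchase histories: customer id -> list of (item, purchase date).
purchases = {
    "c1": [("vcr", date(2004, 1, 10)), ("camcorder", date(2004, 4, 2))],
    "c2": [("vcr", date(2004, 2, 1)), ("tapes", date(2004, 2, 3))],
    "c3": [("camcorder", date(2004, 1, 5)), ("vcr", date(2004, 3, 1))],
    "c4": [("vcr", date(2004, 1, 20)), ("camcorder", date(2004, 1, 25))],
}

conf = sequence_confidence(purchases, "vcr", "camcorder", 60, 120)
print(conf)  # 0.25
```

Only customer c1 bought a camcorder inside the window. Customer c3 bought the camcorder before the VCR and customer c4 bought it too soon, which a plain association rule would not distinguish.
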
Association and Sequencing
Examples in care management, procedure interactions and pharmaceutical interactions:
• Patients who are taking drugs A, B, and C are two and a half times more likely to also be taking drug D.
• Patients receiving procedure X from Doctor Y are three times less likely to get infection Z.
http://www.dbmsmag.com/9807m03.html

Association and Sequencing
Example in the financial industry:
• The prices of stocks in industry Q are 1.8 times more likely to close up one day after stocks in industry R closed down.
http://www.dbmsmag.com/9807m03.html

Association and Sequencing
Examples in fraud detection in telecommunications and insurance:
• International credit card calls longer than three minutes originating in area code 555 between 1:00 AM and 3:00 AM are three times more likely to go uncollected.
• Accident claims involving soft tissue trauma where attorney P represents the claimant are twice as likely to be fraudulent.
http://www.dbmsmag.com/9807m03.html

Data Mining Techniques
Clustering – technique for creating partitions so that all members of each set are similar according to some metric or set of metrics
• e.g., credit card purchase data
  - Cluster 1: business-issued gold card, meals charged on weekdays, mean value greater than $250
  - Cluster 2: personal platinum card, meals charged on weekends, mean value $175, bottle of wine charged more than 65% of the time
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

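The slide names no particular clustering algorithm. As one possible illustration, the sketch below assumes k-means with squared Euclidean distance as the metric. The data points (mean meal value in dollars, share of meals charged on weekends) and the starting centroids are hypothetical, loosely modelled on the two credit card clusters.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical cardholders: (mean meal value in $, share of meals on weekends).
points = [
    (260, 0.1), (300, 0.2), (280, 0.0),
    (170, 0.9), (180, 0.8), (175, 1.0),
]

# Starting centroids are chosen by hand here so the result is repeatable.
centroids, clusters = kmeans(points, [(260, 0.1), (170, 0.9)])
print(centroids)
print(clusters)
```

The first cluster ends up centred near a $280 mean value with mostly weekday meals, the second near $175 with mostly weekend meals. Because dollars dominate the distance in this sketch, a real analysis would normally rescale the attributes first.
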
Clustering - Example
Identifying natural clusters of patient populations
http://www.enee.umd.edu/medlab/papers/dcsThShort/thpaper1.html

Current Limitations and Challenges to Data Mining
Despite its potential power and value, data mining is still a new field. Some things that have so far limited its advancement are:
• Identification of missing information – not all knowledge gets stored in a database
• Data noise and missing values – future systems need better ways to handle this
• Large databases and high dimensionality – future applications need ways to partition data into more manageable chunks
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall

Summary
• Data mining tools allow business intelligence systems to find hidden patterns in large datasets, and to use these patterns to turn data into actionable information
• BIS that use data mining tools need data visualisation tools to present such hidden patterns to the end-user
• Hidden patterns, when placed in the hands of decision makers, become actionable information or business intelligence

References
Marakas, G.M. (2002) Decision Support Systems in the 21st Century. 2nd Ed, Prentice Hall (or other editions)
Power, D. (2002) Decision Support Systems: Concepts and Resources for Managers, Quorum Books.
FREE online resource: Data Mining booklet http://www.twocrows.com/intro-dm.pdf

Questions?
[email protected]
School of Information Management and Systems,
Monash University
T1.28, T Block, Caulfield Campus
9903 2735