The Strategic Utilization of Data Mining: A Porterian Framework
Chandra S. Amaravadi, Ph.D.
Department of Information Management and Decision Sciences
Western Illinois University
DATA MINING: A NEW TOOL FOR MANAGEMENT SUPPORT
In the past decade, a new and exciting technology has emerged in the information
systems field. Based on a combination of statistical and artificial intelligence
techniques, data mining has grown out of relational databases and Online Analytical
Processing into a powerful tool for organizational decision support (Shim et al.
2002). A number of techniques are available to analyze warehouse data, including
descriptive techniques such as data summarization, data visualization, clustering
and classification, and predictive techniques such as regression, association and
dependency analyses (Jackson 2002; Mackinnon and Glick 1999). The technology is
also being extended to mine semi-structured data (Hui and Jha 2000).
Applications of data mining have ranged from predicting ingredient usage in fast
food restaurants (Liu, Bhattacharyya, Sclove, Chen and Lattyak 2001) to predicting
the length of stay of hospital patients (Hogl, Muller, Stoyan and Stuhlinger 2001).
See Table 1 for other representative examples. Some of the important findings are:
1) bankruptcies can be predicted from variables such as the “ratio of cash flow to
total assets” and “return on assets” (Sung, Chang and Lee 1999); 2) gas station
transactions in the U.K. average £20, with a tendency for customers to round the
purchase to the nearest £5 (Hand and Blunt 2001); 3) sales in fast food restaurants
are seasonal, tending to peak during holidays and special events (Liu,
Bhattacharyya, Sclove, Chen and Lattyak 2001); 4) patients older than 75 are 100%
likely to exceed the standard upper limit for hospital stay (Hogl, Muller, Stoyan
and Stuhlinger 2001).
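Finding 2 is an instance of descriptive data summarization. As a minimal sketch, assuming a small hypothetical list of transaction amounts (illustrative values, not Hand and Blunt's data), the Python below computes the average amount and the share of purchases that land exactly on a multiple of £5. The function name `summarize` and its `unit` and `tol` parameters are our own.

```python
from statistics import mean

# Hypothetical U.K. fuel transactions in pounds (illustrative values only).
amounts = [20.00, 15.00, 23.47, 25.00, 20.00, 10.00, 18.62, 30.00]

def summarize(amounts, unit=5.0, tol=0.005):
    """Return the mean amount and the share of amounts within `tol` of a multiple of `unit`."""
    on_unit = [a for a in amounts if abs(a - unit * round(a / unit)) < tol]
    return mean(amounts), len(on_unit) / len(amounts)

avg, share = summarize(amounts)
print(f"mean = £{avg:.2f}, share at a multiple of £5 = {share:.0%}")
# prints: mean = £20.26, share at a multiple of £5 = 75%
```

On real card data the same two summaries would be computed over millions of rows, typically inside the warehouse rather than in memory.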
Table 1: Examples of Data Mining Applications
•Predicting supplies in fast food restaurants (Liu, Bhattacharyya, Sclove, Chen and Lattyak 2001)
•Quality of health care (Hogl, Muller, Stoyan and Stuhlinger 2001)
•Analyzing franchisee sales (Chen, Justis and Chong 2003)
•Predicting customer loyalty (Ng and Liu 2000)
•Mining credit card data (Hand and Blunt 2001)
•Bankruptcy prediction (Sung, Chang and Lee 1999)
DATA MINING FOR STRATEGIC DECISION MAKING
A majority of data mining (DM) applications serve a managerial purpose. They are
useful in finding information such as which customers are loyal or which patients
are likely to stay longer in hospital. This usage can be extended to strategic
decision making as well. According to Sabherwal and King (1991), a strategic
application is one that has a profound influence on a firm’s success, either by
influencing or shaping the organization’s strategy or by playing a direct role in
its implementation or support. It is this latter idea that is significant for DM,
but first we will consider the process of strategic decision making (SDM). Briefly,
the process involves scanning the environment for relevant information,
interpreting it and formulating a strategy. Organizations differ greatly in their
approaches to these stages, depending on the type of organization (such as new vs.
old) and the degree of change in its environment (stable vs. unstable). In the
scanning stage, some organizations collect data while others rely upon field
personnel. The interpretation stage can similarly be carried out formally in
meetings or informally, with managers discussing findings with one another
(“consensual validation”). In the response stage, some organizations appoint
committees to respond to events while others have standardized responses (Daft and
Weick 1984). The interpretation stage is of particular interest since it involves
modifying the belief systems of the organization. These are summaries of
perceptions, observations and experiences concerning the organization’s resources,
markets and customers. For instance, an organization might perceive that its
product lines are aging; customers switching to competitors’ products would
confirm this perception. There is empirical evidence that belief systems influence
strategic decision making (Lorsch 1989). The decision to select a particular
supplier, for example, may be influenced by perceptions about the supplier’s
reliability. DM could be utilized in a strategic mode to verify such beliefs as
they pertain to the organization’s customers, suppliers, etc. We will also use the
term Micro-Theories (MTs) to refer to these beliefs. Each MT will be regarded as a
strategic assumption to be tested by data mining.
The mining process, often labeled “KDD” (Knowledge Discovery in Databases), can be
“data-driven” or “hypothesis-driven” (“question-driven”). Data-driven methods
attempt to identify all possible patterns in the data, while hypothesis-driven
methods attempt to verify whether or not a particular pattern exists (Hogl, Muller,
Stoyan and Stuhlinger 2001). Organizations usually have more data than they can
analyze, and question-driven approaches are computationally more tractable,
especially when large data sets are involved, since the solution space is bounded.
In this mode, KDD commences with a set of MTs that management is keen on
verifying. The remainder of the process is the same for both approaches (Mackinnon
and Glick 1999). The next step is to select suitable data. This is greatly
facilitated if the analyst already has hypotheses to verify; otherwise, data
selection will involve an iterative process of selection followed by testing. The
required data must be carefully selected from the warehouse or organizational
databases. It is then cleaned and transformed by filling in missing values,
changing “look-up codes” (i.e., standardizing codes from numeric values to text or
vice versa: “1” – married; “2” – single) and ignoring outliers if necessary.
Calculations such as totals, cost per item and discounts are also performed during
this stage. The next step is testing and analysis, in which each MT is examined
using the selected and cleaned data. The last step is sharing the results with
management, either through formal reports or presentations or by making them
available via an intranet.
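The cleaning and transformation step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the records and field names are hypothetical, missing quantities are filled with the median of the known values (one common choice among several), and the outlier cut-off `MAX_QTY` is an arbitrary threshold chosen for the example.

```python
from statistics import median

# Hypothetical raw rows pulled from a warehouse; field names are illustrative.
raw = [
    {"customer": "A", "marital": "1", "qty": 4, "price": 2.50, "discount": 0.10},
    {"customer": "B", "marital": "2", "qty": None, "price": 3.00, "discount": 0.00},
    {"customer": "C", "marital": "1", "qty": 2, "price": 4.00, "discount": 0.05},
    {"customer": "D", "marital": "2", "qty": 500, "price": 1.00, "discount": 0.00},
]

LOOKUP = {"1": "married", "2": "single"}  # look-up codes from the example in the text
MAX_QTY = 100  # assumed outlier cut-off for this sketch

def clean(rows):
    """Fill missing values, standardize codes, drop outliers and add derived fields."""
    fill = median(r["qty"] for r in rows if r["qty"] is not None)
    cleaned = []
    for r in rows:
        qty = fill if r["qty"] is None else r["qty"]
        if qty > MAX_QTY:  # ignore outliers
            continue
        gross = qty * r["price"]
        disc = gross * r["discount"]
        net = gross - disc
        cleaned.append({
            "customer": r["customer"],
            "marital": LOOKUP.get(r["marital"], "unknown"),  # numeric code -> text
            "qty": qty,
            "discount": round(disc, 2),
            "total": round(net, 2),
            "cost_per_item": round(net / qty, 2),
        })
    return cleaned

for row in clean(raw):
    print(row)
```

Customer B's missing quantity is filled with 4 and customer D's order of 500 units is dropped as an outlier, leaving three cleaned rows for the testing stage.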
A PORTERIAN FRAMEWORK FOR CHARACTERIZING STRATEGIC BELIEFS
What sort of beliefs should an analyst select for testing? Porter’s framework,
developed to analyze the impact of the environment on an organization, is widely
used by both practitioners and academics to understand organizational strategy. The
framework is also useful for organizing micro-theories, since it describes the
entities in the organization’s task environment that govern its inputs and outputs
and therefore affect its performance. As shown in Figure 1, MTs are organized by
each of the entities in the firm’s task environment: suppliers, customers,
competitors and substitute products. Note that the internal task environment has
also been included, since a firm must consider its internal resources and
capabilities when formulating its strategy.
How should the analyst go about surfacing these assumptions? Decision mapping is a
suitable technique here. A decision map is a chart depicting the decision processes
in the organization (Ashworth and Goodland 1990). For each of the task areas, the
analyst should identify the decisions made, with a view to identifying the
underlying micro-theories. For instance, for the entity suppliers, the decisions
faced are: Should the company attempt to consolidate suppliers or maintain multiple
suppliers? What type of contract should be awarded to a supplier, short term or
long term? How can a contract be optimized in terms of price, delivery time and
lead time?
CUSTOMERS
•Service perceptions
•Product perceptions
•Image perceptions
COMPETITORS
•Strengths in markets
•Strengths in distribution
•Relative price/performance of products
THE FIRM
•Employee perceptions
•Management perceptions
SUPPLIERS
•Delivery perceptions
•Quality perceptions
•Reliability perceptions
SUBSTITUTE PRODUCTS
•Presence/absence of substitutes
•Threat posed by substitutes
Figure 1. A Porterian Framework for Characterizing Micro-theories
The beliefs that can underlie these decisions include:
•The supplier is reliable.
•The supplier delivers on time.
•The supplier has historically offered a good pricing/delivery combination.
•The supplier is flexible in producing products to specifications.
•The supplier can operate with a small lead time.
Typically, organizations will have hundreds of such beliefs embedded in their SDM
processes. To identify the relevant MTs, the analyst can prepare a checklist of
all MTs and have senior management select the most important ones. This list can then
drive the remainder of the KDD process.
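The decision-mapping and checklist steps can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the decision map is represented as a nested dictionary (our own choice, not a format prescribed by Ashworth and Goodland), the customer entry is hypothetical, and “selection by senior management” is approximated by counting how many managers tick each MT on the checklist. All names are hypothetical.

```python
from collections import Counter

# Hypothetical decision map: task-environment entity -> decision -> underlying micro-theories.
DECISION_MAP = {
    "suppliers": {
        "Consolidate suppliers or keep several?": [
            "The supplier is reliable.",
            "The supplier delivers on time.",
        ],
        "Short-term or long-term contract?": [
            "The supplier has historically offered a good pricing/delivery combination.",
            "The supplier can operate with a small lead time.",
        ],
    },
    "customers": {
        "Refresh the product lines?": ["Our product lines are aging."],
    },
}

def checklist(decision_map):
    """Flatten the map into checklist rows of (entity, decision, micro-theory)."""
    return [(entity, decision, mt)
            for entity, decisions in decision_map.items()
            for decision, mts in decisions.items()
            for mt in mts]

def select_mts(rows, ballots, k):
    """Keep the k MTs ticked most often by senior managers (ties keep checklist order)."""
    votes = Counter(mt for ballot in ballots for mt in ballot)
    ranked = sorted((row for row in rows if votes[row[2]] > 0),
                    key=lambda row: -votes[row[2]])
    return ranked[:k]

# Each ballot is the set of MTs one (hypothetical) senior manager ticked.
ballots = [
    {"The supplier is reliable.", "The supplier delivers on time."},
    {"The supplier is reliable.", "Our product lines are aging."},
    {"The supplier is reliable."},
]
for entity, decision, mt in select_mts(checklist(DECISION_MAP), ballots, k=2):
    print(f"[{entity}] {decision} -> {mt}")
```

Keeping the entity and decision alongside each MT means the shortlist still records which decision each belief feeds, which helps when results are reported back to management.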
THE TESTING PROCESS
Once Micro-Theories are identified and data sets are selected and transformed, the
next step in the KDD process is testing. Testing proceeds in two stages: first with
“test data,” usually 10-20% of the actual data, to develop the model, and then with
the remainder of the data to validate it. As mentioned, DM techniques include
clustering, association, classification and dependency analysis, and the analyst
uses the MT test list as a guide in selecting a suitable technique. For instance, an
assumption about the reliability of a supplier could be confirmed by an association
analysis between suppliers, delivery times and the number of times the
specifications were met 100%. Note that the raw data may not be available in this
form and may therefore require tabulation and aggregation, especially for the
variable “specifications met 100%.” If the association analysis identifies vendors
that meet these criteria, the finding is tested again on the remainder of the data
in the second stage.
A number of situations may arise with tested hypotheses: a) the hypothesis is
supported in its entirety at the 90% confidence level or higher; b) the hypothesis
is not supported at the 90% confidence level, but is supported at a lower level of
confidence; c) the hypothesis is not supported at any confidence level. Situations
“a” and “c” are clear-cut, resulting in confirmation or disconfirmation of the MT,
but “b” can place the analyst in a quandary. In such cases, an alternative
hypothesis may be sought by modifying the MT. For instance, an alternative
hypothesis for the case above is that delivery times and adherence to
specifications may be contingent on delivery quantities. Thus, testing is not
always straightforward, and the strategy may need modification.
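The two-stage test can be sketched as follows. This is a simplified, hypothetical illustration rather than the author’s procedure: in place of a full association analysis it tabulates, per supplier, how often orders arrived on time and 100% to specification, then applies a one-sided exact binomial test against an assumed benchmark rate `P0`. The order data, the 80% benchmark, the 15% development share and the 0.5 floor that separates case b) from case c) are all assumptions introduced for the example.

```python
import random
from collections import defaultdict
from math import comb

def split(orders, dev_share=0.15, seed=0):
    """Stage 1 develops on a small sample (10-20% in the text); stage 2 validates on the rest."""
    rows = list(orders)
    random.Random(seed).shuffle(rows)
    cut = max(1, round(len(rows) * dev_share))
    return rows[:cut], rows[cut:]

def tabulate(orders):
    """Aggregate raw orders per supplier into [orders, on time, on time and 100% to spec]."""
    stats = defaultdict(lambda: [0, 0, 0])
    for supplier, on_time, spec_met in orders:
        s = stats[supplier]
        s[0] += 1
        s[1] += on_time
        s[2] += on_time and spec_met
    return dict(stats)

def confidence(successes, n, p0):
    """One-sided exact binomial test of 'true rate > p0'; returns 1 - p-value."""
    p_value = sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(successes, n + 1))
    return 1 - p_value

def outcome(conf, floor=0.5):
    """Map a confidence level onto the text's cases a), b) and c)."""
    if conf >= 0.90:
        return "a"  # supported at 90% or higher
    if conf >= floor:  # assumed cut-off for "a lower level of confidence"
        return "b"
    return "c"  # not supported

# Hypothetical order history: (supplier, delivered on time, specifications met 100%).
orders = ([("S1", True, True)] * 90 + [("S1", False, True)] * 10
          + [("S2", True, True)] * 60 + [("S2", True, False)] * 40)
P0 = 0.8  # assumed benchmark: "reliable" = on time and fully to spec in over 80% of orders

dev, val = split(orders)
for stage, data in (("development", dev), ("validation", val)):
    for supplier, (n, _on_time, reliable) in sorted(tabulate(data).items()):
        c = confidence(reliable, n, P0)
        print(f"{stage:<11} {supplier}: {reliable}/{n} reliable, conf={c:.2f}, case {outcome(c)}")
```

For brevity the loop re-tests every supplier in both stages; in practice only the vendors flagged during development would be carried into validation. A case “b” result is the signal to reformulate the MT, for example by conditioning on delivery quantity.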
CONCLUSIONS AND IMPLICATIONS
The strategic usage of data mining technology requires a hypothesis-driven approach
to DM. The hypotheses to be tested are often embedded in the strategic assumptions
of management. Referred to as micro-theories, or beliefs, they underlie and
influence critical decisions in an organization. A Porterian framework has been
provided to serve as a guide to surfacing these MTs. Aided by decision mapping, the
analyst should surface such assumptions and test them using the various techniques
of data mining. Typically the results will confirm the MT, but this may not always
be the case: studies have shown that managers are often too optimistic or too
pessimistic, leading to divergence between MTs and the conclusions from KDD. Not all
MTs will be testable. For instance, the belief that a supplier is potentially
valuable cannot be tested except through “soft” methods such as consensual
validation. For those that can be tested, data availability can be an issue,
especially if the organization or organizational unit is new. In such cases, data
can often be purchased from industry associations. The ultimate result of such
efforts is that executives can make strategic decisions with greater confidence.
REFERENCES
Ashworth, C. and Goodland, M. (1990). SSADM: A Practical Approach. Maidenhead: McGraw-Hill.
Chen, Y-S., Justis, R., and Chong, P. P. (2003). Data mining in franchise organizations. In
Hamid Nemati and Christopher Barko (Eds.), Organizational Data Mining. Hershey, PA: Idea Group
Publishing.
Daft, R. L. and Weick, K. E. (1984). Towards a model of organizations as interpretation systems.
Academy of Management Review, 9(2), 284-295.
Hand, D. J. and Blunt, G. (2001). Prospecting for gems in credit card data. IMA Journal of
Management Mathematics, 12(2), 173-200.
Hogl, O. J., Muller, M., Stoyan, H., and Stuhlinger, W. (2001). Using questions and interests to
guide data mining for medical quality management. Topics in Health Information Management,
22(1), 36-50.
Hui, S. C. and Jha, G. (2000). Data mining for customer service support. Information and
Management, 38(1), 1-13.
Jackson, J. (2002). Data mining: A conceptual overview. Communications of the Association for
Information Systems, 8, 267-296.
Liu, L. M., Bhattacharyya, S., Sclove, S. L., Chen, R., and Lattyak, W. J. (2001). Data mining on
time series: An illustration using fast-food restaurant franchise data. Computational Statistics
and Data Analysis, 37, 455-476.
Lorsch, J. W. (1989). Managing culture: The invisible barrier to strategic change. In A. A.
Thompson and A. J. Strickland (Eds.), Strategy Formulation and Implementation (pp. 322-331).
Homewood, IL: BPI/Irwin.
Mackinnon, M. J. and Glick, N. (1999). Data mining and knowledge discovery in databases: An
overview. Australian & New Zealand Journal of Statistics, 41(3), 255-275.
Ng, K. and Liu, H. (2000). Customer retention via data mining. Artificial Intelligence Review,
14(6), 569-590.
Sabherwal, R. and King, W. R. (1991). Towards a theory of strategic use of information resources.
Information and Management, 20(3), 191-212.
Shim, J. P., Warkentin, M., Courtney, J. F., Power, D. J., Sharda, R., and Carlsson, C. (2002).
Past, present, and future of decision support technology. Decision Support Systems, 33(2),
111-126.
Sung, T. K., Chang, N., and Lee, G. (1999). Dynamics of modeling in data mining: Interpretive
approach to bankruptcy prediction. Journal of Management Information Systems, 16(1), 63-85.
TERMS AND DEFINITIONS
Association: A technique in data mining that attempts to identify similarities
across a set of records, such as purchases which occur together across a number of
transactions. This is often referred to as “market basket analysis.”
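Support and confidence, the usual measures in market basket analysis, can be computed in a few lines. The sketch below uses hypothetical baskets and our own helper names. `rule_stats` checks one stated rule, in the spirit of the question-driven approach, while `frequent_pairs` enumerates every co-occurring pair, in the spirit of the data-driven approach; production miners such as Apriori-based tools prune this search instead of counting every pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "chips"},
]

def rule_stats(baskets, lhs, rhs):
    """Support and confidence of the rule lhs -> rhs (both given as sets)."""
    n_lhs = sum(1 for b in baskets if lhs <= b)
    n_both = sum(1 for b in baskets if (lhs | rhs) <= b)
    return n_both / len(baskets), (n_both / n_lhs if n_lhs else 0.0)

def frequent_pairs(baskets, min_support):
    """Count every item pair that co-occurs and keep those meeting min_support."""
    counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(rule_stats(baskets, {"bread"}, {"butter"}))  # support 0.4, confidence 2/3
print(frequent_pairs(baskets, min_support=0.4))
```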
Beliefs: Summaries of perceptions that members in an organization typically share,
such as “Sales are strong in the Southwest.”
Classification: A technique in data mining that attempts to group data according to
pre-specified categories such as “loyal customers” vs. “customers likely to switch.”
Clustering: A technique in data mining that attempts to identify the natural
groupings of data, such as income groups that customers belong to.
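A minimal clustering sketch follows, using Lloyd’s k-means algorithm (one common clustering method, chosen here for illustration) on a single hypothetical attribute, annual income in thousands. The evenly spread starting centres and the sample incomes are assumptions of this example, and `kmeans_1d` is a hypothetical helper name.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on one numeric attribute; returns (centres, groups)."""
    vals = sorted(values)
    # Deterministic start: k centres spread evenly across the sorted values.
    centres = [vals[i * (len(vals) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in vals:  # assignment step: each value joins its nearest centre
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Update step: move each centre to its group mean (keep it if the group is empty).
        new = [sum(g) / len(g) if g else centres[j] for j, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return centres, groups

incomes = [18, 21, 25, 48, 52, 55, 110, 120, 130]  # hypothetical, in thousands
centres, groups = kmeans_1d(incomes, k=3)
print(groups)  # [[18, 21, 25], [48, 52, 55], [110, 120, 130]]
```

Unlike classification, no categories are supplied in advance; the three income groups emerge from the data, and the analyst must still choose `k`.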
Data-driven: Refers to how the data mining process is carried out. If the data drive
the analysis without any prior expectations, the mining process is referred to as a
data-driven approach.
Dependency Analysis: A technique similar to association analysis. Whereas association
analysis is used to identify items purchased together, dependency analysis is used to
identify characteristics that occur together, such as high debt levels being
associated with low savings and low income levels.
Interpretation: The process of understanding the significance of an event such as
an increase in manufacturing orders.
Online Analytical Processing: Performing high-level queries on multi-dimensional
databases.
Question-driven: Refers to how the data mining process is carried out. If the
analysis is preceded by an identification of questions of interest, the mining
process is referred to as a question-driven or hypothesis-driven approach.
Micro-Theories: Beliefs that need to be tested during the data mining process.
Multi-Dimensional Databases: A virtual database in which data are organized according
to dimensions, or aspects of the data (for sales data: product, location and time),
to facilitate queries such as “How many shoes were sold by store #4 in January?”
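The example query can be illustrated with a toy in-memory “cube.” This is a minimal sketch, not how a production multi-dimensional database stores data: hypothetical sales facts are pre-aggregated into cells keyed by (product, store, month), and a query fixes some dimensions while summing over the rest. The names `build_cube` and `query` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, store, month, units sold).
FACTS = [
    ("shoes", 4, "Jan", 30),
    ("shoes", 4, "Jan", 12),
    ("shoes", 4, "Feb", 9),
    ("shirts", 4, "Jan", 20),
    ("shoes", 2, "Jan", 15),
]

def build_cube(facts):
    """Pre-aggregate units sold into cells keyed by (product, store, month)."""
    cube = defaultdict(int)
    for product, store, month, units in facts:
        cube[(product, store, month)] += units
    return dict(cube)

def query(cube, product=None, store=None, month=None):
    """Sum units over cells matching every fixed dimension; None means 'all values'."""
    wanted = (product, store, month)
    return sum(units for cell, units in cube.items()
               if all(w is None or w == c for w, c in zip(wanted, cell)))

cube = build_cube(FACTS)
print(query(cube, product="shoes", store=4, month="Jan"))  # 42
```

Leaving a dimension as `None` rolls it up, so `query(cube, product="shoes")` answers “how many shoes were sold in total” from the same cells.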
Scanning: The process of identifying information relevant to strategic decision
making.
Strategic Decision Making: Refers to an ongoing process of developing organizational strategy
that involves identifying relevant information, interpreting it and arriving at a response.