The Strategic Utilization of Data Mining: A Porterian Framework
Chandra S. Amaravadi, Ph.D.
Department of Information Management and Decision Sciences, Western Illinois University

DATA MINING: A NEW TOOL FOR MANAGEMENT SUPPORT

In the past decade, a new and exciting technology has emerged in the information systems field. Evolving out of relational databases and Online Analytical Processing, and based on a combination of statistical and artificial intelligence techniques, data mining has become a powerful tool for organizational decision support (Shim et al. 2002). A number of techniques are available to analyze warehouse data. These include descriptive techniques such as data summarization, data visualization, clustering and classification, and predictive techniques such as regression, association and dependency analyses (Jackson 2002; Mackinnon and Glick 1999). The technology is also being extended to mine semi-structured data (Hui and Jha 2000). Applications of data mining have ranged from predicting ingredient usage in fast food restaurants (Liu, Bhattacharyya, Sclove, Chen and Lattyak 2001) to predicting the length of stay for hospital patients (Hogl, Muller, Stoyan and Stuhlinger 2001). See Table 1 for other representative examples. Some of the important findings are:
1) Bankruptcies can be predicted from variables such as the “ratio of cash flow to total assets” and “return on assets” (Sung, Chang and Lee 1999).
2) Gas station transactions in the U.K. average £20, with a tendency for customers to round the purchase to the nearest £5 (Hand and Blunt 2001).
3) Sales in fast food restaurants are seasonal, tending to peak during holidays and special events (Liu, Bhattacharyya, Sclove, Chen and Lattyak 2001).
4) Patients over 75 years of age were found to exceed the standard upper limit for hospital stay in 100% of cases (Hogl, Muller, Stoyan and Stuhlinger 2001).
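Data summarization is one of the descriptive techniques listed above. As a small illustration, the sketch below computes the kind of statistics behind the gas-station finding: the average transaction, the most common amount, and the share of transactions that land exactly on a multiple of £5. It is not Hand and Blunt’s method. The transaction amounts are invented. The function name `summarize_amounts` and the choice to store amounts in pence (to avoid floating-point error) are also assumptions.

```python
from collections import Counter
from statistics import mean


def summarize_amounts(amounts_pence, round_unit_pence=500):
    """Descriptive summary of transaction amounts, stored in pence.

    Returns the mean amount in pounds, the share of transactions that are
    exact multiples of `round_unit_pence` (500 pence = 5 pounds), and the
    most common amount in pounds.
    """
    if not amounts_pence:
        raise ValueError("no transactions to summarize")
    rounded = sum(1 for a in amounts_pence if a % round_unit_pence == 0)
    most_common_amount, _count = Counter(amounts_pence).most_common(1)[0]
    return {
        "mean_pounds": mean(amounts_pence) / 100,
        "share_rounded": rounded / len(amounts_pence),
        "mode_pounds": most_common_amount / 100,
    }


if __name__ == "__main__":
    # Hypothetical transactions in pence; not data from the cited study.
    sample = [2000, 1500, 2000, 2500, 1237, 2000, 3000, 1864]
    print(summarize_amounts(sample))
```

If amounts were spread evenly at penny resolution, only about 1 in 500 would fall on a £5 multiple. A share far above that signals the rounding behavior described in the finding.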
Table 1: Examples of Data Mining Applications
Predicting supplies in fast food restaurants (Liu, Bhattacharyya, Sclove, Chen and Lattyak 2001)
Quality of health care (Hogl, Muller, Stoyan and Stuhlinger 2001)
Analyzing franchisee sales (Chen, Justis and Chong 2003)
Predicting customer loyalty (Ng and Liu 2000)
Mining credit card data (Hand and Blunt 2001)
Bankruptcy prediction (Sung, Chang and Lee 1999)

DATA MINING FOR STRATEGIC DECISION MAKING

A majority of data mining (DM) applications serve a managerial purpose. They help find information such as which customers are loyal or which patients are likely to stay longer in hospital. This usage can be extended to strategic decision making as well. According to Sabherwal and King (1991), a strategic application is one that has a profound influence on a firm’s success. It does so either by influencing or shaping the organization’s strategy or by playing a direct role in implementing or supporting it. It is this latter role that is significant for DM, but first we will consider the process of strategic decision making (SDM). Briefly, the process involves scanning the environment for relevant information, interpreting it and formulating a strategic response. Organizations differ greatly in their approaches to these stages, depending on the type of organization (e.g., new vs. old) and the degree of change in its environment (stable vs. unstable). In the scanning stage, some organizations collect data while others rely upon field personnel. The interpretation stage can similarly be carried out formally in meetings, or informally with managers discussing findings with one another (“consensual validation”). In the response stage, some organizations appoint committees to respond to events while others have standardized responses (Daft and Weick 1984).
The interpretation stage is of particular interest because it involves modifying the belief systems of the organization. Belief systems summarize the perceptions, observations and experiences concerning the organization’s resources, markets and customers. For instance, an organization might perceive that its product lines are aging; customers switching to competitors’ products would confirm this perception. There is empirical evidence that belief systems influence strategic decision making (Lorsch 1989). For example, the decision to select a particular supplier may be influenced by perceptions about the supplier’s reliability. DM could be used in a strategic mode to verify such beliefs about the organization’s customers, suppliers and other entities. We will also use the term Micro-Theories (MT) to refer to these beliefs; each MT will be regarded as a strategic assumption to be tested by data mining.

The mining process, often labeled “KDD” (Knowledge Discovery in Databases), can be “data-driven” or “hypothesis-driven” (“question-driven”). Data-driven methods attempt to identify all possible patterns in the data, while hypothesis-driven methods attempt to verify whether or not a particular pattern exists (Hogl, Muller, Stoyan and Stuhlinger 2001). Organizations usually have more data than they can analyze. Question-driven approaches are computationally more tractable, especially when large data sets are involved, since the solution space is bounded by the hypotheses being tested. In this mode, KDD commences with a set of MTs that management is keen on verifying. The remainder of the process is the same for both approaches (Mackinnon and Glick 1999). The next step is to select suitable data. This is greatly facilitated if the analyst already has hypotheses to verify; otherwise, data selection involves an iterative process of selection followed by testing. The required data must be carefully selected from the warehouse or other organizational databases.
It is then cleaned and transformed. This involves filling in missing values, converting “look-up codes” (i.e., standardizing coded values from numbers to text or vice versa, as in “1” = married, “2” = single), and ignoring outliers if necessary. Derived values such as totals, cost per item and discounts are also calculated at this stage. The next step is testing and analysis, in which each MT is examined using the selected and cleaned data. The last step is sharing the results with management through formal reports or presentations, or by making them available via an intranet.

A PORTERIAN FRAMEWORK FOR CHARACTERIZING STRATEGIC BELIEFS

What sort of beliefs should an analyst select for testing? Porter’s framework was developed to analyze the impact of the environment on an organization. It is widely used by both practitioners and academics to understand organizational strategy. The framework is also useful for organizing micro-theories, since it describes the entities in the organization’s task environment, which govern its inputs and outputs and therefore affect its performance. As shown in Figure 1, MTs are organized by each of the entities in the firm’s task environment, including suppliers, customers, competitors and substitute products. Note that the internal task environment has also been included, since a firm must consider its internal resources and capabilities when formulating its strategy.

How should the analyst go about surfacing these assumptions? Decision mapping is a suitable technique here. A decision map is a chart depicting the decision processes in the organization (Ashworth and Goodland 1990). For each task area, the analyst should identify the decisions made and, from them, the underlying micro-theories. For instance, decisions faced with respect to suppliers include:
Should a company attempt to consolidate suppliers or to maintain multiple suppliers?
What type of contract should be awarded to a supplier, short term or long term?
How can a contract be optimized in terms of price, delivery time and lead time?

The beliefs that can underlie these decisions include:
The supplier is reliable.
The supplier delivers on time.
The supplier has historically offered a good combination of pricing and delivery.
The supplier is flexible in producing products to specifications.
The supplier can operate with a small lead time.

Typically, organizations will have hundreds of such beliefs embedded in their SDM processes. To identify the relevant MTs, the analyst can prepare a checklist of all MTs and have senior management select the most important. This list can then drive the remainder of the KDD process.

Figure 1. A Porterian Framework for Characterizing Micro-theories
SUPPLIERS: delivery perceptions; quality perceptions; reliability perceptions
CUSTOMERS: service perceptions; product perceptions; image perceptions
COMPETITORS: strengths in markets; strengths in distribution; relative price/performance of products
SUBSTITUTE PRODUCTS: presence/absence of substitutes; threat posed by substitutes
THE FIRM: employee perceptions; management perceptions

THE TESTING PROCESS

Once micro-theories are identified and data sets are selected and transformed, the next step in the KDD process is testing. Testing proceeds in two stages. First, a sample of “test data”, usually 10-20% of the actual data, is used to develop the model. Second, the remainder of the data is used to validate it. As mentioned, DM techniques include clustering, association, classification and dependency analysis. The analyst uses the MT test list as a guide in selecting a suitable technique. For instance, an assumption about the reliability of a supplier could be confirmed by an association analysis between suppliers, delivery times and the number of times the specifications were met 100%.
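The supplier example and the two-stage test can be sketched as follows. This is a simplified stand-in for a full association analysis, and it rests on several assumptions:
- Each delivery record holds a supplier, days late and the percentage of specifications met.
- A delivery counts as reliable if it is on time and meets 100% of specifications.
- The rule “supplier = S -> reliable delivery” is scored by its confidence, i.e. the share of S’s deliveries that were reliable.
- A supplier is retained only if that confidence reaches an assumed cut-off of 0.9, first on a 20% development sample and then again on the remaining 80%.

The record layout, function names and cut-off are all hypothetical. Every fifth record goes to the development sample so the example is deterministic; a random sample would be more usual in practice.

```python
from collections import Counter
from typing import NamedTuple


class Delivery(NamedTuple):
    """One hypothetical delivery record."""
    supplier: str
    days_late: int        # 0 or negative means on time
    spec_met_pct: float   # percentage of specifications met on this delivery


def is_reliable(d: Delivery) -> bool:
    """Assumed definition: on time and 100% of specifications met."""
    return d.days_late <= 0 and d.spec_met_pct == 100


def rule_confidence(deliveries):
    """Confidence of 'supplier = S -> reliable delivery' for each supplier S."""
    totals, hits = Counter(), Counter()
    for d in deliveries:
        totals[d.supplier] += 1
        hits[d.supplier] += is_reliable(d)  # True counts as 1
    return {s: hits[s] / totals[s] for s in totals}


def split(deliveries, dev_fraction=0.2):
    """Systematic split: every k-th record goes to the development sample."""
    step = round(1 / dev_fraction)
    dev = deliveries[::step]
    holdout = [d for i, d in enumerate(deliveries) if i % step != 0]
    return dev, holdout


def supported_suppliers(deliveries, cutoff=0.9, dev_fraction=0.2):
    """Stage 1: find candidate suppliers on the development sample.
    Stage 2: keep only candidates that also reach the cut-off on the holdout."""
    dev, holdout = split(deliveries, dev_fraction)
    candidates = [s for s, c in rule_confidence(dev).items() if c >= cutoff]
    holdout_conf = rule_confidence(holdout)
    return [s for s in candidates if holdout_conf.get(s, 0.0) >= cutoff]


if __name__ == "__main__":
    # Invented data: supplier A is always reliable; B is reliable half the time.
    pattern_b = [Delivery("B", 0, 100), Delivery("B", 2, 100),
                 Delivery("B", 0, 95), Delivery("B", 0, 100)]
    deliveries = [Delivery("A", 0, 100)] * 50 + pattern_b * 10
    print(supported_suppliers(deliveries))  # ['A']
```

Supplier B falls short in the first stage and is never carried forward. Supplier A passes the cut-off on both the development sample and the holdout.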
It should be noted that the raw data may not be available in this form. It may therefore require tabulation and aggregation, especially for the variable “specifications met 100%”. If the association analysis confirms that some vendors meet these criteria, the result is tested again on the remainder of the data in the second stage. Testing a hypothesis can lead to one of several situations:
a) the hypothesis is supported in its entirety at the 90% confidence level or higher;
b) the hypothesis is not supported at the 90% confidence level, but is supported at a lower level of confidence;
c) the hypothesis is not supported at any confidence level.
Situations “a” and “c” are clear-cut, resulting in confirmation or disconfirmation of the MT, but “b” can place the analyst in a quandary. In such cases, an alternative hypothesis may be sought by modifying the MT. For instance, an alternative hypothesis for the case above is that delivery times and specification compliance are contingent on delivery quantities. Thus testing is not always straightforward, and the strategy may need modification.

CONCLUSIONS AND IMPLICATIONS

The strategic use of data mining technology requires a hypothesis-driven approach to DM. The hypotheses to be tested are often embedded in the strategic assumptions of management. These assumptions, referred to as micro-theories or beliefs, underlie and influence critical decisions in an organization. A Porterian framework has been provided as a guide to surfacing these MTs. Aided by decision mapping, the analyst should surface such assumptions and test them using the various techniques of data mining. Typically the results will confirm the MT, but this may not always be the case. Studies have shown that managers are often too optimistic or too pessimistic, leading to divergence between MTs and the conclusions drawn from KDD. Not all MTs will be testable.
For instance, the belief that a supplier is potentially valuable cannot be tested except through “soft” methods such as consensual validation. For MTs that can be tested, data availability can be an issue, especially if the organization or organizational unit is new. In such cases, data can often be purchased from industry associations. The ultimate result of such efforts is that executives can make strategic decisions with greater confidence.

REFERENCES

Ashworth, C., and Goodland, M. (1990). SSADM: A Practical Approach. Maidenhead: McGraw-Hill.
Chen, Y.-S., Justis, R., and Chong, P. P. (2003). Data mining in franchise organizations. In H. Nemati and C. Barko (Eds.), Organizational Data Mining. Hershey, PA: Idea Group Publishing.
Daft, R. L., and Weick, K. E. (1984). Towards a model of organizations as interpretation systems. Academy of Management Review, 9(2), 284-295.
Hand, D. J., and Blunt, G. (2001). Prospecting for gems in credit card data. IMA Journal of Management Mathematics, 12(2), 173-200.
Hogl, O. J., Muller, M., Stoyan, H., and Stuhlinger, W. (2001). Using questions and interests to guide data mining for medical quality management. Topics in Health Information Management, 22(1), 36-50.
Hui, S. C., and Jha, G. (2000). Data mining for customer service support. Information and Management, 38(1), 1-13.
Jackson, J. (2002). Data mining: A conceptual overview. Communications of the Association for Information Systems, 8, 267-296.
Liu, L. M., Bhattacharyya, S., Sclove, S. L., Chen, R., and Lattyak, W. J. (2001). Data mining on time series: An illustration using fast-food restaurant franchise data. Computational Statistics and Data Analysis, 37, 455-476.
Lorsch, J. W. (1989). Managing culture: The invisible barrier to strategic change. In A. A. Thompson and A. J. Strickland (Eds.), Strategy Formulation and Implementation (pp. 322-331). Homewood, IL: BPI/Irwin.
Mackinnon, M. J., and Glick, N. (1999). Data mining and knowledge discovery in databases: An overview. Australian & New Zealand Journal of Statistics, 41(3), 255-275.
Ng, K., and Liu, H. (2000). Customer retention via data mining. Artificial Intelligence Review, 14(6), 569-590.
Sabherwal, R., and King, W. R. (1991). Towards a theory of strategic use of information resources. Information and Management, 20(3), 191-212.
Shim, J. P., Warkentin, M., Courtney, J. F., Power, D. J., Sharda, R., and Carlsson, C. (2002). Past, present, and future of decision support technology. Decision Support Systems, 33(2), 111-126.
Sung, T. K., Chang, N., and Lee, G. (1999). Dynamics of modeling in data mining: Interpretive approach to bankruptcy prediction. Journal of Management Information Systems, 16(1), 63-85.

TERMS AND DEFINITIONS

Association: A data mining technique that attempts to identify similarities across a set of records, such as purchases that occur together across a number of transactions. This is often referred to as “market basket analysis.”
Beliefs: Summaries of perceptions that members of an organization typically share, such as “Sales are strong in the Southwest.”
Classification: A data mining technique that attempts to assign data to pre-specified categories such as “loyal customers” vs. “customers likely to switch.”
Clustering: A data mining technique that attempts to identify natural groupings in data, such as the income groups that customers belong to.
Data driven: Refers to how the data mining process is carried out. If the data drive the analysis without any prior expectations, the mining process is referred to as a data-driven approach.
Dependency Analysis: A technique similar to association analysis. Whereas association analysis identifies items purchased together, dependency analysis identifies characteristics that occur together, such as high debt levels being associated with low savings and low income levels.
Interpretation: The process of understanding the significance of an event, such as an increase in manufacturing orders.
Micro-Theories: Beliefs that need to be tested during the data mining process.
Multi-Dimensional Databases: A virtual database in which data is organized according to dimensions, or aspects of the data. For sales data, these might be product, location and time, which facilitate queries such as “How many shoes were sold by store #4 in January?”
Online Analytical Processing: Performing high-level queries on multi-dimensional databases.
Question driven: Refers to how the data mining process is carried out. If the analysis is preceded by an identification of questions of interest, the mining process is referred to as a question-driven or hypothesis-driven approach.
Scanning: The process of identifying information relevant to strategic decision making.
Strategic Decision Making: An ongoing process of developing organizational strategy that involves identifying relevant information, interpreting it and arriving at a response.
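A small market-basket sketch can illustrate association analysis and the contrast between the data-driven and question-driven approaches defined above. The data-driven function counts every item pair that co-occurs, with no prior expectation, so its work grows with the number of possible pairs. The question-driven function evaluates a single stated rule, which keeps the search bounded. Support (share of baskets containing a pair) and confidence (share of baskets containing the antecedent that also contain the consequent) are standard association measures introduced here for illustration. The baskets and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def all_pair_supports(baskets):
    """Data-driven: support of every item pair that co-occurs in any basket."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each unordered pair a single canonical key
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / len(baskets) for pair, c in counts.items()}


def confidence_of_rule(baskets, antecedent, consequent):
    """Question-driven: confidence of one stated rule antecedent -> consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter", "milk"},
               {"beer", "chips"}, {"bread", "butter"}]
    print(all_pair_supports(baskets))
    print(confidence_of_rule(baskets, "butter", "bread"))  # 1.0
```

With n distinct items, the data-driven version may have to consider up to n(n-1)/2 pairs, and far more for larger itemsets. The question-driven call scans the baskets once for a single rule. This is the tractability argument for hypothesis-driven mining.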