FEATURES TO CONSIDER IN A DATA WAREHOUSING SYSTEM

Evaluating and assessing the important distinctions between data processing capability and data currency.

By Narasimhaiah Gorla

In order for an organization to achieve competitive advantage, voluminous data needs to be managed, analyzed, and fed into the decision-making process. Data warehouses provide decision support to organizations with the help of analytical databases and online analytical processing (OLAP) tools. Incorporating OLAP tools into decision models as part of decision support systems improves decision making [10].

Decision makers can access analytical databases through an OLAP interface and analyze corporate data on various dimensions; view corporate changes over a period of time, to obtain a macro view of the business operations as well as perform a microanalysis in a specific sub-function; perform various what-if analyses; and drill down to discover the pattern of sales of certain products in a given period of time or find how the sales performance of an individual salesperson affects the company’s revenues. These time-order, aggregation, and disaggregation features give decision makers valuable insights into customer and business behavior, which are of fundamental importance for better decision making.

OLAP tools have benefited organizations in different ways. For example, Lockheed Martin has used OLAP tools on aircraft design and manufacturing data and cut its analyst labor costs by up to 20% [4]. As a result of using data warehouse technology, First American Corporation has transformed itself and improved its financial performance from losses to profits [3]. Data warehouse technology in conjunction with OLAP has been useful in improving decision making in the community health care realm, as shown in [1]. However, despite the potential benefits of data warehousing and OLAP tools, such projects have been difficult to use and have failed to realize those benefits [9].
Corporations that invest in data warehouses often do not provide end users with tools they can use easily, resulting in users not utilizing the tools, millions of dollars' worth of unused software, and unrealized return on investment [8]. The most important determinants of new technology acceptance are perceived ease of use (PEU) and perceived usefulness (PU) [6]. PU is defined as the degree to which a person believes that using a particular system would enhance his or her job performance. PEU is defined as the degree to which a person believes that using a particular system would be free of effort. In order to derive benefits from OLAP technology, it is important to assess whether the OLAP tools, as an integral part of data warehousing, help or hinder usage by the end user. Thus, this article is intended to find the effect of OLAP features on the perceived ease of use (PEU) and the perceived usefulness (PU) of OLAP; to provide suggestions for appropriate contexts for use of ROLAP and MOLAP systems; and to provide guidelines for better design of data warehouses with OLAP technology.

Data Warehouse and OLAP
Codd et al. first coined the term OLAP in 1993 as “the dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data.” OLAP technology can organize data in multidimensional tables called data cubes and provides access to the data warehouse through an interactive GUI (see Figure 1). Some of the common capabilities of OLAP include multidimensionality, aggregation, drill-down and roll-up (view detailed and aggregated data), and slicing and dicing.

[Figure 1. Data warehouse and OLAP. Components shown: operational database, data warehouse, OLAP server, user interface.]

The most common types of OLAP technology are Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). The differences between the two concern data processing capability and data currency [9]. In MOLAP, the data is cleaned, aggregated in multiple dimensions, and uploaded into a data cube periodically. The data is stored in multidimensional arrays [2], so the database has a precompiled organization and data arrays that can be accessed directly and quickly. In ROLAP, data is aggregated and stored along with relational databases. ROLAP relies on indices to be built on tables for data access. Users generate queries using SQL on the fly, offering more flexibility in query generation and data currency.

Research Methodology
Measures of PU and PEU. Data was collected for each feature of OLAP and for the ease of use and usefulness of the OLAP system as perceived by users. Perceived Usefulness is measured based on the potential OLAP benefits [6]: improves decision making, provides accurate analysis, provides all required information, improves working efficiency, and increases user productivity. Perceived Ease of Use is measured based on whether users feel that learning OLAP was easy, the system was user-friendly, OLAP was easy to use, and it was easy to get information.

Measures of OLAP Features. Seven types of tools or features are normally offered in an OLAP system. Based on previous literature, each feature and its components are described here.

Visualization allows users to create summary tables and charts interactively. This is measured using the presence of multidimensional tables and graphics.

Summarization refers to the “degree of aggregation” of information. We measure this feature using the number of hierarchies allowed, the level of detail, and the capability to swap between summarized and detailed levels.

Navigation refers to the capability to drill down or drill up between levels of detail. This is measured by shareability (number of concurrent users allowed), data navigability (availability of drill-down, slicing-dicing, and drag-drop facilities), and the ability to extract detailed and real-time data.

Query Function: Query engines extract data from multidimensional databases and generate outputs in 3D graphics. This is measured using preconstructed query capability, simple query building with a click-select feature, query building with query languages, and concurrent run of queries.

Sophisticated Analysis: This feature is measured by the six most common types of analyses used in decision support: statistical profiling (for example, list customers with highest combined sales); moving averages; cross-dimension comparison (compare product sales by region over a period of time); queries with self-defined formula; exception condition; and what-if analysis.

Dimensionality is measured using the number of allowable dimensions, the capability to redefine a dimension, and the time for data refresh after redefinition.

Performance includes response times for four basic functions: standard report generation, customized report generation, graphic/chart generation, and data navigation.

Data Collection. To examine the effect of OLAP features on perceived ease of use (PEU) and perceived usefulness (PU), a questionnaire-based survey was conducted in Hong Kong. The questionnaire considered user demographics, measures of PEU and PU, and features of OLAP in place. Users were queried about their positions, departments, and the OLAP systems they used. Questions regarding PEU and PU, for example, “OLAP system increases my productivity,” used a five-point Likert scale ranging from 1 (strongly agree) to 5 (strongly disagree). Questions regarding OLAP features inquired about their satisfaction, for example, “Flexibility to swap between summarized and detailed data” (1=strongly unsatisfied to 5=very satisfied). Alternatively, the respondents could choose “not used” or “not applicable” for each feature, as appropriate. The questionnaire was sent to two groups of people: ROLAP and MOLAP users. Seventy-eight questionnaires were sent to four companies, two using ROLAP systems and two using MOLAP systems. Approximately 58 questionnaires were returned, providing a 74% return rate. Pearson correlation analysis was used to examine the relationship between OLAP features and PEU and PU.

Four companies were selected for the survey, two of which used MOLAP software and the other two used ROLAP-related software. The companies using MOLAP used either Cognos Software’s PowerPlay or Oracle Express from Oracle Corporation. PowerPlay software stores the analytical data in multidimensional data sets called PowerCubes that are stored either on clients or on servers and are updated periodically by running a batch job. The PowerPlay analytical engine is aided by the Impromptu reporting system and Visualizer visualization technology. Oracle Express also stores data in multidimensional “physical cubes” and allows users to “slice and dice” the data cubes. The companies using ROLAP used either Business Objects or had modules that were custom-developed internally using SQLBase RDBMS. When employing Business Objects software, a user submits a request for information through a semantic layer, which is converted to an SQL statement submitted to the database engine that accesses the relational database and returns the result that is transformed into a cube for the user.

Results and Discussion
The significant relationships between OLAP features and PEU and PU for MOLAP and ROLAP systems are shown in Figures 2 and 3, respectively.

[Figure 2. MOLAP model. Relationships between the OLAP features (Visualization, Summarization, Navigation, Query, Sophisticated Analysis, Dimensionality, Performance) and Ease of Use and Usefulness. Only significant relationships shown. Coefficients shown: -.48, +.39, +.60, +.39, +.38, +.79.]

[Figure 3. ROLAP model. Relationships between the OLAP features (Visualization, Summarization, Navigation, Query, Sophisticated Analysis, Dimensionality, Performance) and Ease of Use and Usefulness. Only significant relationships shown. Coefficients shown: +.61, +.75, +.62, +.66, +.74, +.48, +.60, +.60, +.79.]

All features (except Query function) of ROLAP are perceived as useful. On the contrary, only two features (Visualization and Summarization) of MOLAP are perceived to be useful. Furthermore, in a ROLAP system, PEU is significantly related to PU; thus, when the ROLAP features are perceived as easy to use and user-friendly, it positively impacted the usefulness of ROLAP.

The visualization feature has a positive effect on ease of use with ROLAP software, while it has a negative effect on ease of use with MOLAP. Visualization features are less prevalent in ROLAP, so any improvements in visualization with the help of graphical user interfaces and help menus aided ease of use in ROLAP. On the other hand, MOLAP systems usually have adequate visualization effects. Cognos’ MOLAP system, PowerPlay, presents data to users in a variety of modes, such as cross-tabs, pie charts, and graphs using Visualizer technology. In addition, users can change various visualization effects, such as colors, formats, fonts, and so forth. It is possible that excessive presence of visualization effects in MOLAP could confuse users, resulting in a negative relationship with PEU. The visualization features of ROLAP and MOLAP have positive significant effects on the usefulness of the OLAP tools.
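
The correlation analysis described under Data Collection can be illustrated with a minimal sketch. It assumes, since the article does not say so, that the PEU/PU items (scored 1 = strongly agree to 5 = strongly disagree) are reverse-coded before being correlated with the feature satisfaction scores (1 = lowest, 5 = highest). The response values and the names `pearson`, `reverse_code`, `summarization_satisfaction`, and `pu_raw` are hypothetical; they are not the survey data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def reverse_code(scores, scale_max=5):
    """Flip a 1..scale_max Likert item so that higher means more agreement."""
    return [scale_max + 1 - s for s in scores]


# Hypothetical responses from eight users (illustration only, not survey data).
summarization_satisfaction = [5, 4, 4, 2, 3, 1, 5, 2]  # 1 = lowest, 5 = highest
pu_raw = [1, 2, 3, 4, 3, 5, 2, 4]  # 1 = strongly agree, 5 = strongly disagree

# Assumed preprocessing step: align the direction of the two scales.
pu_agreement = reverse_code(pu_raw)
r = pearson(summarization_satisfaction, pu_agreement)

# Return rate reported in the article: 58 of 78 questionnaires.
return_rate = 58 / 78

print(f"r = {r:.2f}, return rate = {return_rate:.0%}")
```

Without the reverse-coding step the two scales run in opposite directions, so a feature that users like and find useful would show up as a negative coefficient.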
The summarization feature has a positive significant relationship with both PEU and PU in ROLAP and MOLAP. This implies that with increasing number of permissible detail-levels and flexibility in swapping between levels, the use of OLAP will improve.

The data navigation feature has a significant positive effect on PEU in MOLAP. Since there are only limited levels available for drill-down and slice-dice, MOLAP allowed users to navigate easily. This limitation of MOLAP resulted in a nonsignificant relationship with PU. The situation is the reverse for ROLAP. The Mercury of Business Objects, a ROLAP system, lets the users define their own dimensions, lets them perform queries at various levels of detail, and offers various reporting facilities. Since these flexible navigation facilities (real-time data access, detail data extraction, or drill-down) are possible for ROLAP, this feature has a positive effect on PU.

The Query function showed a significant positive relationship only with PEU for MOLAP. Since all reports have been predesigned in MOLAP, users need only to click and select the report. Impromptu, a companion of Cognos’ PowerPlay, is easy-to-use software, but the data cube has to be built by either a database administrator or a database analyst. This predefined data cube may not meet the query needs of a user, in which case, the user needs to wait for the database specialist to modify the data cube. Although MOLAP is easy to use, users did not find it useful because of its lack of flexibility.

Sophisticated analysis has a significant positive effect on PU in ROLAP and not on PU in MOLAP. This is because ROLAP provided users with more useful functions: ad hoc queries down to detail data, customized reports, and what-if analysis. The Set Analyzer of Business Objects allows users to build complex queries from large databases as index tables, thereby enabling users to build sophisticated and flexible queries that also run quickly.
Set Analyzer allows users to maintain a hierarchy of evolving queries, giving them the capability to perform sophisticated analyses.

Dimensionality for ROLAP systems has a significant effect on PU. Since ROLAP systems operate on transactional data, users could get current data in their required dimensions. In MOLAP systems, pre-aggregation has limited the flexibility of changing the definition of dimensions, resulting in users not perceiving it as useful. Oracle Express allowed users to create relationships among the existing dimensions and to define the top-level dimension. However, the users did not perceive these facilities as relevant or useful.

The positive correlation between Performance and PU signifies the importance of system performance in ROLAP. Since it takes time to execute the SQL queries for manipulating voluminous data, users perceived performance to be critical. With ROLAP systems (for example, Business Objects), large amounts of data are queried by the clients against large data sets; this further increases network traffic, leading to high query response times.

Choice Between MOLAP and ROLAP
This study evaluated OLAP tools for ease of using the system and for usefulness. Following are some guidelines for choosing between MOLAP and ROLAP:
• Choose MOLAP for non-sophisticated computer users and ROLAP for sophisticated users. Our study found that more features of MOLAP have positive effects on ease of use, compared to those of ROLAP.
• Users who use only preset reports and have no need to monitor the daily transaction data could deploy a MOLAP system. On the other hand, users who need to analyze market information regularly would require a ROLAP system; it is suitable for the retailing industry or manufacturers with a variety of products and a large volume of data.
• If the information needs of users are relatively consistent over a period of time, MOLAP is preferred. If the requirements change frequently, ROLAP should be adopted because of its flexible query capability.
• Since MOLAP uses a multidimensional data cube that is generated periodically, the data is not current. Hence, MOLAP should be used where data is relatively nonvolatile. Customers can use MOLAP for inquiring about the products, their descriptions, and prices. For a volatile data environment, for example, sales transaction data, they would need more current data than is possible through a MOLAP system.
• In the initial stages of adoption of OLAP technology in organizations, MOLAP systems are recommended because of their ease of use. After considerable experience, a ROLAP system is preferred because of its flexibility and ability to handle complex queries.

Effective OLAP for Data Warehouses
Based on the OLAP users’ perception, our findings indicate MOLAP tools make the data warehouse system easy to use but not useful; ROLAP tools make the data warehouse useful but not easy to use. Suggestions for improving the design of data warehouses with OLAP include:

Do proper planning: Because the system designs for MOLAP and ROLAP systems are quite different, IT professionals should be aware of this in requirement planning. User requirements for MOLAP systems should be clearly defined in advance so that pre-aggregated formats can be set appropriately.

Make ROLAP user-friendly: The flexibility of a ROLAP system should be complemented with easy-to-use features. Software vendors should design ROLAP tools using better GUI and drag-drop technologies, so that the software is more user-friendly.

Align IT strategy with business: OLAP tools should be designed considering alignment of IT strategy with business strategy [7]. First American Corporation implemented a data warehouse that is aligned with its business strategy and improved financial performance [3].
By determining information needs based on the proper alignment, OLAP tools can be made more useful for organizations and individuals. This is especially true in the case of MOLAP tools, since only a few features are related to PU.

Physical data warehouse design: Better physical data warehouse design is needed in order to improve the performance of ROLAP tools. Data warehouses may be designed integrating the ROLAP relational structure and the MOLAP multidimensional cube; one way to implement this is by using a dense-region-based data cube [2]. Performance of data warehouses can also be improved by using physical design techniques, such as partitioning and access method selection [12] and parallel query processing techniques [5].

Personalize: OLAP tools should be personalizable. Personalization is an evolutionary concept in designing personal end-user tools [11]. This may be done by unbundling the features of OLAP and providing a software interface that gives the user access to a set of OLAP tools selected according to the skill level and the information needs of that specific user. This will improve both ease of use and usefulness of the system.

Integrate ROLAP and MOLAP: Data warehouses should include both ROLAP and MOLAP in an integrated fashion, since information needs generally comprise both batch output and online inquiries. Batch outputs could be done with MOLAP, while online ad hoc needs can be met with ROLAP tools.

Integrate OLAP with decision models: In order to make data warehouses and the associated OLAP tools more useful for decision support, analyses need to be made of the decisions to be supported, the decision processes involved, and the relevant decision models. Using decision-making processes and decision models, appropriate queries can be designed and incorporated into OLAP tools, thereby benefiting decision makers.
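
The contrast that motivates integrating ROLAP and MOLAP (a cube rebuilt periodically by a batch job versus SQL issued at query time) can be sketched as follows. This is only an illustration under stated assumptions: the `sales` table, its columns, and the helpers `build_cube` and `rolap_total` are hypothetical, and a Python dictionary stands in for a multidimensional cube; it does not reproduce how PowerPlay, Oracle Express, or Business Objects work internally.

```python
import sqlite3
from collections import defaultdict

# Hypothetical transactional table (names and rows are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 100.0), ("East", "Gadget", 50.0), ("West", "Widget", 70.0)],
)


def build_cube(connection):
    """MOLAP-style batch step: pre-aggregate totals for every roll-up level.

    "*" marks a dimension that has been rolled up (aggregated away).
    """
    cube = defaultdict(float)
    rows = connection.execute("SELECT region, product, amount FROM sales")
    for region, product, amount in rows:
        for key in ((region, product), (region, "*"), ("*", product), ("*", "*")):
            cube[key] += amount
    return dict(cube)


def rolap_total(connection, region=None):
    """ROLAP-style query: aggregate with SQL at request time."""
    if region is None:
        sql, params = "SELECT COALESCE(SUM(amount), 0) FROM sales", ()
    else:
        sql = "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?"
        params = (region,)
    return connection.execute(sql, params).fetchone()[0]


cube = build_cube(conn)  # periodic batch job

# A new transaction arrives after the cube was built.
conn.execute("INSERT INTO sales VALUES ('East', 'Widget', 30.0)")

molap_east = cube[("East", "*")]        # fast lookup, but reflects the last batch run
rolap_east = rolap_total(conn, "East")  # computed on the fly, includes the new row

print(molap_east, rolap_east)
```

The cube lookup needs no computation at query time but lags behind the transactional data until the next batch run, while the SQL query is current but does its aggregation on every request. This is the trade-off between data processing capability and data currency that the guidelines above turn on.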
Improve data currency: Since a drawback of MOLAP is not having current data in its database, these data warehouses should be updated as frequently as possible, which will ensure the outputs from the data warehouse are more current. However, updating the data warehouses is time consuming and costly. So, an optimal updating frequency should be computed and used in practice.

Use data mining to improve OLAP: Data mining can extract rules based on historical data. By using these rules, the materialized views for OLAP can be designed. Since these rules are extracted from previous transaction profiles, the predesigned queries or materialized views in MOLAP tend to be more useful. Furthermore, by using data-mining rules, indexes can be selected intelligently for ROLAP.

References
1. Berndt, D.J., Hevner, A.R., and Studnicki, J. The Catch data warehouse: Support for community health care decision-making. Decision Support Systems 35, 3 (June 2003), 367–384.
2. Cheung, D.W., et al. Towards the building of a dense-region-based OLAP system. Data and Knowledge Engineering 36 (2001), 1–27.
3. Cooper, B.L., et al. Data warehousing supports corporate strategy at First American Corporation. MIS Quarterly 24, 4 (Dec. 2000), 547–567.
4. Cope, J. New tools help Lockheed Martin prepare for takeoff. Computerworld (Mar. 17, 2000).
5. Datta, A., VanderMeer, D., and Ramamritham, K. Parallel star join + DataIndexes: Efficient query processing in data warehouses and OLAP. IEEE Trans. on Knowledge and Data Engineering 14, 6 (Nov./Dec. 2002), 1299–1316.
6. Davis, F.D. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly (Sept. 1989), 319–339.
7. Gardner, S.R. Building the data warehouse. Commun. ACM 41, 9 (Sept. 1998), 52–60.
8. Glassey, K. Seducing the end user. Commun. ACM 41, 9 (Sept. 1998), 62–69.
9. Hasan, H. and Hyland, P. Using OLAP and multidimensional data for decision making. IT Pro (Sept./Oct. 2001), 44–50.
10. Koutsoukis, N., Mitra, G., and Lucas, C. Adapting on-line analytical processing for decision modelling: The interaction of information and decision technologies. Decision Support Systems 26 (1999), 1–30.
11. Riecken, D. Personal end-user tools. Commun. ACM 43, 8 (Aug. 2000), 89–91.
12. Song, S. and Gorla, N. A transaction-based genetic algorithm approach to vertical fragmentation in relational databases. The Computer Journal 43, 1 (2000), 81–93.

Narasimhaiah Gorla ([email protected]) is an associate professor of IS at Wayne State University in Detroit, MI.

© 2003 ACM 0002-0782/03/1100 $5.00
COMMUNICATIONS OF THE ACM November 2003/Vol. 46, No. 11 115