By Narasimhaiah Gorla
FEATURES TO CONSIDER IN A DATA WAREHOUSING SYSTEM
Evaluating and assessing the important distinctions between data processing capability and data currency.
In order for an organization to achieve
competitive advantage, voluminous data
needs to be managed, analyzed, and fed
into the decision-making process. Data
warehouses provide decision support to
organizations with the help of analytical
databases and online analytical processing
(OLAP) tools. Incorporating OLAP tools
into decision models as part of decision
support systems improves decision making
[10]. Through an OLAP interface, decision makers can: access analytical databases
and analyze corporate data on
various dimensions; view corporate
changes over a period of time, to obtain a
macro view of the business operations as
well as perform a microanalysis in a specific
sub-function; perform various what-if
analyses; and drill down to discover the
pattern of sales of certain products in a
given period of time or find how the sales
performance of an individual salesperson
affects the company’s revenues.
These time-order/aggregation/disaggregation features provide decision makers
with valuable insights into customer/business behavior, which are of fundamental
importance for better decision making.
OLAP tools have benefited organizations
in different ways. For example, Lockheed
Martin has applied OLAP tools to aircraft
design and manufacturing data and cut
its analyst labor costs by up to 20% [4]. As
a result of using data warehouse technology, First American Corporation has transformed itself and improved its financial
performance from losses to profits [3].
Data warehouse technology in conjunction
with OLAP has been useful in improving
decision making in the community health
care realm, as shown in [1].
However, despite the potential benefits of
data warehousing and OLAP tools, such
projects have been difficult to use and have failed to
realize those benefits [9]. Corporations that
invest in data warehouses often do not
provide tools to end users that they can
use easily, resulting in users not utilizing
the tools, millions of dollars worth of
unused software, and unrealized return on
investment [8]. The most important
determinants of new technology acceptance are perceived ease of use (PEU) and
perceived usefulness (PU) [6]. PU is
defined as the degree to which a person
believes that using a particular system
COMMUNICATIONS OF THE ACM November 2003/Vol. 46, No. 11
would enhance his or her job performance. PEU is
defined as the degree to which a person believes that
using a particular system would be free of effort. In
order to derive benefits from OLAP technology, it is
important to assess whether the OLAP tools, as an
integral part of data warehousing, help or hinder the
usage by the end user. Thus, this article is intended to:
find the effect of OLAP features on the perceived ease of
use (PEU) and the perceived usefulness (PU) of
OLAP; provide suggestions for appropriate contexts
for use of ROLAP and MOLAP systems; and provide guidelines for better design of data warehouses
with OLAP technology.
Data Warehouse and OLAP
Codd et al. first coined the term OLAP in 1993 as “the dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data.” OLAP technology can organize data in multidimensional tables called data cubes and provides access to the data warehouse through an interactive GUI (see Figure 1). Some of the common capabilities of OLAP include: multidimensionality, aggregation, drill-down and roll-up (view detailed and aggregated data), and slicing and dicing.
[Figure 1. Data warehouse and OLAP. Components shown: Operational Database, Data Warehouse, OLAP Server, User Interface.]
The most common types of OLAP technology are Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). The differences between the two concern data processing capability and data currency [9]. In MOLAP, the data is cleaned, aggregated in multiple dimensions, and uploaded into a data cube periodically. The data is stored in multidimensional arrays [2]; thus the database has precompiled organization and data arrays that can be accessed directly and faster. In ROLAP, data is aggregated and stored along with relational databases. ROLAP relies on indices to be built on tables for data access. Users generate queries using SQL on the fly, offering more flexibility in query generation and data currency.
Research Methodology
Measures of PU and PEU. Data was collected for each feature of OLAP and for the ease of use and usefulness of the OLAP system as perceived by users. Perceived Usefulness is measured based on the potential OLAP benefits [6]: improves decision making, provides accurate analysis, provides all required information, improves working efficiency, and increases user productivity. Perceived Ease of Use is measured based on whether users feel that learning OLAP was easy, the system was user-friendly, OLAP was easy to use, and it was easy to get information.
Measures of OLAP Features. Seven types of tools or features are normally offered in an OLAP system. Based on previous literature, each feature and its components are described here.
Visualization allows users to create summary tables and charts interactively. This is measured using the presence of multidimensional tables and graphics.
Summarization refers to the “degree of aggregation” of information. We measure this feature using the number of hierarchies allowed, level of detail, and the capability to swap between summarized and detailed levels.
Navigation refers to the capability to drill down or drill up between levels of detail. This is measured by shareability (number of concurrent users allowed), data navigability (availability of drill-down, slicing-dicing, and drag-drop facilities), and ability to extract detailed and real-time data.
Query Function: Query engines extract data from multidimensional databases and generate outputs in 3D graphics. This is measured using preconstructed query capability, simple query building with click-select feature, query building with query languages, and concurrent run of queries.
Sophisticated Analysis: This feature is measured by the six most common types of analyses used in decision support: statistical profiling (for example, list customers with highest combined sales); moving averages; cross dimension comparison (compare product sales by region over a period of time); queries with self-defined formula; exception condition; and what-if analysis.
Dimensionality is measured using the number of allowable dimensions, capability to redefine dimension, and time for data refresh after redefinition.
Performance includes response times for four basic functions: standard report generation, customized report generation, graphic/chart generation, and data navigation.
Data Collection. To examine the effect of OLAP features on perceived ease of use (PEU) and perceived usefulness (PU), a questionnaire-based survey was conducted in Hong Kong. The questionnaire considered user demographics, measures of PEU and PU, and features of OLAP in place. Users were queried about their positions, departments, and OLAP systems they used. Questions regarding PEU and PU, for example, “OLAP system increases my productivity,” used a five-point Likert scale ranging from 1 (strongly agree) to 5 (strongly disagree). Questions regarding OLAP features inquired about their satisfaction, for example, “Flexibility to swap between summarized and detailed data” (1=strongly unsatisfied to 5=very satisfied). Alternatively, the respondents for each feature may choose “not used” or “not applicable,” as appropriate. The questionnaire was sent to two groups of people: ROLAP and MOLAP users. Seventy-eight questionnaires were sent to four companies, two using ROLAP and two using MOLAP systems. Fifty-eight questionnaires were returned, a return rate of approximately 74%. Pearson correlation analysis was used to examine the relationship between OLAP features and PEU and PU.
Four companies were selected for the survey, two of which used MOLAP software and the other two used ROLAP-related software. The companies using MOLAP used either Cognos Software’s PowerPlay or Oracle Express from Oracle Corporation. PowerPlay software stores the analytical data in multidimensional data sets called PowerCubes that are stored either on clients or on servers and are updated periodically by running a batch job. The PowerPlay analytical engine is aided by Impromptu reporting system and Visualizer visualization technology. Oracle Express also stores data in multidimensional “physical cubes” and allows users to “slice and dice” the data cubes. The companies using ROLAP used either Business Objects or had modules that were custom-developed internally using SQLBase RDBMS. When employing Business Objects software, a user submits a request for information through a semantic layer, which is converted to an SQL statement submitted to the database engine that accesses the relational database and returns the result that is transformed into a cube for the user.
Results and Discussion
The significant relationships between OLAP features and PEU and PU for MOLAP and ROLAP systems are shown in Figures 2 and 3, respectively. All features (except Query function) of ROLAP are perceived as useful. On the contrary, only two features (Visualization and Summarization) of MOLAP are perceived to be useful. Furthermore, in a ROLAP system, PEU is significantly related to PU; thus, when the ROLAP features are perceived as easy to use and user-friendly, this positively affects the usefulness of ROLAP.
[Figure 2. MOLAP model. Relationships of the seven OLAP features (Visualization, Summarization, Navigation, Query, Sophisticated Analysis, Dimensionality, Performance) with Ease of Use and Usefulness; only significant relationships shown. Coefficients printed in the figure: -.48, +.39, +.60, +.39, +.38, +.79.]
[Figure 3. ROLAP model. Relationships of the seven OLAP features (Visualization, Summarization, Navigation, Query, Sophisticated Analysis, Dimensionality, Performance) with Ease of Use and Usefulness; only significant relationships shown. Coefficients printed in the figure: +.61, +.75, +.62, +.66, +.74, +.48, +.60, +.60, +.79.]
The visualization feature has a positive effect on ease of use with ROLAP software, while it has a negative effect on ease of use with MOLAP. Visualization features are less prevalent in ROLAP, so any improvements in visualization with help of graphical user interfaces and help menus aided ease of use in ROLAP. On the other hand, MOLAP systems usually have adequate visualization effects. Cognos’ MOLAP system, PowerPlay, presents data to users in a variety of modes, such as cross-tabs, pie charts, and graphs using Visualizer technology. In addition, users can change various visualization effects, such as colors, formats, fonts, and so forth. It is possible that excessive presence of visualization effects in MOLAP could confuse users, resulting in a negative relationship with PEU. The visualization features of ROLAP and MOLAP have positive significant effects on the usefulness of the OLAP tools.
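The Pearson correlation analysis used in the study can be sketched as follows. This is a minimal illustration, not the author's procedure: the response values are hypothetical, the helper names `pearson` and `paired_scores` are invented for the example, and it assumes the PEU item has been recoded so that a higher score means easier to use (the article does not say how the 1 = strongly agree scale was aligned with the satisfaction scale). Answers such as "not used" are dropped before correlating.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def paired_scores(responses, feature, outcome):
    """Keep respondents with numeric answers to both items.

    Non-numeric answers ("not used", "not applicable") are excluded.
    """
    pairs = [
        (r[feature], r[outcome])
        for r in responses
        if isinstance(r.get(feature), int) and isinstance(r.get(outcome), int)
    ]
    return [p[0] for p in pairs], [p[1] for p in pairs]


# Hypothetical responses (NOT the survey data): feature satisfaction on a
# 1-5 scale and a PEU score assumed to be recoded so that higher = easier.
responses = [
    {"summarization": 5, "peu": 5},
    {"summarization": 4, "peu": 4},
    {"summarization": "not used", "peu": 3},
    {"summarization": 2, "peu": 3},
    {"summarization": 1, "peu": 1},
]

xs, ys = paired_scores(responses, "summarization", "peu")
r = pearson(xs, ys)
print(f"n={len(xs)}, r={r:.2f}")
```

A coefficient near +1 or -1 indicates a strong linear association; whether it is significant depends on the sample size, which this sketch does not test.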
The summarization feature has a positive significant relationship with both PEU and PU in ROLAP
and MOLAP. This implies that with increasing
number of permissible detail-levels and flexibility in
swapping between levels, the use of OLAP will
improve.
The data navigation feature has a significant positive effect on PEU in MOLAP. Since there are only
limited levels available for drill-down and slice-dice,
MOLAP allowed users to navigate easily. This limitation of MOLAP resulted in a nonsignificant relationship with PU. The situation is the reverse for ROLAP.
The Mercury of Business Objects, a ROLAP system,
lets the users define their own dimensions, lets them
perform queries at various levels of detail, and offers
various reporting facilities. Since these flexible navigation facilities (real-time data access, detail data extraction, or drill-down) are possible for ROLAP, this
feature has a positive effect on PU.
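The navigation operations discussed here (roll-up, drill-down, and slicing) can be illustrated on a tiny fact table. This is only a sketch: the table, its column names, and the `aggregate` helper are hypothetical and do not reflect how any of the products named in this article implement navigation.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, region, product, sales).
FACTS = [
    (2002, "Q1", "East", "Widget", 100),
    (2002, "Q1", "West", "Widget", 80),
    (2002, "Q2", "East", "Gadget", 50),
    (2002, "Q2", "West", "Widget", 70),
    (2003, "Q1", "East", "Gadget", 120),
    (2003, "Q1", "West", "Gadget", 60),
]
COLUMNS = ("year", "quarter", "region", "product")


def aggregate(facts, dims, where=None):
    """Sum sales grouped by `dims`.

    `where` maps a dimension name to a required value, which acts as a slice.
    """
    where = where or {}
    totals = defaultdict(int)
    for row in facts:
        record = dict(zip(COLUMNS, row[:4]))
        if any(record[d] != v for d, v in where.items()):
            continue  # row falls outside the requested slice
        key = tuple(record[d] for d in dims)
        totals[key] += row[4]
    return dict(totals)


# Roll-up: the coarse view, sales by year.
by_year = aggregate(FACTS, ["year"])
# Drill-down: add the quarter level underneath one year.
by_quarter_2002 = aggregate(FACTS, ["year", "quarter"], where={"year": 2002})
# Slice: fix region = "East" and view sales by product.
east_by_product = aggregate(FACTS, ["product"], where={"region": "East"})

print(by_year)
print(by_quarter_2002)
print(east_by_product)
```

Here every level is computed from the detail rows on request, which is closer to the ROLAP style; a MOLAP system would typically answer only from the levels that were pre-aggregated into the cube.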
The Query function showed a significant positive relationship only with PEU for MOLAP. Since
all reports have been predesigned in MOLAP, users
need only to click and select the report.
Impromptu, a companion of Cognos’ PowerPlay, is
easy-to-use software, but the data cube has to be
built by either a database administrator or a database analyst. This predefined data cube may not
meet the query needs of a user, in which case, the
user needs to wait for the database specialist to
modify the data cube. Although MOLAP is easy to
use, users did not find it useful because of its lack of
flexibility.
Sophisticated analysis has a significant positive
effect on PU in ROLAP and not on PU in MOLAP.
This is because ROLAP provided users with more
useful functions: ad hoc queries down to detail
data, customized reports, and what-if analysis. The
Set Analyzer of Business Objects allows users to
build complex queries from large databases as index
tables, thereby enabling users to build sophisticated
and flexible queries that also run quickly. Set Analyzer allows users to maintain a hierarchy of evolving queries, giving them the capability to perform
sophisticated analyses.
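Three of the analysis types measured in the survey (moving averages, exception conditions, and what-if analysis) are simple to state in code. The sketch below uses hypothetical sales figures and invented function names; it shows the kind of computation involved, not the behavior of Set Analyzer or any other product.

```python
def moving_average(values, window):
    """Simple moving average over a sliding window of `window` points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def exceptions(sales_by_region, threshold):
    """Exception condition: regions whose sales fall below a threshold."""
    return sorted(r for r, v in sales_by_region.items() if v < threshold)


def what_if(sales_by_region, region, pct_change):
    """What-if analysis: total sales if one region changes by pct_change percent."""
    scenario = dict(sales_by_region)  # leave the original figures untouched
    scenario[region] = scenario[region] * (1 + pct_change / 100)
    return sum(scenario.values())


# Hypothetical figures, for illustration only.
monthly_sales = [100, 120, 90, 110, 130]
regional_sales = {"East": 270, "West": 210, "North": 95}

trend = moving_average(monthly_sales, 3)
flagged = exceptions(regional_sales, 100)
projected = what_if(regional_sales, "West", 10)

print(trend, flagged, projected)
```

Each function works on detail or lightly aggregated data, which is why such analyses benefit from the ad hoc access to detail data described above.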
Dimensionality for ROLAP systems has a significant effect on PU. Since ROLAP systems operate
on transactional data, users could get current data
in their required dimensions. In MOLAP systems,
pre-aggregation has limited the flexibility of changing the definition of dimensions, resulting in users
not perceiving it as useful. Oracle Express allowed
users to create relationships among the existing
dimensions and to define the top-level dimension.
However, the users did not perceive these facilities
as relevant or useful.
The positive correlation between Performance
and PU signifies the importance of system performance in ROLAP. Since it takes time to execute the
SQL queries for manipulating voluminous data,
users perceived performance to be critical. With
ROLAP systems (for example, Business Objects),
clients query large amounts of data
against large data sets, which in turn
increases network traffic and leads to high query response
times.
Choice Between MOLAP and ROLAP
This study evaluated OLAP tools for ease of using
the system and for usefulness. Following are some
guidelines in choosing between MOLAP or
ROLAP:
• Choose MOLAP for non-sophisticated computer
users and ROLAP for the sophisticated users.
Our study found more features of MOLAP have
positive effects on ease of use, compared to those
of ROLAP.
• Users who use only preset reports and have no
need to monitor the daily transaction data could
deploy a MOLAP system. On the other hand,
users that need to analyze the market information
regularly would require a ROLAP system; it is
suitable for the retailing industry or manufacturers with a variety of products and a large volume
of data.
• If the information needs of users are relatively
consistent over a period of time, MOLAP is preferred. If the requirements change frequently,
ROLAP should be adopted because of its flexible
query capability.
• Since MOLAP uses a multidimensional data cube
that is generated periodically, the data is not current. Hence, MOLAP should be used where data
is relatively nonvolatile. Customers can use
MOLAP for inquiring about the products, their
descriptions, and prices. For a volatile data environment, such as sales transaction data,
they would need more current data than MOLAP can provide, which is possible through a ROLAP system.
• In the initial stages of adoption of OLAP technology in organizations, MOLAP systems are recommended because of their ease of use. After
considerable experience, a ROLAP system is preferred because of its flexibility and ability to handle complex queries.
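The data-currency trade-off behind these guidelines can be demonstrated with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the `sales` table, the dictionary standing in for a data cube, and the function names are all hypothetical, and real MOLAP and ROLAP servers are far more elaborate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 100), ("West", "Widget", 80), ("East", "Gadget", 50)],
)


def build_cube(conn):
    """MOLAP-style batch job: pre-aggregate every (region, product) cell."""
    rows = conn.execute(
        "SELECT region, product, SUM(amount) FROM sales GROUP BY region, product"
    )
    return {(region, product): total for region, product, total in rows}


def cube_total(cube, region):
    """Answer a query from the pre-aggregated cells only."""
    return sum(total for (r, _), total in cube.items() if r == region)


def rolap_total(conn, region):
    """ROLAP-style ad hoc query: aggregate from the base table on request."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


cube = build_cube(conn)  # periodic batch load

# A new transaction arrives after the cube was built.
conn.execute("INSERT INTO sales VALUES ('East', 'Widget', 40)")

stale = cube_total(cube, "East")      # the cube predates the new sale
current = rolap_total(conn, "East")   # the SQL query sees it immediately
refreshed = cube_total(build_cube(conn), "East")  # after the next batch run

print(stale, current, refreshed)
```

The cube lookup touches only precomputed cells, while the SQL query pays for aggregation at request time but always reflects the latest transactions.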
Effective OLAP for Data Warehouses
Based on the OLAP users’ perception, our findings
indicate MOLAP tools make the data warehouse system easy to use but not useful; ROLAP tools make
the data warehouse useful but not easy to use. Suggestions for improving the design of data warehouses
with OLAP include:
Do proper planning: Because the system designs
for MOLAP and ROLAP systems are quite different,
IT professionals should be aware of this in requirement planning. User requirements for MOLAP systems should be clearly defined in advance so that
pre-aggregated formats can be set appropriately.
Make ROLAP user-friendly: The flexibility of a
ROLAP system should be complemented with easy-to-use features. Software vendors should design
ROLAP tools using better GUI and drag-drop technologies, so that the software is more user-friendly.
Align IT strategy with business: OLAP tools should
be designed considering alignment of IT strategy
with business strategy [7]. First American Corporation implemented a data warehouse that is aligned
with its business strategy and improved financial performance [3]. By determining information needs
based on the proper alignment, OLAP tools can be
made more useful for organizations and individuals.
This is especially true in case of MOLAP tools, since
only a few features are related to PU.
Physical data warehouse design: Better physical
data warehouse design is needed in order to improve
the performance of ROLAP tools. Data warehouses
may be designed integrating the ROLAP relational
structure and the MOLAP multidimensional
cube—one way to implement this is by using a
dense-region-based data cube [2]. Performance of
data warehouses can also be improved by using physical design techniques, such as partitioning and
access method selection [12] and parallel query processing techniques [5].
Personalize: OLAP tools should be personalizable. Personalization is an evolutionary concept in
designing personal end-user tools [11]. This may be
done by unbundling the features of OLAP and providing the software interface to the user that will
allow access to a set of OLAP tools selected depending on the skill level and the information needs of
the specific user. This will improve both ease of use
and usefulness of the system.
Integrate ROLAP and MOLAP: Data warehouses
should include both ROLAP and MOLAP in an integrated fashion, since information needs generally
comprise both batch output and online inquiries.
Batch outputs could be done with MOLAP, while
online ad hoc needs can be met with ROLAP tools.
Integrate OLAP with decision models: In order to
make data warehouses and the associated OLAP
tools more useful for decision support, analyses need
to be made of the decisions to be supported, the
decision processes involved, and the relevant decision models. Using decision-making processes and
decision models, appropriate queries can be designed
and incorporated into OLAP tools, thereby benefiting decision makers.
Improve data currency: Since a drawback of
MOLAP is not having current data in its database,
these data warehouses should be updated as frequently as possible, which will ensure the outputs
from the data warehouse are more current. However,
updating the data warehouses is time consuming and
costly. So, an optimal updating frequency should be
computed and used in practice.
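One way to reason about an optimal updating frequency is a simple cost trade-off. The model below is an assumption introduced purely for illustration and is not taken from the article: each refresh has a fixed cost, data age is assumed to average half the interval between refreshes, and staleness is assumed to cost a fixed amount per unit of average age. All parameter names and values are hypothetical.

```python
from math import sqrt


def total_cost(refreshes, refresh_cost, staleness_cost, period=1.0):
    """Cost per period under the assumed model.

    refreshes      -- number of warehouse updates per period
    refresh_cost   -- assumed cost of one update
    staleness_cost -- assumed cost per unit of average data age
    """
    average_age = period / (2 * refreshes)  # uniform refresh intervals
    return refresh_cost * refreshes + staleness_cost * average_age


def optimal_refreshes(refresh_cost, staleness_cost, period=1.0):
    """Continuous minimizer of total_cost: sqrt(staleness * period / (2 * refresh))."""
    return sqrt(staleness_cost * period / (2 * refresh_cost))


def best_integer_refreshes(refresh_cost, staleness_cost, period=1.0):
    """Pick the cheaper of the two whole numbers around the continuous optimum."""
    f = optimal_refreshes(refresh_cost, staleness_cost, period)
    candidates = {max(1, int(f)), max(1, int(f) + 1)}
    return min(
        candidates,
        key=lambda n: total_cost(n, refresh_cost, staleness_cost, period),
    )


# Hypothetical costs: 50 per refresh, 3600 per unit of average data age.
f_star = optimal_refreshes(50, 3600)
n_star = best_integer_refreshes(50, 3600)
print(f_star, n_star, total_cost(n_star, 50, 3600))
```

Under these assumptions, refreshing more often than the optimum wastes update cost, and refreshing less often lets staleness cost dominate; a real deployment would need measured costs and a staleness model fitted to its own workload.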
Use data mining to improve OLAP: Data mining
can extract rules based on historical data. By using
these rules, the materialized views for OLAP can be
designed. Since these rules are extracted from previous transaction profiles, the predesigned queries or
materialized views in MOLAP tend to be more useful. Furthermore, by using data-mining rules,
indexes can be selected intelligently for ROLAP.
References
1. Berndt, D.J., Hevner, A.R., and Studnicki, J. The Catch data warehouse: Support for community health care decision-making. Decision
Support Systems 35, 3 (June 2003), 367–384.
2. Cheung, D.W., et al. Towards the building of a dense-region-based
OLAP system. Data and Knowledge Engineering 36 (2001), 1–27.
3. Cooper, B.L., et al. Data warehousing supports corporate strategy at
First American Corporation. MIS Quarterly 24, 4 (Dec. 2000),
547–567.
4. Cope, J. New tools help Lockheed Martin prepare for takeoff. Computerworld (Mar. 17, 2000).
5. Datta, A., VanderMeer, D., and Ramamritham, K. Parallel star join +
DataIndexes: Efficient query processing in data warehouses and OLAP.
IEEE Trans. on Knowledge and Data Engineering 14, 6 (Nov./Dec.
2002), 1299–1316.
6. Davis, F.D. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly (Sept. 1989), 319–339.
7. Gardner, S.R. Building the data warehouse. Commun. ACM 41, 9
(Sept. 1998), 52–60.
8. Glassey, K. Seducing the end user. Commun. ACM 41, 9 (Sept. 1998),
62–69.
9. Hasan, H. and Hyland, P. Using OLAP and multidimensional data for
decision making. IT Pro (Sept./Oct. 2001), 44–50.
10. Koutsoukis, N., Mitra, G., and Lucas, C. Adapting on-line analytical
processing for decision modelling: The interaction of information and
decision technologies. Decision Support Systems 26 (1999), 1–30.
11. Riecken, D. Personal end-user tools. Commun. ACM 43, 8 (Aug.
2000), 89–91.
12. Song, S. and Gorla, N. A transaction-based genetic algorithm approach
to vertical fragmentation in relational databases. The Computer Journal
43, 1 (2000), 81–93.
Narasimhaiah Gorla ([email protected]) is an associate
professor of IS at Wayne State University in Detroit, MI.
© 2003 ACM 0002-0782/03/1100 $5.00