DATA WAREHOUSE AS AN ORGANIZATIONAL TOOL
IN INSTITUTE FOR TOURISM
Blaženka Vrdoljak-Šalamon
Zrinka Marušić
Institute for Tourism
Zagreb, Croatia
Abstract
As a research institution, the Institute for Tourism (IT) is involved in two basic types
of research projects. IT regularly produces commissioned studies, which vary greatly
from one another, cover a relatively broad selection of topics and involve ad hoc
analyses. The Institute also works on so-called continuous research projects, which
generally involve the analysis of time series, forecasting and market research. Since
November 1997 the Institute has been a user of the SAS System software, which has met
a strictly defined set of constraints and conditions: heterogeneous data, differing data
sources, the impossibility of standardizing data analysis (use of statistical methods,
time series analysis, ad hoc queries and analyses, multidimensional tables), very few
staff trained in SAS applications and, last but not least, a very limited budget. IT’s
goals for the first year are to build a data warehouse and to complete training in
on-line processing, thus ensuring efficient support for the in-house projects. The
Institute’s goals for the forthcoming short-term period are the initiation of a project
on forecasting tourism flows in Croatia and the setting up of a Web site allowing on-line
use of the tourism-related databases available at the Institute, thus, in effect, making
information another one of the Institute’s projects.
We started with the implementation of a Multi Dimensional Data Base (MDDB) and the use
of Multi Dimensional Report (MDR) in presenting our time-series data. Our application
gives users the opportunity to select a report from among pre-defined reports as well as
the possibility of defining a new one. In addition, a segment of the application, using
SAS/AF and CSF, allows for a comparative view of 1996/1995 tourist flow figures.
Introduction
The Task and Organization of the Institute
The Institute for Tourism was established in 1959 and the experience and orientation of
its experts make it one of the few institutions in Croatia dealing with research in the field
of tourism. The Institute grew as the tourist economy in Croatia grew. Monitoring
contemporary tourist development trends, the Institute specialized in compiling various
economic studies, market research reports, strategic marketing planning, environmental
interpretations, resource categorization and information systems.
The Institute for Tourism is a scientific research organization and is a part of the
academic community under the auspices of the Ministry of Science and Technology. The
Institute is involved in projects approved and financed by the Ministry. It is also a
Croatian Academic Computing Network (CARnet) member.
The Institute is not large (up to 20 employees), and owing to the very heterogeneous
educational background of its employees it is capable of dealing with multidisciplinary
projects in tourism. It brings together experts in various branches of economics (micro,
macro, marketing), geography, sociology, traffic, architecture, mathematics and computer
science.
The Institute’s Field of Activities
The Institute is a project-oriented institution whose program covers both long-term
and short-term projects.
Long-term projects approved and financed by the Ministry of Science and Technology
provide the largest part of the Institute’s funding. The Ministry of
Tourism also commissions some long-term projects, such as strategic marketing
plans for the counties. An essential feature of such projects is that their needs for
information resources can be forecast.
Some research (such as tourist demand market research) is carried out in
particular periods only. As far as data and analysis are concerned, its features
are similar to those of long-term projects.
Public and private companies often ask for the Institute’s help and consulting. Along with
the Ministry of Tourism, these are the most frequent commissioners of short-term,
one-time projects. Such assignments are very important for the Institute since they provide
opportunities for direct contact with the tourist industry and for verifying concepts
developed within long-term projects. However, they cover a wide range of problems and
most often require ad hoc analyses. A major part of the time needed
to complete such projects is often spent collecting and processing data.
Available data sources
Reports of the Central Bureau of Statistics and the Bureau of Payment Mechanisms,
statistical and other reviews published by WTO, OECD, EUROSTAT and WTTC, and
documentation from relevant national statistics centers have supplied the digitized
data required for project design. The Institute’s own research
over the last ten years has mainly supplied data for detailed analyses of the Croatian
tourist market. The Institute has thus carried out four research projects dealing with
“Attitudes and Consumption of Foreign and Local Tourists in Croatia”, applying the
same methodology. Digitized Croatian spatial data are also at the disposal of the Institute.
The specialized library of the Institute for Tourism, whose holdings are computer
processed and retrievable, is also a very valuable source of data.
Informational Infrastructure
The Institute has a good informational infrastructure – a local network and Internet
access through CARnet. Owing to the support of the Ministry of Science and Technology, the
same basic software – Windows 95 and MS Office Professional 97 – has been available
on every workstation since the beginning of 1998. In addition, some
workstations are equipped with specialized software, such as SAS (5), GIS (1) and DTP (1). One
NT server is devoted to SAS applications, while the other, an Alpha server, is intended
for the Web site.
‘Personal computing’ concept
With the expansion of computer use in data processing, the Institute was oriented towards
the ‘personal computing’ concept. All the employees have been trained in using software
packages, in order to be able to apply them for their work. All the staff members now use
MS Office Professional products. Application of specialized software by some of the
employees has been required by the nature of the Institute’s activities. Therefore, a
computing/informational center does not exist in the customary sense, but three staff
members with a background in computer and information technology have a dual
function within the Institute. On the one hand, they take care of all the aspects of
successful informational infrastructure application and development, as well as of user
training, while on the other hand, they deal, at the same time, with externally
commissioned projects.
Data Processing Methods
An analysis of the processing methods and table presentations applied within projects
shows that it is difficult to typify the Institute’s needs. Multidimensional tables of
basic statistical data, usually with elementary statistical indicators, are used most
often. In addition, time series analyses are used. Less often, more advanced
statistical methods have been used, as well as forecasting and econometric models.
This briefly outlined background of the Institute, its activities and its needs suggests the
following:
• relatively small number of staff
• not many computer and information technology experts
• the computer and information technology experts are able to devote only a small
portion of their attention to the maintenance, procurement, development
and application of the information technology infrastructure
• modest funding
• orientation towards improved efficiency
• orientation towards improved quality of data analyses
• orientation towards creation of new business opportunities
The Essential Criterion for the Choice of New Software
The former manner of processing data and ad hoc requests may be described as follows:
either a dBASE program/application was designed or an ad hoc solution was applied,
depending on the complexity of the particular request. To be finalized, the
analysis results were transferred into Excel (percentages and structures were calculated)
and the tables were prepared for printing. One bottleneck was caused by the fact that
only one staff member processed requests in dBASE, while many more users
placed requests and then continued further processing on their own in Excel. Statistical
analyses were processed in a similar manner: SPSS PC+ was used for processing smaller
amounts of data, while institutions with more processing power (SRCE or
DZS) were engaged for larger-scale processing. Such a method obviously resulted in
more ‘primitive’ data processing and application. However, its most serious shortcoming
was a poor level of information (inability to answer more complex
requests in good time). The solution sought had to eliminate the limitations on timely
data accessibility (thus eliminating the bottleneck) and enable quick and profitable
retrieval of information. With the development of new PC generations,
the processing power available in the Institute also increased, so the need for
‘outsourcing’ could be eliminated as well.
Looking for the solution for more efficient data application, the Institute contacted three
companies dealing with information technology engineering. Two solutions offered were
using ORACLE products, while the third one was based on the INFORMIX data-base. It
should be noted though, that none of the solutions offered fully met our needs.
The Institute for Tourism and SAS Institute Slovenia organized several meetings in order
to verify the applicability of SAS modules to the problems of the Institute for Tourism.
The potential of the SAS/STAT statistical module was already known, so the talks focused on
the potential and features of the modules included in the ‘Entry Data Warehousing Package’
and the ‘Entry Data Mining Option’ package. ‘Data Mining’ and ‘Data Warehousing’ were
presented by experts from SAS Institute, Ljubljana. The decision-making process in the
Institute for Tourism was neither short nor easy. After
numerous meetings the talks ‘converged’ towards conditions acceptable to both
sides, and the hiring of new staff experienced in applying SAS products had an
additional, positive impact on the Institute’s decision. It also meant that the
elementary training needed to start using SAS software was
unnecessary. The Pilot Project started as early as August 1997, in cooperation with SAS
Institute, Slovenia.
SAS Software Purpose
In buying SAS software for ‘Data Warehousing’ and ‘Data Mining’, the Institute for
Tourism had the following aims in mind:
1. implementing a powerful organizational tool for efficient use of existing
databases in multi-user conditions;
2. eliminating the need for ‘outsourcing’ in survey research (sampling,
processing) and in the application of advanced statistical processing methods;
3. offering information products by means of a bulletin and also on-line.
The Pilot Project presented in the text below is defined so as to achieve the first
aim listed above, that is, to gain experience in implementing such a tool. It should be
noted here that the second aim was achieved as soon as the software was bought. In other
words, the research project named ‘Tourist Attitudes and Consumption in Croatia –
TOMAS ’98’, which has been taking place for the last ten years, was carried out in its
entirety in the Institute, including sampling, allocation, processing and report
presentation.
Pilot Project - Implementation of Data Warehouse Technology, MDDB Server and
MD Reports
Our Data
As mentioned above, our first goal was to implement a new data organization in our
Institute and to switch over from the old dBASE platform. We started with data
warehousing of monthly tourism traffic, the major part of our data. These data include
the number of tourists, as well as the number of tourists’ overnights realized in Croatia in
every accommodation facility, in every town and resort, collected by the Central Bureau
of Statistics of Croatia. The variables we are dealing with are the following:
• year
• month
• municipality
• town or settlement
• accommodation facility
• tourists’ country of origin
• number of tourists
• number of nights
The Goals of the Pilot Project
We have data covering the 1986-1997 period. The data set for each individual year
covers approximately 150 000-200 000 observations. For the purpose of the Pilot Project
we only took data for 1995 and 1996. The idea of the Pilot Project was to make a data
warehouse covering the above data and to meet our major everyday requirements such as:
• ad hoc queries
• pre-defined tables (tables usually used in in-house projects, WTO defined tables etc.).
We also want to make our data more powerful as well as to get a greater amount of useful
information out of them.
We are aware that it is almost impossible to present all of our tables and ad-hoc queries
in one application, no matter how large it is, but we do expect the ’SAS solution’ to
process the major part of our everyday requirements. In order to meet these requirements
we decided to create a Multi Dimensional Data Base (MDDB) and apply Multi
Dimensional Reports (MDR).
Implementation of Data Warehouse and Creation of MDDB
Data transformation and integration is one of the first steps in the process of data
warehouse implementation. One of the greatest difficulties with this time series
(1986-1997) was the constant change in the administrative organization of the Republic of
Croatia. Once we had organized the territorial units according to the last year of our
data series, the same had to be done for the accommodation facility codes, which differed
over the years. Some changes and data estimates also had to be made
for the countries of origin in order to reflect changes in Europe over the last ten years.
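The recoding of territorial units to the codes of the last year of the series can be sketched roughly as follows. This is a minimal illustration in Python, not the SAS code used in the project; the codes, the crosswalk table and the function name `harmonize` are invented for the example.

```python
# Illustrative only: the codes and the crosswalk below are invented.
# (year, code used in that year) -> code valid in the reference (last) year
CROSSWALK = {
    (1995, "OLD-A"): "NEW-1",
    (1995, "OLD-B"): "NEW-1",  # two former units merged into one
}


def harmonize(records, crosswalk):
    """Re-key records to reference-year codes, summing measures of merged units."""
    totals = {}
    for rec in records:
        # Codes without a crosswalk entry are assumed to be unchanged.
        code = crosswalk.get((rec["year"], rec["municipality"]), rec["municipality"])
        key = (rec["year"], rec["month"], code)
        tourists, nights = totals.get(key, (0, 0))
        totals[key] = (tourists + rec["tourists"], nights + rec["nights"])
    return totals


records = [
    {"year": 1995, "month": 7, "municipality": "OLD-A", "tourists": 10, "nights": 40},
    {"year": 1995, "month": 7, "municipality": "OLD-B", "tourists": 5, "nights": 15},
    {"year": 1996, "month": 7, "municipality": "NEW-1", "tourists": 20, "nights": 70},
]
harmonized = harmonize(records, CROSSWALK)
```

A lookup table keyed by year and old code handles merged units in a simple way. Units that were split would need estimated shares, which this sketch does not attempt.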
Prior to creating the MDDB, we expanded the existing data with additional information.
We defined a number of new variables, which were very often used in ad-hoc queries as
well as in our in-house projects. We defined the dimensions in our MDDB using all the
variables, the existing, as well as the new, ‘smarter’ ones. The dimensions were as
follows:
• territorial (county – municipality – settlement, islands, sea resorts, highlands, capital,
in-land etc.)
• accommodation (hotels – hotel categories, camps - camp categories, etc.)
• country of origin (primary and secondary markets, ECU countries, OECD etc.)
• time (out of season, pre-season, post-season, holidays)
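The idea of enriching base records with ‘smarter’ variables and then summarizing along any combination of dimension levels can be illustrated with a short Python sketch. It is not how the SAS MDDB is built internally; the lookup tables, the assignment of months to seasons, the sample figures and the names `enrich` and `summarize` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical lookups; the actual groupings are defined by the Institute.
COUNTY_OF = {"Pula": "ISTARSKA", "Opatija": "PRIMORSKO-GORANSKA"}
SEASON_OF = {6: "pre-season", 7: "season", 8: "season", 9: "post-season"}


def enrich(rec):
    """Add derived dimension levels to one base record."""
    out = dict(rec)
    out["county"] = COUNTY_OF[rec["settlement"]]
    out["season"] = SEASON_OF.get(rec["month"], "out of season")
    return out


def summarize(records, dims):
    """Sum tourists and nights for every combination of the chosen dimension levels."""
    cells = defaultdict(lambda: [0, 0])
    for rec in map(enrich, records):
        cell = cells[tuple(rec[d] for d in dims)]
        cell[0] += rec["tourists"]
        cell[1] += rec["nights"]
    return {key: tuple(vals) for key, vals in cells.items()}


# Invented figures, for illustration only.
records = [
    {"month": 7, "settlement": "Pula", "origin": "Germany", "tourists": 10, "nights": 60},
    {"month": 8, "settlement": "Pula", "origin": "Italy", "tourists": 4, "nights": 12},
    {"month": 1, "settlement": "Opatija", "origin": "Germany", "tourists": 2, "nights": 6},
]
by_county_season = summarize(records, ["county", "season"])
by_origin = summarize(records, ["origin"])
```

Because the derived levels are stored with each record, the same function answers both a pre-defined table and an ad-hoc query; only the list of dimensions changes.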
Application
To organize our data neatly and smartly was the first step. We divided our tables into two
categories: the first one, including the tables of tourism traffic according to some of the
mentioned dimensions and the second one, including comparative tables and tables
designed by WTO. We started with the first group of tables. An example of such a table
is given in Figure I. Once the MDDB is defined, creating a table using
MDR is an easy task. Some problems arose when we wanted to expand the tables:
1. format PCTSUM different from SUM,
2. label PCTSUM or SUM.
Other problems encountered were:
3. exporting a table into EXCEL, as well as
4. showing definitions of subsets defined in the report.
Solving the latter would meet the need for ad-hoc queries. At that point, SAS Institute
Slovenia helped us a great deal, both with some existing methods and by supporting us in
designing our own methods. As a result, an MDR exported to EXCEL via the 'Write to HTML'
method is shown in Figure II.
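Two of the points above, giving PCTSUM a different format from SUM and passing a table to a spreadsheet through HTML, can be illustrated with a small Python sketch. It does not reproduce the SAS 'Write to HTML' method; the function names and the figures are invented, and PCTSUM is taken here to mean each row's share of the grand total.

```python
from html import escape


def sum_and_pctsum(rows):
    """Return (label, sum, percent of grand total) for each (label, value) row."""
    total = sum(value for _, value in rows)
    return [(label, value, 100.0 * value / total if total else 0.0)
            for label, value in rows]


def to_html(title, header, table):
    """Render the table as a plain HTML table."""
    lines = ["<table>", f"<caption>{escape(title)}</caption>",
             "<tr>" + "".join(f"<th>{escape(h)}</th>" for h in header) + "</tr>"]
    for label, total, share in table:
        # SUM is written as an integer, PCTSUM as a percentage with one decimal.
        lines.append(
            f"<tr><td>{escape(label)}</td><td>{total:d}</td><td>{share:.1f}%</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


# Invented figures, for illustration only.
rows = [("Austria", 300), ("Germany", 500), ("Italy", 200)]
table = sum_and_pctsum(rows)
page = to_html("Nights by country of origin",
               ["Country of origin", "SUM", "PCTSUM"], table)
```

Saving `page` to a file gives a table that spreadsheet programs can generally open, which is the reason HTML is a convenient intermediate format here.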
At that point another part of our application was completed. We designed the interface
for our in-house customers using the SAS/AF module. The interface consists of several entry
windows in which customers can choose among several predefined
MDRs. The reports are organized in several groups according to the main dimension used
('Reports by Territorial Dimension', 'Reports by Country of Origin', 'Reports by
Accommodation Facility'). When a report group is chosen, the list of its reports
is shown and the user may choose one.
We then started with implementation of the second group of the tables. An example of
such a table is shown in Figure III. The problems we encountered here were mostly in the
definition part of a MDR:
1. show the cumulative percentage of a variable,
2. sort the report by values of analytical variable,
3. assign a rank to a specified variable,
4. show SUM of one variable but PCTSUM of the other.
SAS Institute Slovenia helped us again. The rank of a variable, as well as the cumulative
percentage of a variable, was obtained by defining a computed variable in the Metabase.
Prior to that, we had to add two new attributes, RANK and CUMUL, and change the EIS Setup
Attributes Search Path. Sorting the report by the values of an analytical variable was
solved by a specially written method, as was hiding a column for which some of the
statistics are not wanted. An example of an MDR using part of this new functionality is
shown in Figure IV.
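What the RANK and CUMUL computed variables deliver can be shown with a brief Python sketch: the rows are sorted by the analysis variable, numbered, and given a running percentage of the total. This only illustrates the calculation, not the Metabase definition; the county labels, the figures and the function name are invented, and ties are ranked in sort order.

```python
def rank_and_cumulate(nights_by_county):
    """Return (rank, county, nights, cumulative percent), sorted by nights descending."""
    ordered = sorted(nights_by_county.items(), key=lambda item: item[1], reverse=True)
    total = sum(nights_by_county.values())
    result = []
    running = 0
    for rank, (county, nights) in enumerate(ordered, start=1):
        running += nights
        result.append((rank, county, nights, 100.0 * running / total))
    return result


# Invented figures, for illustration only.
ranked = rank_and_cumulate({"County A": 200, "County B": 500, "County C": 300})
```

The cumulative column depends on the sort order, so sorting has to happen before the running total is taken.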
In our future work we have to extend our two-year series to the whole series. We also
have to connect our tourism traffic data with data about accommodation capacity and
make a new part of our data warehouse dealing with exploitation.
Figure I - Example of a pre-defined table.
[Table title: Tourism traffic in ISTRIA and PRIMORSKO-GORANSKA county in HOTELS by tourists’ country of origin. Rows (Country of origin): Austria, Czech Republic, Germany, Italy, TOTAL. Columns: N, N. Cell values not recoverable.]
Figure II - Example of MDR exported to EXCEL.
[Report in Croatian; cell values not recoverable.
Uvjeti (conditions): Redni broj županije (county number) = ISTARSKA, PRIMORSKO-GORANSKA; Šifra podgrupe smještajnih kapaciteta (accommodation subgroup code) = HOTELI SVEGA (hotels, total).
Columns: Godina (year); for each year, Broj ostvarenih noćenja (number of overnights realized) shown as Suma (sum) and Udio (share).
Rows, by Šifra zemlje porijekla (country of origin code): BOSNA_I_HERCEGOVINA, HRVATSKA, MAKEDONIJA, SLOVENIJA, AUSTRALIJA, AUSTRIJA, BELGIJA, BUGARSKA, BJELORUSIJA, KANADA, ČEŠKA, SLOVAČKA, DANSKA, ESTONIJA, FINSKA, FRANCUSKA, NJEMAČKA, IRSKA, IZRAEL, ITALIJA, GRČKA, MAĐARSKA, ISLAND, LITVA, LUKSEMBURG, JAPAN, LETONIJA, NIZOZEMSKA, NOVI_ZELAND, NORVEŠKA, POLJSKA, PORTUGAL, RUMUNJSKA, RUSKA_FEDERACIJA, ŠPANJOLSKA, ŠVEDSKA, ŠVICARSKA, TURSKA, SAD, OSTALE_EVROPSKE_ZEMLJE, OSTALE_IZVANEVROPSKE_ZEMLJE, UKRAJINA, VELIKA_BRITANIJA, UKUPNO (total).]
Figure III - Example of a WTO table.
[Table title: Counties ranked by number of overnights. Columns: RANK, COUNTY, Nights (N), Index, Percentage. Final row: CROATIA TOTAL. Cell values not recoverable.]
Figure IV - Example of MDR with RANK and CUMULATIVE defined, exported to
EXCEL.
[Report in Croatian; cell values not recoverable.
Columns: Rang noćenja (overnights rank), Broj ostvarenih noćenja (number of overnights realized) and Kumulativna noćenja (cumulative overnights), each shown as Suma (sum).
Rows, by Županija (county): ISTARSKA, PRIMORSKO-GORANSKA, SPLITSKO-DALMATINSKA, DUBROVAČKO-NERETVANSKA, ZADARSKA, GRAD_ZAGREB, LIČKO-SENJSKA, ŠIBENSKO-KNINSKA, KRAPINSKO-ZAGORSKA, OSJEČKO-BARANJSKA, VARAŽDINSKA, ZAGREBAČKA, VUKOVARSKO-SRIJEMSKA, SISAČKO-MOSLAVAČKA, KOPRIVNIČKO-KRIŽEVAČKA, MEĐIMURSKA, BRODSKO-POSAVSKA, KARLOVAČKA, BJELOVARSKO-BILOGORSKA, POŽEŠKO-SLAVONSKA, VIROVITIČKO-PODRAVSKA.]
Conclusion
With a clear idea of the goals to be achieved, and of the limitations which cannot be
modified (personnel, funding!), the decision on and procurement of new software for the
Institute for Tourism was a carefully considered process. Such an
approach ensured a number of advantages. Immediately after the purchase of the software,
the Institute’s staff members started implementing the data warehouse and the statistical
analysis of data collected within the TOMAS ’98 project, finalized just before the SAS
software was purchased. Thus, one of the main goals of the project was achieved – data
processing within the Institute itself (outsourcing no longer required).
By finalizing the Pilot Project, the first stage of implementing the Data Warehouse was
completed: our ideas were checked and solutions for adjusting a universal software
package to our particular requirements were carried out. The second stage of our project
started with the education of users and the expansion of the Data Warehouse with data from
other sources, both the Institute’s own and external. As a lot of money and effort has been
invested in the Data Warehouse, ‘data mining’ cannot be only ‘project driven’. We are
aware that we will only be able to retain the Data Mining we are now applying by
expanding the number of users. Whether this is possible we will have to verify prior to
the date for renewal of the license.