DATA WAREHOUSE AS AN ORGANIZATIONAL TOOL IN INSTITUTE FOR TOURISM
Blaženka Vrdoljak-Šalamon, Zrinka Marušić
Institute for Tourism
Zagreb, Croatia

Abstract
As a research institution, the Institute for Tourism (IT) is involved in two basic types of research projects. IT regularly produces commissioned studies, which vary greatly from one another, cover a relatively broad selection of topics and involve ad hoc analyses. The Institute also works on so-called continuous research projects, which generally involve the analysis of time series, forecasting and market research. As of November 1997, the Institute has become a user of the SAS System software, which has met a set of very strictly outlined performance limitations and conditions: heterogeneous data, differing data sources, the impossibility of standardizing data analysis (use of statistical methods, time series analysis, ad hoc queries and analyses, multidimensional tables), a minimal staff trained in SAS applications and, last but not least, a very limited budget. The IT’s goals for the first year are to build a data warehouse and to complete the training in on-line processing, thus ensuring efficient support for the in-house projects. The Institute’s goals for the forthcoming short-term period are the initiation of a project on forecasting tourism flows in Croatia and the setting up of a Web site allowing on-line use of the tourism-related data bases available at the Institute, thus, in effect, making information another one of the Institute’s products. We started with the implementation of a Multi Dimensional Data Base (MDDB) and the use of Multi Dimensional Reports (MDR) in presenting our time-series data. Our application gives users the opportunity to select a report among pre-defined reports as well as the possibility of defining a new one.
In addition, a segment of the application, built with SAS/AF and CSF, allows for a comparative view of 1996/1995 tourist flow figures.

Introduction

The Task and Organization of the Institute
The Institute for Tourism was established in 1959, and the experience and orientation of its experts make it one of the few institutions in Croatia dealing with research in the field of tourism. The Institute grew as the tourist economy in Croatia grew. Monitoring contemporary trends in tourist development, the Institute specialized in compiling economic studies, market research reports, strategic marketing plans, environmental interpretations, resource categorizations and information systems.
The Institute for Tourism is a scientific research organization and a part of the academic community under the auspices of the Ministry of Science and Technology. The Institute is involved in projects approved and financed by the Ministry. It is also a member of the Croatian Academic Computing Network (CARnet).
The Institute is not large (up to 20 employees), and owing to the very heterogeneous educational background of its employees it is capable of dealing with multidisciplinary projects in tourism. It brings together experts in various branches of economics (micro, macro, marketing), geography, sociology, traffic, architecture, mathematics and computer science.

The Institute’s Field of Activities
The Institute is a project-oriented institution whose program covers both long-term and short-term projects. Long-term projects approved and financed by the Ministry of Science and Technology provide the largest part of the Institute’s funding. The Ministry of Tourism also commissions some of the long-term projects, such as strategic marketing plans of the counties.
An essential feature of such projects is that their needs for information resources can be forecast.
Some research projects (such as tourist demand market research) are carried out in particular periods only. As far as data and analysis are concerned, their features are similar to those of long-term projects.
Public and private companies often ask for the Institute’s help and consulting. Along with the Ministry of Tourism, these are the most frequent commissioners of short-term, one-time projects. Such assignments are very important for the Institute, since they provide opportunities for direct contact with the tourist industry and for verification of concepts developed within long-term projects. However, they cover a wide range of problems and most often require ad hoc analyses. A major part of the time needed to complete such projects is often spent collecting and processing data.

Available Data Sources
Reports of the Central Bureau of Statistics and the Bureau of Payment Mechanisms, statistical and other reviews published by WTO, OECD, EUROSTAT and WTTC, and documentation of the relevant national statistics centers have supplied the data in digitized form required for project design.
The Institute’s own research over the last ten years has mainly supplied data for detailed analyses of the Croatian tourist market. The Institute has thus carried out four research projects dealing with “Attitudes and Consumption of Foreign and Local Tourists in Croatia”, all applying the same methodology. Digitized Croatian spatial data are also at the disposal of the Institute. The specialized library of the Institute for Tourism, whose holdings are computer processed and retrievable, is also a very valuable source of data.

Informational Infrastructure
The Institute has a good informational infrastructure: a local network and Internet access through CARnet.
Owing to the support of the Ministry of Science and Technology, the same basic software (Windows 95 and MS Office Professional 97) has been available at every work station since the beginning of 1998. In addition, some of the work stations are equipped with specialized software, such as SAS (5), GIS (1) and DTP (1). One NT server is devoted to SAS applications, while the other, an Alpha server, is intended for the Web site.

‘Personal Computing’ Concept
With the expansion of computer use in data processing, the Institute oriented itself towards the ‘personal computing’ concept. All the employees have been trained in using software packages so that they can apply them in their work. All the staff members now use MS Office Professional products. The nature of the Institute’s activities has required some of the employees to use specialized software. Therefore, a computing/informational center does not exist in the customary sense, but three staff members with a background in computer and information technology have a dual function within the Institute. On the one hand, they take care of all aspects of the application and development of the informational infrastructure, as well as of user training; on the other hand, they work at the same time on externally commissioned projects.

Data Processing Methods
An analysis of the processing methods and data table presentations applied within projects shows that it is difficult to typify the Institute’s needs. Most often, multidimensional tables of basic statistical data are produced, including elementary statistical indicators. In addition, time series analyses are used. Less often, more advanced statistical methods have also been used, as well as forecasting and econometric models.
This briefly outlined background of the Institute, its activities and its needs suggests the following:
• a relatively small number of staff
• not many computer and information technology experts
• the computer and information technology experts are able to devote only a small portion of their attention to problems linked with maintaining, procuring, developing and applying the information technology infrastructure
• modest funding
• orientation towards improved efficiency
• orientation towards improved quality of its data analyses
• orientation towards creation of new business opportunities

The Essential Criterion for Choice of a New Software
The former manner of processing data and ad hoc requests may be described as follows: depending on the complexity of the particular request, either a dBASE program/application was designed or an ad hoc solution was applied. To be finalized, the analysis results were transferred into Excel (where percentages and structures were calculated) and the tables were prepared for printing. One bottleneck was caused by the fact that only one staff member processed requests in dBASE, while many more users placed requests and then continued further processing on their own in Excel. Statistical analyses were processed in a similar manner: SPSS PC+ was used for processing smaller amounts of data, while institutions with more processing power (SRCE or DZS) were involved in larger-scale processing.
Such a method obviously resulted in more ‘primitive’ data processing and application. Its most serious shortcoming, however, was a poor level of information (the inability to solve more complex requests in good time). The solution sought had to eliminate the limitations on timely data accessibility (thus eliminating the bottleneck) and enable quick and profitable retrieval of information.
Along with the development of new PC generations, the processing power available in the Institute also increased, thus eliminating the need for ‘outsourcing’.
Looking for a solution for more efficient data application, the Institute contacted three companies dealing with information technology engineering. Two of the solutions offered used ORACLE products, while the third was based on the INFORMIX data base. It should be noted, though, that none of the solutions offered fully met our needs.
The Institute for Tourism and SAS Institute Slovenia organized several meetings in order to verify the applicability of SAS modules to the Institute for Tourism’s problem. The potential of the SAS/STAT statistical module was already known, so the talks focused on the potential and features of the modules included in the ‘Entry Data Warehousing Package’ and the ‘Entry Data Mining Option’ package. ‘Data Mining’ and ‘Data Warehousing’ were presented by experts from SAS Institute, Ljubljana.
The decision-making process in the Institute for Tourism was neither short nor easy. After numerous meetings the talks ‘converged’ towards conditions acceptable to both sides, and the hiring of new staff experienced in applying SAS products had an additional, positive impact on the Institute’s decision. Thus the elementary training that would otherwise have been needed to start applying SAS software was unnecessary.
The Pilot Project started as early as August 1997, in cooperation with SAS Institute, Slovenia.

SAS Software Purpose
In buying SAS software for ‘Data Warehousing’ and ‘Data Mining’, the Institute for Tourism had the following aims in mind:
1. implementing a powerful organizational tool for efficient application of existing data bases in multi-user conditions;
2. eliminating the need for ‘outsourcing’ in survey research (sampling, processing) and in the application of advanced statistical processing methods;
3. offering an information product by means of a bulletin and also on-line.
The Pilot Project presented in the text below is defined so as to achieve the first aim listed above, that is, to gain experience in its implementation.
It should be noted here that the second aim was achieved as soon as the software was bought. In other words, the research project named ‘Tourist Attitudes and Consumption in Croatia – TOMAS ‘98’, which has been taking place for the last ten years, was carried out in its entirety in the Institute, including sampling, allocation, processing and report presentation.

Pilot Project - Implementation of Data Warehouse Technology, MDDB Server and MD Reports

Our Data
As mentioned above, our first goal was to implement a new data organization in our Institute and to switch over from the old dBASE platform. We started with data warehousing of monthly tourism traffic, the major part of our data. These data include the number of tourists and the number of tourist overnights realized in Croatia in every accommodation facility, in every town and resort, as collected by the Central Bureau of Statistics of Croatia.
The variables we are dealing with are the following:
• year
• month
• municipality
• town or settlement
• accommodation facility
• tourists’ country of origin
• number of tourists
• number of nights

The Goals of the Pilot Project
We have data covering the 1986 - 1997 period. The data set for each individual year covers approximately 150,000 to 200,000 observations. For the purpose of the Pilot Project we took only the data for 1995 and 1996.
The idea of the Pilot Project was to build a data warehouse covering the above data and to meet our major everyday requirements, such as:
• ad hoc queries
• pre-defined tables (tables usually used in in-house projects, WTO defined tables etc.).
We also want to make our data more powerful and to get a greater amount of useful information out of them.
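The record layout listed above, and the kind of ad hoc query the Pilot Project was meant to serve, can be illustrated with a minimal sketch. It is written in Python rather than SAS, and the field names, the sample rows and the `ad_hoc_query` helper are all assumptions made for illustration; none of them come from the Institute's application.

```python
from collections import defaultdict
from typing import NamedTuple


class TrafficRecord(NamedTuple):
    """One monthly tourism-traffic observation (hypothetical field names)."""
    year: int
    month: int
    municipality: str
    settlement: str
    facility: str   # accommodation facility type
    country: str    # tourists' country of origin
    tourists: int
    nights: int


# Invented sample rows; the figures are illustrative only.
RECORDS = [
    TrafficRecord(1996, 7, "Rovinj", "Rovinj", "HOTEL", "Germany", 1200, 8400),
    TrafficRecord(1996, 7, "Rovinj", "Rovinj", "CAMP", "Germany", 900, 7200),
    TrafficRecord(1996, 7, "Rovinj", "Rovinj", "HOTEL", "Italy", 500, 2000),
    TrafficRecord(1996, 8, "Opatija", "Opatija", "HOTEL", "Austria", 700, 3500),
    TrafficRecord(1995, 7, "Rovinj", "Rovinj", "HOTEL", "Germany", 1000, 7000),
]


def ad_hoc_query(records, group_by, **filters):
    """Sum tourists and nights per group after applying equality filters."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        if all(getattr(rec, name) == value for name, value in filters.items()):
            key = tuple(getattr(rec, name) for name in group_by)
            totals[key][0] += rec.tourists
            totals[key][1] += rec.nights
    return {key: tuple(value) for key, value in totals.items()}


# Hotel traffic in 1996 by country of origin.
result = ad_hoc_query(RECORDS, group_by=("country",), year=1996, facility="HOTEL")
for (country,), (tourists, nights) in sorted(result.items()):
    print(country, tourists, nights)
```

Every such query scans the detail records again, which is tolerable for a handful of rows but is the cost that a pre-summarized multidimensional store is meant to avoid at 150,000 to 200,000 observations per year.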
We are aware that it is almost impossible to present all of our tables and ad hoc queries in one application, no matter how large it is, but we do expect the ‘SAS solution’ to process the major part of our everyday requirements.
In order to meet these requirements we decided to create a Multi Dimensional Data Base (MDDB) and apply Multi Dimensional Reports (MDR).

Implementation of Data Warehouse and Creation of MDDB
Data transformation and integration is one of the first steps in the process of data warehouse implementation. One of the greatest difficulties with our time series (1986-1997) was the constant change in the administrative organization of the Republic of Croatia. Once we had organized the territorial units according to the last year of our data series, the same had to be done for the accommodation facility codes, which differed over the years. Some changes and data estimates also had to be made for the countries of origin, in order to reflect the changes in Europe over the last ten years.
Prior to creating the MDDB, we expanded the existing data with additional information. We defined a number of new variables that were very often used in ad hoc queries as well as in our in-house projects. We defined the dimensions in our MDDB using all the variables, both the existing ones and the new, ‘smarter’ ones. The dimensions were as follows:
• territorial (county – municipality – settlement, islands, sea resorts, highlands, capital, in-land etc.)
• accommodation (hotels – hotel categories, camps – camp categories, etc.)
• country of origin (primary and secondary markets, ECU countries, OECD etc.)
• time (out of season, pre-season, post-season, holidays)

Application
Organizing our data neatly and smartly was the first step. We divided our tables into two categories: the first includes the tables of tourism traffic according to some of the dimensions mentioned above, and the second includes comparative tables and tables designed by WTO.
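The ‘smarter’ derived variables and the idea of storing pre-summarized totals per combination of dimensions can be sketched as follows. This is a Python illustration under stated assumptions, not the SAS MDDB implementation: the month-to-season and country-to-market lookups, the "main season" label and the `enrich` and `build_cube` helpers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical lookups; real ones would come from the Institute's code lists.
SEASON_BY_MONTH = {
    5: "pre-season", 6: "pre-season",
    7: "main season", 8: "main season",
    9: "post-season", 10: "post-season",
}
PRIMARY_MARKETS = {"Germany", "Italy", "Austria"}


def enrich(row):
    """Return a copy of the row with derived 'season' and 'market' variables."""
    return {
        **row,
        "season": SEASON_BY_MONTH.get(row["month"], "out of season"),
        "market": "primary" if row["country"] in PRIMARY_MARKETS else "secondary",
    }


def build_cube(rows, dimensions, measures):
    """Pre-summarize the measures for every subset of the given dimensions."""
    cube = defaultdict(lambda: defaultdict(lambda: dict.fromkeys(measures, 0)))
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                cell = cube[dims][tuple(row[d] for d in dims)]
                for measure in measures:
                    cell[measure] += row[measure]
    return cube


# Invented sample rows; the figures are illustrative only.
rows = [
    {"month": 7, "country": "Germany", "facility": "HOTEL", "nights": 8400},
    {"month": 5, "country": "Germany", "facility": "CAMP", "nights": 1000},
    {"month": 7, "country": "Norway", "facility": "HOTEL", "nights": 600},
    {"month": 1, "country": "Italy", "facility": "HOTEL", "nights": 300},
]
cube = build_cube([enrich(r) for r in rows], ("season", "market", "facility"), ("nights",))

# A report cell is now a lookup instead of a scan of the detail rows.
print(cube[("season", "market")][("main season", "primary")]["nights"])
```

Summarizing every subset of dimensions grows quickly with the number of dimensions, so a real multidimensional store would usually materialize only the crossings that the reports need.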
We started with the first group of tables. An example of such a table is given in Figure I.
Once the MDDB is defined, creating a table using MDR is an easy task. Some problems arose when we wanted to expand the tables:
1. giving PCTSUM a format different from that of SUM,
2. labelling PCTSUM or SUM.
Other problems encountered were:
3. exporting a table into EXCEL, as well as
4. showing the definitions of subsets defined in the report.
Solving the latter would meet the need for ad hoc queries. At that point SAS Institute, Slovenia, helped us a lot, both by providing some existing methods and by supporting us in designing our own. As a result, an MDR exported to EXCEL via the 'Write to HTML' method is shown in Figure II.
At that point another part of our application was completed. We designed the interface for our in-house customers using the SAS/AF module. The interface consists of several entry windows in which customers can choose among several pre-defined MDRs. The reports are organized in several groups according to the main dimension used ('Reports by Territorial Dimension', 'Reports by Country of Origin', 'Reports by Accommodation Facility'). When one of the report groups is chosen, a list of its reports is shown and the user may choose one.
We then started with the implementation of the second group of tables. An example of such a table is shown in Figure III. The problems we encountered here were mostly in the definition part of an MDR:
1. show the cumulative percentage of a variable,
2. sort the report by the values of an analytical variable,
3. assign a rank to a specified variable,
4. show SUM of one variable but PCTSUM of the other.
SAS Institute Slovenia helped us again. The rank of a variable and the cumulative percentage of a variable were solved by defining a computed variable in the Metabase. Prior to that, we had to add two new attributes, RANK and CUMUL, and change the EIS Setup Attributes Search Path.
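The statistics discussed above (SUM, PCTSUM, rank and cumulative percentage), sorting by an analytical variable, hiding an unwanted column and export as an HTML table that a spreadsheet can open are sketched below in Python. This is not the SAS/EIS metabase definition or the 'Write to HTML' method used in the project; the function names, column names and county figures are invented for illustration.

```python
from html import escape


def rank_report(nights_by_county):
    """Sort counties by nights (descending) and add rank, share and cumulative share."""
    total = sum(nights_by_county.values())
    ordered = sorted(nights_by_county.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for rank, (county, nights) in enumerate(ordered, start=1):
        running += nights
        rows.append({
            "rank": rank,                             # ties are not given equal ranks
            "county": county,
            "sum": nights,                            # SUM
            "pctsum": 100 * nights / total,           # PCTSUM
            "cumulative_pct": 100 * running / total,  # cumulative percentage
        })
    return rows


def _fmt(value):
    """Give percentages a format different from that of plain sums."""
    return f"{value:.1f}" if isinstance(value, float) else str(value)


def to_html(rows, columns):
    """Render only the requested columns, so unwanted statistics stay hidden."""
    head = "".join(f"<th>{escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(_fmt(row[col]))}</td>" for col in columns) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# Invented figures; the county names are real, the numbers are not.
report = rank_report({"Istarska": 5000, "Zadarska": 1000, "Primorsko-goranska": 4000})
html = to_html(report, ["rank", "county", "sum", "cumulative_pct"])  # PCTSUM hidden
print(html)
```

Keeping the computation (`rank_report`) separate from the presentation (`to_html`) mirrors the split the text describes between computed variables defined in the metabase and the methods that shape the exported report.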
Sorting the report by the values of an analytical variable, and hiding a column for which some of the statistics are not wanted, were both solved by custom-written methods. An example of an MDR using a part of this new functionality is shown in Figure IV.
In our future work we have to extend our two-year series to the whole series. We also have to connect our tourism traffic data with data on accommodation capacity and build a new part of our data warehouse dealing with capacity utilization.

Figure I - Example of a pre-defined table.
[Table: Tourism traffic in ISTRIA and PRIMORSKO-GORANSKA county in HOTELS by tourists' country of origin. Rows: Austria, Czech Republic, Germany, Italy, TOTAL. Numeric values not recoverable.]

Figure II - Example of MDR exported to EXCEL.
[Table with Croatian labels: conditions (Uvjeti) on county (ISTARSKA, PRIMORSKO-GORANSKA), accommodation subgroup (hotels, total) and year; rows by country of origin, ending in a total (UKUPNO); columns Suma (sum) and Udio (share) of Broj ostvarenih noćenja (number of overnights). Numeric values not recoverable.]

Figure III - Example of a WTO table.
[Table: Counties ranked by number of overnights. Columns: RANK, COUNTY, Nights (N), Index, Percentage; final row CROATIA TOTAL. Numeric values not recoverable.]

Figure IV - Example of MDR with RANK and CUMULATIVE defined, exported to EXCEL.
[Table with Croatian labels: columns Rang noćenja (rank by overnights), Županija (county), Broj ostvarenih noćenja (number of overnights, sum) and Kumulativna noćenja (cumulative overnights, sum); rows are Croatian counties, beginning with ISTARSKA, PRIMORSKO-GORANSKA and SPLITSKO-DALMATINSKA. Numeric values not recoverable.]

Conclusion
With a clear idea of the goals to be achieved, and of the limitations that could not be modified (personnel, funding!), decision-making and procurement of new software for the Institute for Tourism was a carefully considered process. Such an approach, however, ensured a number of advantages.
Immediately after the purchase of the software, the Institute’s staff members started the implementation of data warehousing and the statistical analysis of data collected within the TOMAS ’98 project, finalized just before the SAS software was purchased. Thus, one of the main goals of the project was achieved: data processing within the Institute itself (outsourcing no longer required).
With the finalization of the Pilot Project, the first stage of implementing the Data Warehouse was completed, our ideas were checked, and the solutions for adjusting a universal software package to our particular requirements were carried out.
The second stage of our project started with the education of users and the expansion of the Data Warehouse with data from other sources, both the Institute’s own and external.
As a lot of money and effort has been invested in the Data Warehouse, ‘data mining’ cannot be only ‘project driven’. We are aware that we will only be able to retain the Data Mining we are now applying by expanding the number of users. Whether this is possible we will have to verify before the date for renewal of the license.