How to design your MDDB?

Geert Peeters, Origin International (ICA)
e-mail: [email protected]

Abstract. One of the most valuable add-ons to SAS 6.12 is the multi-dimensional database (MDDB). Although the advantage of the MDDB is clear to all, i.e. better performance when accessing enormous amounts of detailed data, many people still fail to realise it in practice. This can mainly be put down to the fact that users of MDDBs make the wrong choices during the design of their system, resulting in bad performance. This article tries to give some answers on how to build a data model using MDDB technology. The answers are based on practical experience gathered during the implementation of a large EIS system that makes use of SAS 6.12 technology. The main topics addressed in this article are: the size of a SAS MDDB and the way to circumvent the 2GB limit; the choice between a mono-cube structure and a multi-cube structure and its implementation in SAS; the MDDB in a client/server environment; and the difference between ROLAP, MOLAP and HOLAP and their practical use in SAS.

Suppose we want to build an application in which the sales margins can be analysed. To implement this, at least the sales price and the cost price will be needed. Elements of this type are called analysis variables. Analysis variables usually contain measurable values. To give meaning to these analysis variables, they need to be connected with some event. An event can be identified using class variables. Class variables have limited or discrete class values. The base table holds the events and their analysis variables at the most detailed level. For example: a base table may contain sales transactions. The elements by which the sales are identified are the class variables, e.g. time stamp and sales place. The amount and quantity of the sale are the analysis variables.

Introduction.
OLAP-applications are characterised by the flexibility with which users can view and report the data any way they want, perform new ad hoc analyses, do large-scale complex calculations, and perform dynamic exception reporting from large databases. The introduction of these applications has started a very interesting discussion about the database technology that supports them best. The most quoted technology is the multi-dimensional database. Roughly speaking, an MDDB organises its data in such a way that, at the moment of data access, the database is able to respond immediately to the query. Without any doubt this leads to applications with very good response times.

In traditional databases the design of the database is done with universally accepted techniques such as normalisation. These techniques can be applied in almost every relational database. The theory of multi-dimensional modelling, however, has not evolved as far. This article gives a step-by-step approach explaining how to design your MDDB in SAS. The approach is based on practical experience.

Dimensions. Some of the class variables are interrelated, e.g. there might be class variables containing the sales place and the sales region. In a relational database, following the normalisation steps, this kind of information would be put in separate tables. In MDDB technology, however, we keep these elements together in a denormalised base table containing redundant information. A set of class variables that are related to each other is called a dimension. Different dimensions are by default never interrelated. A logical way to depict an MDDB is a star diagram. Each axis of the diagram represents a dimension. The class variables that belong to the dimension are drawn on that axis. The way the different class variables are related is shown by the way they are connected.

Gathering elements. The first step in the design of the MDDB is the gathering of the elements the model should contain.
This means the designer of the OLAP-application should know which elements are going to be used in the application and should understand how they are calculated. This information can normally be derived directly from the business rules the OLAP-application will support.

[Figure: An example of a star diagram. Axes: product, entity, product type, time, brand, customer class, Eurochannel and customer.]

Sizing. The size of a SAS/MDDB can be calculated using:
• the number of analysis variables;
• the number of class variables;
• the maximum formatted length of each class variable;
• the length of each class variable;
• the number of distinct values in each class variable; and
• the number of valid crossings between class variables for each hierarchy.
The formula for the exact size of the SAS/MDDB can be found in the referenced article [1]. The size of the SAS/MDDB is, however, limited to 2GB.

When a query is asked for which no hierarchy exists, the SAS/MDDB selects the hierarchy that is closest to the one corresponding to the query and calculates the information at run time. By selecting the closest hierarchy, a very good response time is maintained.

Summary crossings. An ideal MDDB contains pre-summarised analysis variables for each combination of class variables. In this way the MDDB is able to respond immediately to any kind of query. A combination of class variables is called a summary crossing. There are some drawbacks to the approach in which all crossings are available. The first problem is the number of possible crossings and therefore the size of the MDDB. In theory this number can be very high (see example). A second disadvantage is the usability of the crossings: many of the available summary crossings will never be requested by any query.
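As a rough illustration of what pre-summarising every crossing means, the following Python sketch builds all summary crossings for a tiny base table. It is not SAS code and does not reflect how SAS/MDDB stores its data; the table contents, the variable names (month, place, product, amount) and the helper functions are hypothetical, and each class variable is treated as its own dimension for simplicity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical base table: three class variables and one analysis variable.
BASE_TABLE = [
    {"month": "1998-06", "place": "Brussels", "product": "A", "amount": 100.0},
    {"month": "1998-06", "place": "Brussels", "product": "B", "amount": 50.0},
    {"month": "1998-06", "place": "Antwerp", "product": "A", "amount": 70.0},
    {"month": "1998-07", "place": "Brussels", "product": "A", "amount": 30.0},
]
CLASS_VARS = ("month", "place", "product")


def summarise(rows, crossing, analysis_var="amount"):
    """Sum the analysis variable per combination of values in `crossing`."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[var] for var in crossing)
        totals[key] += row[analysis_var]
    return dict(totals)


def build_all_crossings(rows, class_vars):
    """Pre-summarise every non-empty combination of class variables."""
    cube = {}
    for size in range(1, len(class_vars) + 1):
        for crossing in combinations(class_vars, size):
            cube[crossing] = summarise(rows, crossing)
    return cube


cube = build_all_crossings(BASE_TABLE, CLASS_VARS)
print(len(cube))                    # number of stored crossings
print(cube[("place",)])             # totals per sales place
print(cube[("month", "product")])   # totals per month and product
```

Once the crossings are stored, a query such as "total amount per sales place" becomes a simple lookup. The price is that every additional class variable doubles the number of crossings to store in this simplified setting, and many of them may never be queried.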
The total number of summary crossings is given by the following formula: for each combination of dimensions, multiply the numbers of class variables within the dimensions involved, then sum these products over all combinations of dimensions. For example: imagine a base table containing 3 dimensions, e.g. time, location and product, with 5 class variables in the time dimension, 2 in the location dimension and 7 in the product dimension. The possible combinations of the dimensions are: time-location-product, time-location, time-product, location-product, time, location and product. This leads to 5*2*7 + 5*2 + 5*7 + 2*7 + 5 + 2 + 7 = 143 summary crossings, which equals (5+1)*(2+1)*(7+1) - 1.

To overcome this problem, SAS/MDDB allows hierarchies to be defined at MDDB build time. The summary crossings chosen are those that correspond to the queries most likely to be asked. In this way SAS/MDDB reduces the number of unused hierarchies.

Accessing the SAS/MDDB. SAS supplies three different ways to access the data in an MDDB. The most obvious use of the MDDB is in SAS/EIS. The metabase of SAS/EIS is a user-friendly tool in which the attributes of the MDDB can be defined. This information is used to assist in building the OLAP-applications. Access to the data is handled by the SAS/EIS model co-ordinator (EMDDB_M class). This model co-ordinator can be sub-classed, which allows in-depth customisation. Another way of accessing the data of the MDDB is through SCL. In SCL an instance of the MDDB_M class can be created. By sending the correct methods to this object, the data of the MDDB can be fetched. The last way of using the data in the MDDB is in Base/SAS. A special engine (SASSFIO) is provided to open the MDDB as a library. In this way the MDDB tables are directly available to Base/SAS programs.

Design issues. There are some factors that need to be considered during the design of the MDDB.
The choices made influence the size of the SAS/MDDB and affect the way the OLAP-applications need to be built.

Elements that are identified by different kinds of class variables can still be combined into a common base table, using the total set of class variables. This results in a sparse data set, in which many analysis elements are missing. The MDDB built on top of this base table will, in its turn, have a high degree of sparsity. This means that a very big MDDB will be built, of which only a limited amount of storage space is actually used. This approach is called a mono-cube structure. An alternative way to store the data is to keep different kinds of analysis elements in separate MDDBs. This leads to reduced sizes of the individual MDDBs. This way of storing analysis elements is called the multi-cube approach. The retrieval of the data in a multi-cube can be accomplished through customised data access.

The first possible issue is the choice between combined class variables and single class variables. A single class variable is a variable whose values identify the class uniquely. A combined class variable, on the other hand, needs multiple variables to identify the class. For example: the class year/month can be identified by the combination of two variables, year and month. E.g. June 1998 can be identified by the combination year = 1998 and month = 6. The same information can be stored in a single variable yearmm. E.g. the month June 1998 can be identified by the single variable yearmm = 199806.

There is an important advantage when an MDDB is built with single class variables. The relationships between single class variables of the same dimension can very easily be stored in SAS/formats. These formats can become very handy when building the OLAP-application, and can also be used when customising the MDDB access. On the other hand, single class variables have a higher number of distinct class values. They also have a bigger class variable size, compared to combined class variables.
Both features negatively influence the size of the MDDB.

A next issue is the choice of technical keys instead of extended class variables or formatted class variables. As indicated in the section on sizing, the storage space of an MDDB is negatively influenced by the length of the formatted class variables. It is therefore advisable to replace long class variables with smaller technical keys. A technical key is an arbitrary unique number. The relation between the actual class variable value and the technical key can be stored in SAS/formats and SAS/informats. The major advantage of this approach is the smaller size of the MDDB. The translation to and from the technical key can be done through customised data access.

A last design issue arises when the MDDB is built on an enormous base table. In this case the MDDB might grow above 2GB. This is a problem since the SAS/MDDB is limited to this size. To solve this problem, the one MDDB can be split into multiple MDDBs. The way to do this is by spreading the data of the base table over multiple base tables. Suppose a base table is built with n dimensions. This base table can be split into n base tables containing n-1 dimensions each. This is a technique already used in SAS/Motore (SAS 6.11). For example: imagine a base table containing 3 dimensions, e.g. time, location and product. The data in this base table can be spread over 3 other base tables containing the dimensions time-location, time-product and location-product. The data in these multiple MDDBs can be accessed through customised data access.

Client/server. Since OLAP-applications use large amounts of data and perform complex analyses, they need to be built on a platform with the necessary resources. This is the reason why OLAP-applications are often associated with a client/server architecture. There are several ways in which a SAS/MDDB can be accessed in a client/server environment.
The first way to access a SAS/MDDB on a server is by using the Remote Library Service (RLS). This technique allows the data of an MDDB to be transported from the server to the client. The analysis of the data is still performed on the client.

In case of heavy analyses it might be preferable to perform the calculations on the server as well. This means that only the result of the calculations is transported to the client. There are two ways in which this can be implemented. In the first solution, every MDDB access results in the creation of an instance of the MDDB_M class in the remote SAS session. This instance is used to fetch the data. After the calculation of the result, the instance is terminated. In the second solution, a permanent instance of the MDDB_M class (or a customised sub-class) is maintained on the server. This permanent instance can only exist in an AF-application. This AF-application can be executed as a background process on the server. The remote session is used to communicate with the background process. The second solution is advisable when a customised sub-class for the MDDB access is constructed, because creating an instance of this sub-class every time the MDDB is accessed would be too time consuming.

In OLAP-applications it often happens that elements of totally different origin are accessed together. For example: in an OLAP-application that is used to analyse the profitability of a company, accounting and sales elements are often used together. A problem arises when different types of elements are identified by different kinds of class variables.

HOLAP. MOLAP stands for an OLAP-application built on MDDB technology. ROLAP, on the other hand, is an OLAP-application on top of a relational database. HOLAP is a hybrid form of MOLAP and ROLAP. SAS 6.12 also offers a HOLAP extension, in which the data is partly stored in an MDDB and partly in data sets.
The result of this extension is reduced storage space and improved response times.

Conclusion. SAS/MDDB is a very valuable add-on to the SAS System that allows the design of well-performing OLAP-applications. SAS/MDDB also offers the necessary tools to customise the MDDB according to the needs of the OLAP-application.

Reference.
[1] M. Moorman. Getting a Grip on a Growing Concern: Managing Large Data Sets with a Multidimensional Database. Observations, First Quarter 1997, pp. 52-56.