Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Data Protection Act, 2012 wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Data center wikipedia , lookup
Operational transformation wikipedia , lookup
Forecasting wikipedia , lookup
Data analysis wikipedia , lookup
Information privacy law wikipedia , lookup
3D optical data storage wikipedia , lookup
Business intelligence wikipedia , lookup
IRACST ‐ International Journal of Advanced Computing, Engineering and Application (IJACEA), ISSN: 2319‐281X, Vol. 3, No.3, June 2014 Design and implementation of Multi Dimensional Data Cube and OLAP operation for Single Window Counselling Support System Dr.V.Valli Mayil, Director Vivekanandha Institute of Information and Management Studies Tiruchengode Abstract: Single window counseling system is a transparent way to conduct higher study admission for professional colleges. Most of the state government in India has designed, developed and adopted the system to ensure the aspirants to get admission into professional Institutions. It is the highly visible process through which, students are called to participate and elect the course and college for the admission. Appearing single window counseling is an important for every aspirant to get admission into professional colleges. Decision regarding choosing a best Instituion for available cutoff mark is a challenging task which requires more interactive and comparative analysis. This paper discusses the design methods to construct decision support server which guides a user the complete information such as Institution, course and minimum cutoff marks needed. The Intelligent recommended system is constructed in terms of datawarehouse, requires organizing large volume of data and to process complex query methods as the customer can use the data in multiple view in order to make strategic decisions. The system can be built on multidimensional data cube in order to handle large volume of data. Each cell of data cube has the minimum cutoff marks needed to get admission for a course in the instituton. This paper discusses the design overview of dta cube and schema definition in terms of star and snowflake for maintaining decision support system which guides the user by projecting data in all directions. Implementation of data cube is discussed with Data Mining Query Language (DMLQ) concepts. Keywords : Data Cube, Multi Dimensional Model, OLAP OPearations, DMLQ Query I INTRODUCTION Decision support Systems (DSS) are rapidly vital elements in higher study guidance system. The DSS gets the data from operational data base and turns into the valuable result for query. Many corporation or organization are building unified decision support system called data warehouse. The dataware house is a semantically consistent data store that serves as a physical implementation of decision support data model. It stores the information on which the enterprise needs to make a strategic decision. A higher studies counseling system as a data warehouse is also often constructed by integrating data from multiple heterogeneous sources or Institutions to support structured and/or ad hoc queries, analytical reporting, and decision making. The normalized relational data base is not sufficient to store and retrieve the data. It is made on multidimensional data model as the recommended system can be used by multiple user with multiple views. II. DESIGN OF MULTIDIMENSIONAL DATA CUBE Data warehouses support on-line analytical processing (OLAP) tools for the interactive analysis of multidimensional data. OLAP is on-line information processing is developed to performs consolidated, historic data rather than to perform operational data. It needs to perform complex queries with read only operation. OLAP tools are based on Multidimensional datamodel. The model views the data in the form of data cube. The data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts. In general terms, dimensions are the perspectives or entities with respect to which an organization wants to store data. 43 IRACST ‐ International Journal of Advanced Computing, Engineering and Application (IJACEA), ISSN: 2319‐281X, Vol. 3, No.3, June 2014 In our example, the higher study recommended system is a datawarehouse, created to store the minimum marks that had been adopted in the previous years to get admission for a course in a college. This guidance system is used to display the previous year data that had been adopted for the admission. As it is the guidance system, the user can predict the type of course or college that will be possible to get admission in the current year. The system is created with dimensions College, Course and year. Each dimension may have a table associated with it, called a dimension table. For example, a dimension table for “college” may contain the attributes college name, code, address, trustinformation etc., similarly the dimension table for “Course” has Coursename, coursecode, subjects etc. All the dimensions in the cube are represented as Dimension College (Collegename, Collegecode, Address, Trustinfo) Dimension Course (Coursename, Code, Subjects, Sem, Feestructure) Dimension Year (Year, Quarterly, Halfyearly) Dimension District(code, name, number of colleges) A multidimensional data model is created to store a numerical measures, called fact, this is represented by a fact table. The fact of the Recommended system includes a minimum cutoff mark.that the college adopted for the admission . Using the data cube, the user can view minimum mark adopted by institutions in the following views with tuple (college, Course, Year, District), the * denotes the values specified in the tuple. all college , all previous years, all district, if a course given for instance CSE, (all,*,all,all) all course, all year, all district, if college name is given (*, all,all,all) all college, all course, year 2013, all district (*,*,2013,*) all college,CSE,2013,all District (*,CSE,2013,*) The 2-D representation of the data is shown in table 2.1. In the table, the minimum cutoff marks are given with respect to the dimension college and Course (organized according to the types of course). Table 2.1 2-D view of Recommended system Year 2012 Course College ECE EEE CSE College1 College2 The above table shows minimum marks adopted for the admission by various colleges in coursewise. The another dimension “year” can be added to view the cube in a year wise data. The 3-D representation of above data is shown in table 2.2. Here the data are represented as a series of 2-D tables. Conceptually, the same data is represented in the form of a 3-D data cube. Table 2.2 3D View of Single window Recommended System Year:2012 ECE EEE CSE College1 College2 Year 2013 ECE EEE CSE College1 College2 The minimum mark data can also be viewed in terms of 4th dimension such as “District”. The table 2.3 represents the minimum marks adopted in the year 2012 for different courses by various colleges in a District “Chennai”. Viewing the 4th dimention is the series of 3_D cubes. Similarly n-data cube is also constructed as a series of (n-1) D cubes. Table 2.3: 4-D Representation of Recommended System College District = “Chennai” Year 2012 Course ECE EEE College1 College2 CSE 44 IRACST ‐ International Journal of Advanced Computing, Engineering and Application (IJACEA), ISSN: 2319‐281X, Vol. 3, No.3, June 2014 District=”Kanchipuram” Year 2012 Course Colleg ECE e be performed by Group-By operations. Building and implementing the Cube is done by Define and Compute primitives. E E E C S E College1 College2 above tables show the data at different degrees of summarization. Given a set of Dimensions , a subset of dimensions shows the data in a different level of summarization. This is referred as a lattice of cubiod or Datacube. This can be computed by Group by operation. Figure 2.1 shows a lattice of cuboids forming a data cube for the dimensions year, college , course and district. A Define and Compute Cube Primitives The compute cube operator aggregates over all subsets of the dimensions given in the Cube. A data cube of recommended system has 3 attributes such as course, college, year, a total of 2 3 = 8 cuboids can be constructed. Hence 8 group by operations are possible. These are listed as follows.. Let us consider Set A = {Course, college, year}, all the subsets, such as { (course,college, year), (course, college), (course, year), (college,year), (college), (course) (year) () } forms as a cuboid. The base cuboid contains all three dimensions, (course,college,year). The cuboid returns fact value for the combination of given three dimensions. The apex cuboid, or 0-D cuboid, refers to the case where the group-by is empty. It displays the fact value “min mark” for all courses, all colleges for all years. The base cuboid is the least generalized (most specific) of the cuboids. The apex cuboid is the most generalized (least specific) of the cuboids, and is often denoted as “all”. If we start at the apex cuboid and explore downward in the lattice, this is equivalent to drilling down within the data cube. If we start at the base cuboid and explore upward, this is akin to rolling up. DMQL syntax for defining cube is Figure 2.1 Lattice of Cuboid of 4 Dimensions The cuboid that holds the lowest level of summarization is called the base cuboid. For example, the 3-D cuboid in Figure 2.1 is the base cuboid for the given (college,course,year), The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. In our example, minimu mark adopted is calculated for all colleges, all courses and all years , summarized over all four dimensions. The apex cuboid is typically denoted by all. III. EFFICIENT IMPLEMENTATION OF DATA CUBES Data analysis in multidimensional model is the process of computing aggregations across many sets of dimensions. The aggregation operations can Define cube Recommend [course,college, year]: mark Compute cube operator is used to compute te aggregate value of the cube. Each cubiod is computed with Group-By operations. A DMQL for cuboid is as follows Cubiod 0 : Compute Cube recommend is a zero-dimensional operation which gives the all data in the cube. Cubiod 1: Compute Cube Recommend Select course group by college It is a one-dimensional operation. It displays the course wise mark data according to given college name or code only. Cuboid 2: Compute Cube Recommend Select course, year Group by Colleg, year. It is the 2 Dimensional Operations. The query displays the data for a given college and year. 45 IRACST ‐ International Journal of Advanced Computing, Engineering and Application (IJACEA), ISSN: 2319‐281X, Vol. 3, No.3, June 2014 Similarly the query is used to analyse the data in single window higher study recommended system. “Compute Cube Recommend Select College name, code Group By (College, Course,*) “Compute Cube Recommend Select College name, code Group By ( College, *, Year) “Compute Cube Recommend Select College name, code Group By (*, Course, Year) “Compute Cube Recommend Select College name, code Group By ( College, Course, Year) IV. SCHEMA DEFINITION: STAR AND SNOWFLAKE SCHEMA The model for multidimensional data is needed to define the relationship between entities. Such a model can exist in the form of a star schema, a snowflake schema, or a fact constellation schema.. The most common modeling paradigm is the star schema, which contains (1) a large central table (fact table) and (2) a set of smaller dimension tables. It has single fact table connected to dimension tables like a star. In the schema only one join establishes the relationship between the fact table and any one of the dimension tables. The star schema is highly denormalized and the snowflake schema is normalized. So the data access latency is less and size of data ware house is large in star schema. A star schema for Higher Study recommendation system shown in Figure 4.1 It has four dimensions college, course, year, district .The schemacontains a central fact table contains keys to each of the four dimensions, along with a minimum mark measure . Figure 4.1 Star Schema for Single Window Recommended system Notice that in the star schema, each dimension is represented by only one table, and each table contains a set of attributes. For example, the course dimension table contains the attribute set course key, course code, subject, semester . The snowflake schema is a variant of the star schema model, where some dimension tables are normalized, thereby further splitting the data into additional tables. The resulting schema graph forms a shape similar to a snowflake. The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form to reduce redundancies. Such a table is easy to maintain and saves storage space. However, Performance wise star flake is good. the snowflake structure can reduce the effectiveness of browsing, since more joins will be needed to execute a query. 46 IRACST ‐ International Journal of Advanced Computing, Engineering and Application (IJACEA), ISSN: 2319‐281X, Vol. 3, No.3, June 2014 course college course_key name subject semester feestructure joboffer Fact Table college code course key College code college_name address principal Trust-info address Location district District year district Year District_key district_name mark Year Quarterly Measures 7 Figure 4.2 snowflake Schema for Single window Recommendation System A snowflake schema for Single window rcommended system is given in Figure 4.2. The single dimension table for course and college is normalized in the snowflake schema. DMQL for Defining Star, Snowflake Schemas SQL based data mining qyery language (DMQL) is used to define cube. Dataware houses and data marts can be defined using two language primitives, one for cube definition and one for dimension definition. The cubeand dimension definition statement has the following syntax: define cube cube namei [dimension list]: measure list define dimension dimension name as (attribute or dimension list) The star schema of Figure 4.1 is defined in DMQL as follows: define cube Recommend [College, Course, Year] minmarks define dimension College as (College code, name, address, principal, trustinfor) define dimension Course as (Course key, coursename, subjects, semester, fee structure, joboffer) define dimension Year as (quarterly, year) The snowflake schema of Figure 4.2 is defined in DMQL as follows: define cube Recommend snowflake [College, Course,Year]:Minmarks define dimension College as (Collegecode, name, address(location,district), principal(name, Qualification),trustinfo(trustname, members, year of establishment), year) define dimension Course as (Coursekey, Coursename, subject(subname, detail) , feestructure, joboffer) V. OLAP OPERATIONS IN THE SINGLE WINDOW RECOMMENDED SYSTEM In the multidimensional model, data cube is organized into multiple dimensions, and each dimension contains attributes forms a multiple levels of abstraction. OLAP operations are used to view data from different perspectives. A number of OLAP data cube operations are used to provide interactive querying and analysis for different views. OLAP provides a user-friendly environment for interactive data analysis. Let us consider the data cube of single window recommended system, which contains three dimensions such as College, Course, Year and the fact min marks. The dimension college is aggregated with college name, Course is aggregated with Course name and year id aggregated with admission trend of previous year. OLAP operations are given as follows. Roll-up: The roll-up operation (also called the drillup operation by some vendors) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. The dimension table “College” contains address as one of its main attributes. The roll-up operation displays aggregates the data by ascending the address hierarchy from the level of address to the level of district. In other words, rather than grouping the data by address , the resulting cube groups the data by district. When roll-up is performed by dimension reduction, one or more dimensions are removed fromthe given cube. Rollup operation : Rollup on college (from Address to District) operations Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data. Drill-down can be realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions. The drill-down operation performed on the cube by stepping down a concept hierarchy for Course defined as “subject<subjectdetail” Drill-down occurs for 47 IRACST ‐ International Journal of Advanced Computing, Engineering and Application (IJACEA), ISSN: 2319‐281X, Vol. 3, No.3, June 2014 descending the college details as more detail level of trustinformation. Because a drill-down adds more detail to the given data, it can also be performed by adding new dimensions to a cube. REFERENCES 1. V. Harinarayan, A.Rajaraman, and J.D. Ullman, Implementing Data Cubes Efficiently, 1995 Drilldown: drilldown on course (from subject to subject detail) year 2. Slice and dice: The slice operation performs a selection on one dimension of the given cube. A slice operation on recommended system is performed where the data are selected from the central cube for the dimension “year” using the criterion year = 2013. Efficient Computation of Data Cubes and Aggregation views, Leonardo Tininini (CNR - Istituto di Analisi dei Sistemi e Informatica “Antonio Ruberti,” Italy) Copyright © 2005. 6 pages. 3. S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. VLDB’96 4. D. Agrawal, A. E. Abbadi, A. Singh, and T. Yurek. Efficient view maintenance in data warehouses. SIGMOD’97 5. R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. ICDE’97 6. S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26:6574, 1997 7. C. Imhoff, N. Galemmo, and J. G. Geiger. Mastering Data Warehouse Design: Relational and Dimensional Techniques. John Wiley, 2003 8. W. H. Inmon. Building the Data Warehouse. John Wiley, 1996 9. R. Kimball and M. Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. 2ed. John Wiley, 2002 The dice operation defines a subcube by performing a selection on two or more dimensions.A dice operation on the central cube based on the following selection criteria that involve three dimensions: (college = “kpr”) and (year =2011”) and (course=”ECE”.). Pivot (rotate): Pivot (also called rotate) is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data. DMQL provides the following primitives for traversing different level of abstraction <multi-level-manipulation> ::= upon<attribte name> | Downon<attributename> | add <attributename> | drop <attributename> VI. CONCLUSION This paper discusses the design principles and methods for the multidimensional data cube for single window counseling system. The model facilitate user to handle the data cube instead of relational data base. The Query methods has also been discussed to implement the cube and view the data in multiple views. Multidimensional data model support OLAP tool to view the data. The paper generalize the operations of OLAP, so that the data can be viewed in all directions. 48