IRACST ‐ International Journal of Advanced Computing, Engineering and Application (IJACEA), ISSN: 2319‐281X, Vol. 3, No.3, June 2014
Design and Implementation of Multi Dimensional Data Cube and OLAP Operation for Single Window Counselling Support System
Dr. V. Valli Mayil, Director
Vivekanandha Institute of Information and Management Studies
Tiruchengode
Abstract:
The single window counseling system is a transparent way to conduct admissions for higher studies in professional colleges. Most state governments in India have designed, developed and adopted such a system to help aspirants get admission to professional institutions. It is a highly visible process through which students are called to participate and select the course and college for admission. Appearing for single window counseling is important for every aspirant seeking admission to a professional college. Deciding on the best institution for a given cutoff mark is a challenging task that requires interactive and comparative analysis. This paper discusses design methods to construct a decision support server that gives the user complete information such as the institution, the course and the minimum cutoff mark needed.
The intelligent recommender system is constructed as a data warehouse; it requires organizing a large volume of data and processing complex queries, because users view the data in multiple ways in order to make strategic decisions. The system can be built on a multidimensional data cube in order to handle this large volume of data. Each cell of the data cube holds the minimum cutoff mark needed to get admission to a course in an institution.
This paper presents a design overview of the data cube and a schema definition in terms of star and snowflake schemas for a decision support system that guides the user by projecting the data in all directions. Implementation of the data cube is discussed using Data Mining Query Language (DMQL) concepts.
Keywords: Data Cube, Multidimensional Model, OLAP Operations, DMQL Query
I. INTRODUCTION
Decision support systems (DSS) are rapidly becoming vital elements of higher study guidance systems. A DSS gets data from operational databases and turns it into valuable results for queries. Many corporations and organizations are building unified decision support systems called data warehouses. A data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model. It stores the information an enterprise needs to make strategic decisions. A higher studies counseling system built as a data warehouse is often constructed by integrating data from multiple heterogeneous sources or institutions to support structured and/or ad hoc queries, analytical reporting and decision making. A normalized relational database is not sufficient to store and retrieve this data, so the system is built on a multidimensional data model, since the recommended system is used by multiple users with multiple views.
II. DESIGN OF MULTIDIMENSIONAL DATA CUBE
Data warehouses support on-line analytical processing (OLAP) tools for the interactive analysis of multidimensional data. OLAP is designed to process consolidated, historical data rather than operational data, and it performs complex, read-only queries. OLAP tools are based on the multidimensional data model, which views the data in the form of a data cube. The data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts. In general terms, dimensions are the perspectives or entities with respect to which an organization wants to store data.
In our example, the higher study recommended system is a data warehouse created to store the minimum marks that were adopted in previous years to get admission to a course in a college. This guidance system displays the previous years' data that were adopted for admission. Because it is a guidance system, the user can predict the type of course or college to which admission will be possible in the current year. The system is created with the dimensions College, Course, Year and District. Each dimension may have a table associated with it, called a dimension table. For example, a dimension table for “College” may contain the attributes college name, code, address, trust information, etc.; similarly, the dimension table for “Course” has course name, course code, subjects, etc. All the dimensions in the cube are represented as
Dimension College (Collegename, Collegecode, Address, Trustinfo)
Dimension Course (Coursename, Code, Subjects, Sem, Feestructure)
Dimension Year (Year, Quarterly, Halfyearly)
Dimension District (code, name, number of colleges)
A multidimensional data model is created to store numerical measures, called facts, which are represented by a fact table. The fact of the recommended system is the minimum cutoff mark that the college adopted for admission. Using the data cube, the user can view the minimum marks adopted by institutions in the following views, written as tuples (College, Course, Year, District), where * denotes a value specified by the user and all denotes aggregation over that dimension:
all colleges, all previous years, all districts, for a given course such as CSE: (all, *, all, all)
all courses, all years, all districts, for a given college name: (*, all, all, all)
all colleges, all courses, year 2013, all districts: (all, all, 2013, all)
all colleges, CSE, 2013, all districts: (all, CSE, 2013, all)
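The tuple views above can be sketched in Python as a minimal illustration, assuming a flat list of fact rows and MIN as the aggregate (the measure is a minimum cutoff mark). The `Fact` type, the `query` helper, `ALL` and the sample rows are hypothetical, not part of the described system: a position left as `ALL` is aggregated over, and any other value plays the role of * by fixing that dimension.

```python
from typing import NamedTuple, Optional, Union

ALL = "all"  # marks a dimension that is aggregated over

class Fact(NamedTuple):
    college: str
    course: str
    year: int
    district: str
    min_mark: float  # minimum cutoff mark adopted for admission

# Hypothetical fact rows, for illustration only.
FACTS = [
    Fact("College1", "CSE", 2012, "Chennai", 195.0),
    Fact("College1", "ECE", 2013, "Chennai", 190.5),
    Fact("College2", "CSE", 2013, "Kanchipuram", 185.0),
    Fact("College2", "EEE", 2012, "Kanchipuram", 180.25),
]

Value = Union[str, int]

def query(college: Value = ALL, course: Value = ALL,
          year: Value = ALL, district: Value = ALL) -> Optional[float]:
    """Minimum cutoff mark for one (College, Course, Year, District) tuple.

    A position equal to ALL is aggregated over; any other value fixes
    that dimension. Returns None if no fact row matches.
    """
    wanted = (college, course, year, district)
    marks = [
        f.min_mark
        for f in FACTS
        if all(w == ALL or w == v for w, v in zip(wanted, f[:4]))
    ]
    return min(marks) if marks else None

print(query(course="CSE"))             # (all, CSE, all, all) -> 185.0
print(query(college="College1"))       # (College1, all, all, all) -> 190.5
print(query(year=2013))                # (all, all, 2013, all) -> 185.0
print(query(course="CSE", year=2013))  # (all, CSE, 2013, all) -> 185.0
```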
The 2-D representation of the data is shown in Table 2.1. In the table, the minimum cutoff marks are given with respect to the dimensions College and Course (organized according to the types of course).
Table 2.1: 2-D view of the Recommended system for Year 2012 (rows: College1, College2; columns: courses ECE, EEE, CSE; cells: minimum cutoff marks)
The above table shows the minimum marks adopted for admission by various colleges, course-wise. Another dimension, “year”, can be added to view the cube year-wise. The 3-D representation of the above data is shown in Table 2.2. Here the data are represented as a series of 2-D tables; conceptually, the same data forms a 3-D data cube.
Table 2.2: 3-D view of the Single Window Recommended System, shown as a series of 2-D tables for Year 2012 and Year 2013 (rows: College1, College2; columns: courses ECE, EEE, CSE)
The minimum mark data can also be viewed in terms of a fourth dimension such as “District”. Table 2.3 represents the minimum marks adopted in the year 2012 for different courses by various colleges in each district, such as “Chennai”. Viewing the fourth dimension amounts to viewing a series of 3-D cubes. Similarly, an n-D data cube can be constructed as a series of (n-1)-D cubes.
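The idea that an n-D cube is a series of (n-1)-D cubes can be sketched as follows. This is a minimal illustration with hypothetical cells and a hypothetical `as_series` helper; a cube is modeled as a dictionary from coordinate tuples to minimum cutoff marks.

```python
from collections import defaultdict

# Hypothetical cells of a 3-D cube: (college, course, year) -> min cutoff mark.
CUBE_3D = {
    ("College1", "ECE", 2012): 190.0,
    ("College1", "CSE", 2012): 195.0,
    ("College2", "EEE", 2012): 180.0,
    ("College1", "ECE", 2013): 191.0,
    ("College2", "CSE", 2013): 185.0,
}

def as_series(cube, axis):
    """Split an n-D cube into a series of (n-1)-D cubes along one axis.

    `cube` maps coordinate tuples to cell values and `axis` is the index of
    the dimension to fix. Each sub-cube is keyed by the remaining coordinates.
    """
    series = defaultdict(dict)
    for coords, value in cube.items():
        rest = coords[:axis] + coords[axis + 1:]
        series[coords[axis]][rest] = value
    return dict(series)

by_year = as_series(CUBE_3D, axis=2)
for year, table in sorted(by_year.items()):
    print(year, table)  # one 2-D College x Course table per year
```

Applied to the district axis of a 4-D (college, course, year, district) cube, the same function would yield one 3-D cube per district, in the spirit of Table 2.3.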
Table 2.3: 4-D representation of the Recommended system, shown as a series of College × Course tables for Year 2012, one per district (District = “Chennai”, District = “Kanchipuram”); rows are College1 and College2, columns are ECE, EEE and CSE.
The above tables show the data at different degrees of summarization. Given a set of dimensions, each subset of those dimensions shows the data at a different level of summarization. The collection of all these views is referred to as a lattice of cuboids, or data cube, and each cuboid can be computed by a Group-By operation. Figure 2.1 shows a lattice of cuboids forming a data cube for the dimensions year, college, course and district.
Figure 2.1: Lattice of cuboids of 4 dimensions
The cuboid that holds the lowest level of summarization is called the base cuboid. For example, for the dimensions (college, course, year) the base cuboid is the 3-D cuboid (college, course, year); in the 4-D lattice of Figure 2.1 the base cuboid also includes district. The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. In our example, it gives the minimum mark adopted over all colleges, all courses, all years and all districts, that is, summarized over all four dimensions. The apex cuboid is typically denoted by all.
III. EFFICIENT IMPLEMENTATION OF DATA CUBES
Data analysis in the multidimensional model is the process of computing aggregations across many sets of dimensions. The aggregation operations can be performed by Group-By operations. Building and implementing the cube is done with the Define and Compute primitives.
A. Define and Compute Cube Primitives
The compute cube operator aggregates over all subsets of the dimensions given in the cube. The data cube of the recommended system has three dimensions, course, college and year, so a total of 2^3 = 8 cuboids can be constructed, and hence 8 Group-By operations are possible. Let A = {course, college, year}; each of its subsets, {(course, college, year), (course, college), (course, year), (college, year), (college), (course), (year), ()}, forms a cuboid. The base cuboid contains all three dimensions, (course, college, year), and returns the fact value for each combination of the three dimensions. The apex cuboid, or 0-D cuboid, refers to the case where the group-by is empty; it displays the fact value “min mark” over all courses, all colleges and all years. The base cuboid is the least generalized (most specific) of the cuboids, and the apex cuboid, often denoted as “all”, is the most generalized (least specific). Starting at the apex cuboid and exploring downward in the lattice is equivalent to drilling down within the data cube; starting at the base cuboid and exploring upward is akin to rolling up.
The DMQL syntax for defining the cube is:
define cube Recommend [course, college, year]: mark
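The lattice can be sketched as a naive compute cube: enumerate all 2^n subsets of the dimensions and run one Group-By per subset. This is a minimal sketch, assuming in-memory rows and MIN as the aggregate for the “min mark” fact; `cuboids`, `group_by_min`, `compute_cube` and the sample rows are hypothetical.

```python
from itertools import combinations

DIMS = ("course", "college", "year")

# Hypothetical fact rows: one minimum cutoff mark per (course, college, year).
ROWS = [
    {"course": "CSE", "college": "College1", "year": 2012, "mark": 195.0},
    {"course": "CSE", "college": "College2", "year": 2012, "mark": 185.0},
    {"course": "ECE", "college": "College1", "year": 2013, "mark": 190.0},
    {"course": "EEE", "college": "College2", "year": 2013, "mark": 180.0},
]

def cuboids(dims):
    """All 2**n subsets of the dimensions, from the base cuboid down to the apex ()."""
    for k in range(len(dims), -1, -1):
        yield from combinations(dims, k)

def group_by_min(rows, group):
    """One Group-By: MIN(mark) for every distinct value tuple of `group`."""
    result = {}
    for row in rows:
        key = tuple(row[d] for d in group)
        result[key] = min(result.get(key, row["mark"]), row["mark"])
    return result

def compute_cube(rows, dims):
    """Compute every cuboid in the lattice, one Group-By per cuboid."""
    return {group: group_by_min(rows, group) for group in cuboids(dims)}

cube = compute_cube(ROWS, DIMS)
for group, cells in cube.items():
    print(group or "all", cells)
```

Each cuboid here is recomputed from the base rows. Practical cube engines usually derive coarser cuboids from finer ones that are already materialized (valid for MIN, since the minimum of partial minimums is the overall minimum), which avoids rescanning the base data.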
The compute cube operator is used to compute the aggregate values of the cube. Each cuboid is computed with Group-By operations. DMQL statements for the cuboids are as follows.
Cuboid 0: Compute Cube Recommend
This is a zero-dimensional operation that summarizes all the data in the cube.
Cuboid 1: Compute Cube Recommend Select course Group By college
This is a one-dimensional operation. It displays course-wise mark data for a given college name or code only.
Cuboid 2: Compute Cube Recommend Select course, year Group By college, year
This is a two-dimensional operation. The query displays the data for a given college and year.
Similarly, the following queries are used to analyse the data in the single window higher study recommended system, where * marks a dimension that is aggregated over:
Compute Cube Recommend Select College name, code Group By (College, Course, *)
Compute Cube Recommend Select College name, code Group By (College, *, Year)
Compute Cube Recommend Select College name, code Group By (*, Course, Year)
Compute Cube Recommend Select College name, code Group By (College, Course, Year)
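The Group By queries above correspond to ordinary SQL GROUP BY statements, with * marking a dimension that is not grouped. The sketch below uses Python's built-in `sqlite3` module purely to make this runnable; the `recommend` table, its columns, the `cuboid` helper and the rows are hypothetical, and it is not the DMQL processor the paper uses.

```python
import sqlite3

# Hypothetical base table for the Recommend cube: one row per base cell.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recommend (college TEXT, course TEXT, year INTEGER, mark REAL)")
conn.executemany(
    "INSERT INTO recommend VALUES (?, ?, ?, ?)",
    [
        ("College1", "CSE", 2012, 195.0),
        ("College1", "CSE", 2013, 193.0),
        ("College2", "CSE", 2013, 185.0),
        ("College2", "ECE", 2013, 182.0),
    ],
)

def cuboid(columns):
    """One cuboid: GROUP BY the listed dimensions, MIN over the rest ('*').

    Column names come from the fixed lists in the calls below, never from
    user input, so building the SQL text with an f-string is safe here.
    """
    if not columns:  # apex cuboid: aggregate over every dimension
        return conn.execute("SELECT MIN(mark) FROM recommend").fetchall()
    cols = ", ".join(columns)
    sql = f"SELECT {cols}, MIN(mark) FROM recommend GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()

college_course = cuboid(["college", "course"])  # Group By (College, Course, *)
college_year = cuboid(["college", "year"])      # Group By (College, *, Year)
course_year = cuboid(["course", "year"])        # Group By (*, Course, Year)
apex = cuboid([])                               # all
print(college_course, college_year, course_year, apex, sep="\n")
```

SQLite has no CUBE operator, so each cuboid is issued as its own GROUP BY; database systems that support GROUP BY CUBE or GROUPING SETS can request several cuboids in one statement.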
IV. SCHEMA DEFINITION: STAR AND SNOWFLAKE SCHEMA
A model for multidimensional data is needed to define the relationships between entities. Such a model can exist in the form of a star schema, a snowflake schema, or a fact constellation schema. The most common modeling paradigm is the star schema, which contains (1) a large central table (fact table) and (2) a set of smaller dimension tables. It has a single fact table connected to the dimension tables like a star, so only one join is needed to relate the fact table to any one of the dimension tables.
The star schema is highly denormalized, while the snowflake schema is normalized. As a result, the star schema has lower data access latency but a larger data warehouse. A star schema for the higher study recommendation system is shown in Figure 4.1. It has four dimensions, college, course, year and district. The schema contains a central fact table with keys to each of the four dimensions, along with a minimum mark measure.
Figure 4.1 Star Schema for Single Window
Recommended system
Notice that in the star schema, each dimension is represented by only one table, and each table contains a set of attributes. For example, the course dimension table contains the attribute set {course key, course code, subject, semester}.
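A star schema of this shape can be sketched in SQL, run here through Python's built-in `sqlite3` module. The table names (`admission_fact`, `college_dim`, `course_dim`, `year_dim`, `district_dim`), the columns and the rows are hypothetical simplifications of the attributes listed above; the point is that every dimension is one join away from the fact table.

```python
import sqlite3

# Hypothetical star schema adapted from the attributes listed for Figure 4.1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE college_dim  (college_code TEXT PRIMARY KEY, college_name TEXT, address TEXT);
CREATE TABLE course_dim   (course_key INTEGER PRIMARY KEY, course_code TEXT,
                           subject TEXT, semester INTEGER);
CREATE TABLE year_dim     (year_key INTEGER PRIMARY KEY, year INTEGER, quarterly TEXT);
CREATE TABLE district_dim (district_key INTEGER PRIMARY KEY, district_name TEXT);

-- Central fact table: one key per dimension plus the minimum-mark measure.
CREATE TABLE admission_fact (college_code TEXT, course_key INTEGER,
                             year_key INTEGER, district_key INTEGER, min_mark REAL);

INSERT INTO college_dim  VALUES ('C1', 'College1', 'Address1'), ('C2', 'College2', 'Address2');
INSERT INTO course_dim   VALUES (1, 'CSE', 'Programming', 1), (2, 'ECE', 'Circuits', 1);
INSERT INTO year_dim     VALUES (1, 2012, 'Q2'), (2, 2013, 'Q2');
INSERT INTO district_dim VALUES (1, 'Chennai'), (2, 'Kanchipuram');
INSERT INTO admission_fact VALUES ('C1', 1, 2, 1, 195.0), ('C1', 2, 2, 1, 190.0),
                                  ('C2', 1, 2, 2, 185.0), ('C2', 1, 1, 2, 183.0);
""")

# Star join: each dimension is reached with a single join from the fact table.
rows = conn.execute("""
    SELECT d.district_name, c.course_code, MIN(f.min_mark)
    FROM admission_fact f
    JOIN course_dim c   ON f.course_key = c.course_key
    JOIN year_dim y     ON f.year_key = y.year_key
    JOIN district_dim d ON f.district_key = d.district_key
    WHERE y.year = 2013
    GROUP BY d.district_name, c.course_code
    ORDER BY d.district_name, c.course_code
""").fetchall()
print(rows)
```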
The snowflake schema is a variant of the star schema model, where some dimension tables are normalized, thereby further splitting the data into additional tables. The resulting schema graph forms a shape similar to a snowflake. The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form to reduce redundancies. Such tables are easy to maintain and save storage space. Performance-wise, however, the star schema is better: the snowflake structure can reduce the effectiveness of browsing, since more joins are needed to execute a query.
Figure 4.2: Snowflake schema for the Single Window Recommendation System (tables: fact table with measure mark; course, college, district and year dimension tables)
A snowflake schema for the single window recommended system is given in Figure 4.2. The course and college dimensions, each a single table in the star schema, are normalized in the snowflake schema.
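The extra join cost of a snowflake can be sketched in the same hypothetical SQLite setting. Here only the college dimension is snowflaked, with district names moved into their own `district_dim` table (an assumed simplification of the address, location and district split); all names and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked college dimension: district names live in their own table.
CREATE TABLE district_dim (district_key INTEGER PRIMARY KEY, district_name TEXT);
CREATE TABLE college_dim  (college_code TEXT PRIMARY KEY, college_name TEXT,
                           location TEXT, district_key INTEGER);
CREATE TABLE admission_fact (college_code TEXT, course_code TEXT,
                             year INTEGER, min_mark REAL);
INSERT INTO district_dim VALUES (1, 'Chennai'), (2, 'Kanchipuram');
INSERT INTO college_dim VALUES ('C1', 'College1', 'Location1', 1),
                               ('C2', 'College2', 'Location2', 2),
                               ('C3', 'College3', 'Location3', 2);
INSERT INTO admission_fact VALUES ('C1', 'CSE', 2013, 195.0),
                                  ('C2', 'CSE', 2013, 185.0),
                                  ('C3', 'CSE', 2013, 188.0);
""")

# Reaching the district now takes two joins (fact -> college -> district)
# instead of the single fact -> district join of a star schema.
rows = conn.execute("""
    SELECT d.district_name, MIN(f.min_mark)
    FROM admission_fact f
    JOIN college_dim c  ON f.college_code = c.college_code
    JOIN district_dim d ON c.district_key = d.district_key
    WHERE f.course_code = 'CSE' AND f.year = 2013
    GROUP BY d.district_name
    ORDER BY d.district_name
""").fetchall()
print(rows)
```

Each district name is stored once instead of being repeated for every college, which is the storage saving the text describes, at the price of the second join.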
DMQL for Defining Star and Snowflake Schemas
The SQL-based Data Mining Query Language (DMQL) is used to define the cube. Data warehouses and data marts can be defined using two language primitives, one for cube definition and one for dimension definition. The cube and dimension definition statements have the following syntax:
define cube <cube_name> [<dimension_list>]: <measure_list>
define dimension <dimension_name> as (<attribute_or_dimension_list>)
The star schema of Figure 4.1 is defined in DMQL as follows:
define cube Recommend [College, Course, Year]: minmarks
define dimension College as (College_code, name, address, principal, trustinfo)
define dimension Course as (Course_key, coursename, subjects, semester, feestructure, joboffer)
define dimension Year as (quarterly, year)
The snowflake schema of Figure 4.2 is defined in DMQL as follows:
define cube Recommend_snowflake [College, Course, Year]: minmarks
define dimension College as (College_code, name, address(location, district), principal(name, qualification), trustinfo(trustname, members, year_of_establishment), year)
define dimension Course as (Course_key, coursename, subject(subname, detail), feestructure, joboffer)
V. OLAP OPERATIONS IN THE SINGLE WINDOW RECOMMENDED SYSTEM
In the multidimensional model, the data cube is organized into multiple dimensions, and each dimension contains attributes that form multiple levels of abstraction. OLAP operations are used to view data from different perspectives. A number of OLAP data cube operations provide interactive querying and analysis for different views, giving a user-friendly environment for interactive data analysis.
Let us consider the data cube of the single window recommended system, which contains three dimensions, College, Course and Year, and the fact min marks. The dimension College is aggregated by college name, Course is aggregated by course name, and Year is aggregated by the admission trend of previous years. The OLAP operations are as follows.
Roll-up: The roll-up operation (also called the drill-up operation by some vendors) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. The dimension table “College” contains address as one of its main attributes. The roll-up operation aggregates the data by ascending the address hierarchy from the level of address to the level of district. In other words, rather than grouping the data by address, the resulting cube groups the data by district. When roll-up is performed by dimension reduction, one or more dimensions are removed from the given cube.
Roll-up operation: roll-up on college (from address to district)
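Both forms of roll-up can be sketched in Python. This is a minimal illustration, assuming cells keyed by (address, course), a hypothetical address-to-district mapping standing in for the concept hierarchy, and MIN as the aggregate; `roll_up`, `drop_dimension` and the data are hypothetical.

```python
from collections import defaultdict

# Hypothetical concept hierarchy for the College dimension: address -> district.
ADDRESS_TO_DISTRICT = {
    "Address1": "Chennai",
    "Address2": "Chennai",
    "Address3": "Kanchipuram",
}

# Hypothetical cells at address level: (address, course) -> minimum cutoff mark.
BY_ADDRESS = {
    ("Address1", "CSE"): 195.0,
    ("Address2", "CSE"): 191.5,
    ("Address2", "ECE"): 189.0,
    ("Address3", "CSE"): 185.0,
}

def roll_up(cells, hierarchy):
    """Climb one hierarchy level, merging child cells into their parent with MIN."""
    rolled = defaultdict(lambda: float("inf"))
    for (child, course), mark in cells.items():
        parent = (hierarchy[child], course)
        rolled[parent] = min(rolled[parent], mark)
    return dict(rolled)

def drop_dimension(cells, axis):
    """Roll-up by dimension reduction: aggregate one coordinate away with MIN."""
    reduced = defaultdict(lambda: float("inf"))
    for coords, mark in cells.items():
        rest = coords[:axis] + coords[axis + 1:]
        reduced[rest] = min(reduced[rest], mark)
    return dict(reduced)

by_district = roll_up(BY_ADDRESS, ADDRESS_TO_DISTRICT)  # address -> district
by_course = drop_dimension(BY_ADDRESS, axis=0)          # remove the college axis
print(by_district)
print(by_course)
```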
Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data. Drill-down can be realized either by stepping down a concept hierarchy for a dimension or by introducing additional dimensions. In the recommended system, drill-down is performed on the cube by stepping down the concept hierarchy for course from subject to subject detail. Drill-down also occurs when descending from the college details to the more detailed level of trust information. Because a drill-down adds more detail to the given data, it can also be performed by adding new dimensions to a cube.
Drill-down operation: drill-down on course (from subject to subject detail)
Slice and dice: The slice operation performs a selection on one dimension of the given cube. In the recommended system, a slice is performed by selecting data from the central cube for the dimension “year” using the criterion year = 2013. The dice operation defines a subcube by performing a selection on two or more dimensions. For example, a dice operation on the central cube can use the following selection criteria, which involve three dimensions:
(college = “kpr”) and (year = 2011) and (course = “ECE”)
Pivot (rotate): Pivot (also called rotate) is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data.
DMQL provides the following primitives for traversing different levels of abstraction:
<multi-level-manipulation> ::= roll up on <attribute_name>
| drill down on <attribute_name>
| add <attribute_name>
| drop <attribute_name>
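Drill-down, slice, dice and pivot can be sketched over one small in-memory cube. The sketch assumes cells keyed by (college, course, year) with MIN as the aggregate; “kpr” is the college named in the dice example, while the other data and the helpers `select`, `view` and `pivot` are hypothetical. Drill-down is shown here by adding a dimension, one of the two forms described above.

```python
# Hypothetical base cells: (college, course, year) -> minimum cutoff mark.
DIMS = ("college", "course", "year")
CELLS = {
    ("kpr", "ECE", 2011): 188.0,
    ("kpr", "CSE", 2011): 192.0,
    ("kpr", "ECE", 2013): 190.0,
    ("College2", "ECE", 2011): 181.0,
    ("College2", "CSE", 2013): 186.0,
}

def select(cells, **criteria):
    """Slice (one dimension=value criterion) or dice (two or more)."""
    pos = {DIMS.index(d): v for d, v in criteria.items()}
    return {k: m for k, m in cells.items()
            if all(k[i] == v for i, v in pos.items())}

def view(cells, dims):
    """Aggregate the cells onto the given dimensions with MIN."""
    out = {}
    for coords, mark in cells.items():
        key = tuple(coords[DIMS.index(d)] for d in dims)
        out[key] = min(out.get(key, mark), mark)
    return out

def pivot(table):
    """Rotate a 2-D {(row, col): value} view so rows become columns."""
    return {(col, row): v for (row, col), v in table.items()}

by_college = view(CELLS, ("college",))
drilled = view(CELLS, ("college", "year"))   # drill-down: add the year dimension
slice_2013 = select(CELLS, year=2013)         # slice on year = 2013
dice = select(CELLS, college="kpr", year=2011, course="ECE")
rotated = pivot(view(slice_2013, ("college", "course")))
print(by_college, drilled, slice_2013, dice, rotated, sep="\n")
```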
VI. CONCLUSION
This paper discusses the design principles and methods for a multidimensional data cube for the single window counseling system. The model lets the user work with a data cube instead of a relational database. Query methods have also been discussed to implement the cube and to view the data in multiple views. The multidimensional data model supports OLAP tools for viewing the data, and the paper generalizes the OLAP operations so that the data can be viewed in all directions.