INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN
ENGINEERING, TECHNOLOGY AND SCIENCES (IJ-CA-ETS)
AN INVESTIGATION AND EVALUATION ON PRÉCISED
DECISION FOR SCIENTIFIC DATA USING NEW APPROACHES
IN DMDW
1 DR. SUDARSON JENA, 2 SANTHOSH PASULADI, 3 KARTHIK KOVURI, 4 G.L. ANAND BABU
1 Gitam University, Rudraram, Hyderabad, A.P., INDIA
2 S.V. College of Engineering and Technology, Moinabad (M), R.R. Dist., A.P., INDIA
3 Trinity College of Engineering & Technology, Peddapally (M), Karimnagar Dist.-505172
4 CVSR College of Engineering, Venkatapur (V), Ghatkesar, R.R. Dist.-501301
[email protected], [email protected], [email protected] and [email protected]
ABSTRACT. This paper provides an overview of scientific data warehousing and on-line analytical processing
(OLAP) technologies, with an emphasis on their data warehousing requirements. The methods that we used
include the efficient computation of data cubes by integration of MOLAP and ROLAP techniques, the
integration of data cube methods with dimension relevance analysis and data dispersion analysis for concept
description and data cube based multi-level association, classification, prediction and clustering techniques.
Keywords: Scientific Data Warehouses, OLAP, Data Mining, On-Line Analytical Mining (OLAM), DBM, Data Cubes.
I. Introduction
Nowadays, we find ourselves in a decade dominated by the expansion of multimedia data. The growing interest in the storage of, and knowledge discovery from, data in heterogeneous forms (text, images, video, relational views, etc.), which we shall call complex data, drives research communities to develop new architectures and processing tools that are better adapted. Complex data, besides being heterogeneous, encompasses several classic data types. For instance, an image can be described by several characteristics/descriptors that constitute data to be analyzed. How, then, can we represent these data? Complex data warehousing needs innovation in each of its phases in order to answer this question. The ETL phase needs to be adapted to take into account the specificity of complex data. Furthermore, the multidimensional modeling is not obvious: it needs to consider all possible information concerning the complex data. For example, some of this information can be determined through data mining techniques. In this context, we propose a new approach for the complex data warehousing process, focusing on the data integration and multidimensional modeling phases.
Research and development produce a very large amount of scientific and technical data. The analysis and interpretation of these data are crucial for the proper understanding of scientific and technical phenomena and for the discovery of new concepts. Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. Data Warehousing (DW) and On-Line Analytical Processing (OLAP) systems based on a dimensional view of data are being used increasingly in traditional business applications, as well as in applications such as health care and biochemistry, for the purpose of analyzing very large amounts of data. The use of DW and OLAP systems for scientific purposes raises several new challenges to the traditional technology. Efficient implementation and fast response are the major challenges in the realization of on-line analytical mining in large databases and scientific data warehouses. Therefore, this study has focused on the efficient implementation of the on-line analytical mining mechanism. The methods that we used include the efficient computation of data cubes by integration of MOLAP and ROLAP techniques, the integration of data cube methods
with dimension relevance analysis and data dispersion analysis for concept description, and data cube based multi-level association, classification, prediction and clustering techniques.
II. Data Mining Methods
A. Statistics
Several statistical methods used in data mining projects are widely used in science and industry and provide excellent features for describing and visualizing large chunks of data. Some of the methods commonly used are regression analysis, correlation, CHAID analysis, hypothesis testing, and discriminant analysis.
Pros
Statistical analysis is sometimes a good "first step" in understanding data. These methods deal well with numerical data where the underlying probability distributions of the data are known. They are not as good with nominal data such as "good", "better", "best" or "Europe", "North America", "Asia", "South America".
Cons
Statistical methods require statistical expertise, or a project person well versed in statistics who is heavily involved. Such methods rely on assumptions that are difficult to verify and do not deal well with non-numerical data. They suffer from the "black box aversion syndrome": non-technical decision makers, those who will either accept or reject the results of the study, are often unwilling to make important decisions based on a technology that gives them answers but does not explain how it got the answers. Telling a non-statistician CEO that she or he must make a crucial business decision because of a favorable R statistic is not usually well received. With Nuggets® you can be told exactly how the conclusion was arrived at. Another problem is that statistical methods are valid only if certain assumptions about the data are met. Some of these assumptions are: linear relationships between pairs of variables, non-multicollinearity, normal probability distributions, and independence of samples.
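
To make the assumptions listed above concrete, the following is a minimal sketch (not from the paper) that fits a simple regression on a hypothetical numeric sample and then checks the linearity and normality assumptions; it assumes NumPy and SciPy are available.

# Minimal "first step" statistical screening on hypothetical numeric data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)                                # hypothetical numeric attribute
y = 2.0 * x + rng.normal(scale=0.5, size=200)           # hypothetical response

# Linear relationship between the pair of variables (regression and correlation)
slope, intercept, r, p_value, stderr = stats.linregress(x, y)
print(f"regression: y = {slope:.2f}x + {intercept:.2f}, R = {r:.3f}, p = {p_value:.3g}")

# Normality, one of the distributional assumptions listed above
residuals = y - (slope * x + intercept)
stat, p_norm = stats.shapiro(residuals)
print(f"Shapiro-Wilk normality test on residuals: p = {p_norm:.3g}")
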
B. Neural Nets
This is a popular technology, particularly in the
financial community. This method was originally
developed in the 1940’s to model biological nervous
systems in an attempt to mimic thought processes.
Pros
The end result of a Neural Net project is a
mathematical model of the process. It deals
primarily with numerical attributes but not as well
with nominal data.
Cons
There is still much controversy regarding the efficacy of neural nets. One major objection to the method is that the development of a neural net model is partly an art and partly a science, in that the results often depend on the individual who built the model. That is, the model form (called the network topology), and hence the results, may differ from one researcher to another for the same data. There is also the frequently occurring problem of over-fitting, which results in good prediction of the data used to build the model but poor results with new data. The "black box syndrome" also applies here.
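
As a rough illustration of the over-fitting problem noted above, the sketch below (not the authors' work) trains a deliberately over-sized neural network on a small synthetic sample and compares its accuracy on the training data with its accuracy on held-out data; it assumes scikit-learn, which the paper does not mention.

# Over-fitting: good prediction on the data used to build the model, worse on new data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A deliberately over-sized network encourages over-fitting on a small sample.
net = MLPClassifier(hidden_layer_sizes=(200, 200), max_iter=2000, random_state=0)
net.fit(X_train, y_train)

print("accuracy on training data:", net.score(X_train, y_train))
print("accuracy on new data:     ", net.score(X_test, y_test))
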
C. Decision Trees
Decision tree methods are techniques for partitioning a training file into a tree representation. The starting node is called the root node. Depending upon the results of a test, this node is then partitioned into two or more subsets. Each node is then further partitioned until a tree is built. This tree can be mapped into a set of rules.
Pros
Fairly fast and results can be presented as rules.
Cons
By far the most important negative for decision trees is that they are forced to make decisions along the way based on limited information, which implicitly leaves the vast majority of potential rules in the training file out of consideration. This approach may leave valuable rules undiscovered, since decisions made early in the process will preclude some good rules from being discovered later.
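
As a small illustration of how a tree can be mapped into a set of rules, the sketch below (assuming scikit-learn and its bundled Iris data, neither of which is mentioned in the paper) fits a shallow decision tree and prints each root-to-leaf path as a rule.

# A decision tree partitions the training file from the root node downward;
# each root-to-leaf path corresponds to one rule.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

print(export_text(tree, feature_names=list(data.feature_names)))
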
III. OLAP + Data Mining: On-Line Analytical Mining
On-line analytical processing (OLAP) is a powerful data analysis method for multi-dimensional analysis of data warehouses. Motivated by the popularity of OLAP technology, we use an On-Line Analytical Mining (OLAM) mechanism for multi-dimensional data mining in large databases and scientific data warehouses. We believe this is a promising direction to pursue for scientific data warehouses, based on the following observations.
1. Most data mining tools need to work on integrated, consistent, and cleaned data, which requires costly data cleaning, data transformation and data integration as pre-processing steps. A data warehouse constructed by such pre-processing serves as a valuable source of cleaned and integrated data for OLAP as well as for data mining.
2. Effective data mining needs exploratory data analysis. A user often likes to traverse flexibly through a database, select portions of relevant data, analyze data at different granularities, and present knowledge/results in different forms. On-line analytical mining provides facilities for mining on different subsets of data and at different granularities, and thus supports exploratory data mining.
3. It is often difficult for a user to predict what
kinds of knowledge are to be mined beforehand. By integrating OLAP with multiple data mining functions, on-line analytical mining gives users the flexibility to select desired data mining functions and to swap data mining tasks dynamically. However, data mining functions usually cost more than simple OLAP operations. Efficient implementation and fast response are the major challenges in the realization of on-line analytical mining in large databases and scientific data warehouses. Therefore, our study has focused on the efficient implementation of the on-line analytical mining mechanism. The methods that we used include the efficient computation of data cubes by integration of MOLAP and ROLAP techniques, the integration of data cube methods with dimension relevance analysis and data dispersion analysis for concept description, and data cube based multi-level association, classification, prediction and clustering techniques. These methods are discussed in detail in the following subsections.
[Figure 1 (diagram): an integrated OLAM and OLAP architecture. A user GUI/API sits above an OLAM engine and an OLAP engine, which share an API, metadata and a data cube; the data cube is built from the data warehouse, which is populated from databases through filtering, data cleaning and data integration.]
Figure 1. An integrated OLAM and OLAP architecture.
A. Architecture for On-Line Analytical Mining:
An OLAM engine performs analytical mining in data cubes in a similar manner as an OLAP engine performs on-line analytical processing. Therefore, it is suggested to have an integrated OLAM and OLAP architecture, as shown in Figure 1, where the OLAM and OLAP engines both accept users' on-line queries (instructions) and work with the data cube in the analysis. Furthermore, an OLAM engine may perform multiple data mining tasks, such as concept description, association, classification, prediction, clustering, time-series analysis, etc. Therefore, an OLAM engine is more sophisticated than an OLAP engine, since it usually consists of multiple mining modules which may interact with each other for effective mining in a scientific data warehouse. Since some requirements in OLAM, such as the construction of numerical dimensions, may not be readily available in commercial OLAP products, we have chosen to construct our own data cube and build the mining modules on such data cubes. With many OLAP products available on the market, it is important to develop on-line analytical mining mechanisms directly on top of the constructed data cubes and OLAP engines. Based on our analysis, there is no fundamental difference between the data cube required for OLAP and that for OLAM, although OLAM analysis may often involve the analysis of a larger number of dimensions with finer granularities, and thus require more powerful data cube construction and accessing tools than OLAP analyses. Since OLAM engines are constructed either on customized data cubes, which often work with relational database systems, or on top of the data cubes provided by OLAP products, it is suggested to build on-line analytical mining systems on top of existing OLAP and relational database systems rather than from the ground up.
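
The following is a schematic sketch, not the authors' implementation, of how the integrated architecture of Figure 1 could be organized in code: an OLAM engine registers pluggable mining modules, all of which work against the cube served by the OLAP engine. All class, method and field names here are hypothetical.

# Hypothetical sketch of the integrated OLAM/OLAP architecture of Figure 1.
from collections import defaultdict
from typing import Callable, Dict, Tuple

class OLAPEngine:
    # Serves roll-up queries over a tiny in-memory cube (a list of cell dicts).
    def __init__(self, cells):
        self.cells = cells

    def roll_up(self, dimensions: Tuple[str, ...], measure: str = "count"):
        out = defaultdict(float)
        for cell in self.cells:
            out[tuple(cell[d] for d in dimensions)] += cell[measure]
        return dict(out)

class OLAMEngine:
    # Dispatches mining tasks to pluggable modules that share the OLAP cube.
    def __init__(self, olap: OLAPEngine):
        self.olap = olap
        self.modules: Dict[str, Callable] = {}

    def register(self, name: str, module: Callable) -> None:
        self.modules[name] = module

    def mine(self, name: str, **kwargs):
        return self.modules[name](self.olap, **kwargs)

# e.g. a concept-description module is just a callable working on the shared cube
olam = OLAMEngine(OLAPEngine([{"region": "South", "year": 2011, "count": 3},
                              {"region": "South", "year": 2012, "count": 5}]))
olam.register("concept_description", lambda olap: olap.roll_up(("region",)))
print(olam.mine("concept_description"))
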
B. Data cube construction
Data cube technology is essential for efficient on-line analytical mining. There have been many studies on efficient computation and access of multidimensional databases; these lead us to use data cubes for scientific data warehouses. The attribute-oriented induction method adopts two generalization techniques: (1) attribute removal, which removes attributes that represent low-level data in a hierarchy, and (2) attribute generalization, which generalizes attribute values to their corresponding high-level ones. Such generalization leads to a new, compressed generalized relation with count and/or other aggregate values accumulated. This is similar to the relational OLAP (ROLAP) implementation of the roll-up operation.
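
As an illustration of these two generalization techniques, the sketch below (hypothetical data and concept hierarchy, not from the paper; assumes pandas) removes a low-level attribute, generalizes city values to a region level, and accumulates count and average values in the compressed generalized relation.

# Attribute-oriented induction on a hypothetical relation.
import pandas as pd

relation = pd.DataFrame({
    "city":    ["Hyderabad", "Mumbai", "Pune", "Hyderabad"],
    "sensor":  ["s-101", "s-204", "s-117", "s-309"],        # low-level attribute
    "reading": [11.2, 10.9, 12.4, 11.8],
})

# (1) attribute removal: 'sensor' represents low-level data with no useful hierarchy
generalized = relation.drop(columns=["sensor"])

# (2) attribute generalization: climb a (hypothetical) concept hierarchy city -> region
city_to_region = {"Hyderabad": "South", "Mumbai": "West", "Pune": "West"}
generalized["region"] = generalized["city"].map(city_to_region)
generalized = generalized.drop(columns=["city"])

# accumulate count and other aggregate values in the compressed generalized relation
summary = generalized.groupby("region").agg(count=("reading", "size"),
                                            avg_reading=("reading", "mean"))
print(summary)
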
For fast response in OLAP and data mining, the latter implementation adopts data cube technology as follows. When the data cube contains a small number of dimensions, or when it is generalized to a high level, the cube is structured as a compressed sparse array but is still stored in a relational database (to reduce the cost of construction and indexing of different data structures). The cube is pre-computed using a chunk-based multi-way array aggregation technique. However, when the cube has a large number of dimensions, it becomes very sparse, with a huge number of chunks. In this case, a relational structure is adopted to store and compute the data cube, similar to the ROLAP implementation. We believe such a dual data structure technique represents a balance between multidimensional OLAP (MOLAP) and relational OLAP (ROLAP) implementations. It ensures fast response time when handling medium-sized cubes/cuboids and high scalability when handling large databases with high dimensionality. Notice that even when adopting the ROLAP technique, it is still unrealistic to materialize all the possible cuboids for large databases with high dimensionality, due to the huge number of cuboids; it is wise to materialize more of the generalized, low-dimensionality cuboids, besides considering other factors such as access patterns and the sharing among different cuboids. A 3-D data cube/cuboid can be selected from a high-dimensional data cube and browsed conveniently using the DBMiner 3-D cube browser, where the size of a cell (displayed as a tiny cube) represents the entry count in the corresponding cell, and the brightness of the cell represents another measure of the cell. Pivoting, drilling, and slicing/dicing operations can be performed on the data cube browser with mouse clicking.
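
A minimal sketch of the ROLAP-style materialization discussed above, assuming pandas and a hypothetical fact table (neither is from the paper): only the generalized, low-dimensionality cuboids (here, up to two dimensions) are computed by grouping over subsets of the dimensions.

# Materializing low-dimensionality cuboids of a cube from a hypothetical fact table.
from itertools import combinations
import pandas as pd

fact = pd.DataFrame({
    "instrument": ["A", "A", "B", "B"],
    "site":       ["X", "Y", "X", "Y"],
    "year":       [2011, 2011, 2012, 2012],
    "count":      [10, 7, 4, 9],
})
dimensions = ["instrument", "site", "year"]

cuboids = {}
for k in range(0, 3):                              # materialize cuboids of <= 2 dimensions
    for dims in combinations(dimensions, k):
        if dims:
            cuboids[dims] = fact.groupby(list(dims))["count"].sum()
        else:
            cuboids[dims] = fact["count"].sum()    # the apex (fully generalized) cuboid

print(cuboids[("instrument", "year")])
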
IV. OLAP++ System Architecture:
The overall architecture of the OLAP++ system is shown in Figure 2. The object part of the system is based on the OPM tools, which implement the Object Data Management Group (ODMG) object data model and the Object Query Language (OQL) on top of a relational DBMS, in this case the ORACLE RDBMS. The OLAP part of the system is based on Microsoft's SQL Server OLAP Services, using the Multi-Dimensional Expressions (MDX) query language.
When a SumQL++ query is received by the
Federation Coordinator (FC), it is first parsed to
identify the measures, categories, links, classes and
attributes referenced in the query. Based on this, the
FC then queries the metadata to get information
about which databases the object data and the OLAP
data reside in and which categories are linked to
which classes. Based on the object parts of the query, the FC then sends OQL queries to the object databases to retrieve the data for which the particular conditions hold true. This data is then put into a "pure" SumQL statement (i.e., without object references) as a list of category values.
This SumQL statement is then sent to the OLAP
database layer to retrieve the desired measures,
grouped by the requested categories. The SumQL
statement is translated into
MDX by a separate layer, the “SumQL-to-MDX
translator”, and the data returned from OLAP
Services is returned to the FC. The reason for using
the intermediate SumQL statements is to isolate the
implementation of the
OLAP data from the FC. As another alternative, we
have also implemented a translator into SQL
statements against a “star schema” relational
database design. The system is able to support a
good query performance even for large databases
while making it possible to integrate existing OLAP
data with external data in object databases in a
flexible way that can adapt quickly to changing query
needs.
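
The following is a schematic sketch of the Federation Coordinator steps described above, with hypothetical function and field names; it is not the OLAP++ source code.

# Hypothetical sketch of the query flow: SumQL++ parts -> OQL -> pure SumQL -> MDX.
def federation_coordinator(query_parts, metadata, object_db, translator, olap_db):
    # 1. consult the metadata: which databases hold the object data and the OLAP data,
    #    and which categories are linked to which classes
    placement = metadata[query_parts["link"]]

    # 2. object part: send OQL to the object database and collect category values
    oql = (f"select c.{placement['category']} from {placement['class']} c "
           f"where {query_parts['condition']}")
    category_values = object_db(oql)

    # 3. build a "pure" SumQL statement (no object references) and translate it to MDX
    sumql = {"measure": query_parts["measure"],
             "group_by": query_parts["categories"],
             "where_in": category_values}
    mdx = translator(sumql)

    # 4. retrieve the desired measures, grouped by the requested categories
    return olap_db(mdx)
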
A. Back End Tools and Utilities:
Data warehousing systems use a variety of data extraction and cleaning tools, and load and refresh utilities for populating warehouses. Data extraction from "foreign" sources is usually implemented via gateways and standard interfaces (such as Information Builders EDA/SQL, ODBC, Oracle Open Connect, Sybase Enterprise Connect, and Informix Enterprise Gateway).
[Figure 2 (diagram): OLAP++ architecture. A graphical user interface issues SumQL++ queries to the Federation Coordinator, which consults the metadata and link data and dispatches OQL-to-SQL translators (SQL against the ODB data and link data in Oracle) and the SumQL-to-MDX translator (MDX against the SDB data in MS OLAP Services).]
Figure 2. OLAP++ Architecture.
Data Cleaning: Since a data warehouse is used for decision making, it is important that the data in the warehouse be correct. However, since large volumes of data from multiple sources are involved, there is a high probability of errors and anomalies in the data. Therefore, tools that help to detect data anomalies and correct them can have a high payoff. Some examples where data cleaning becomes necessary are: inconsistent field lengths, inconsistent descriptions, inconsistent value assignments, missing entries and violations of integrity constraints. Not surprisingly, optional fields in data entry forms are significant sources of inconsistent data. There are three related, but somewhat different, classes of data cleaning tools. Data migration tools allow simple transformation rules to be specified, e.g., "replace the string gender by sex"; Warehouse Manager from Prism is an example of a popular tool of this kind. Data scrubbing tools use domain-specific knowledge (e.g., postal addresses) to do the scrubbing of data. They often exploit parsing and fuzzy matching techniques to accomplish cleaning from multiple sources, and some tools make it possible to specify the "relative cleanliness" of sources; tools such as Integrity and Trillium fall in this category. Data auditing tools make it possible to discover rules and relationships (or to signal violations of stated rules) by scanning data. Thus, such tools may be considered variants of data mining tools. For example, such a tool may discover a suspicious pattern (based on statistical analysis), such as a certain car dealer having never received any complaints.
Load: After extracting, cleaning and transforming, data must be loaded into the warehouse. Additional preprocessing may still be required: checking integrity constraints; sorting; summarization, aggregation and other computation to build the derived tables stored in the warehouse; building indices and other access paths; and partitioning to multiple target storage areas. The load utilities for data warehouses have to deal with much larger data volumes than those for operational databases. There is only a small time window (usually at night) when the warehouse can be taken offline to refresh it. Sequential loads can take a very long time; e.g., loading a terabyte of data can take weeks or months! Hence, pipelined and partitioned parallelism is typically exploited [6]. Doing a full load has the advantage that it can be treated as a long batch transaction that builds up a new database. While it is in progress, the current database can still support queries; when the load transaction commits, the current database is replaced with the new one. Using periodic checkpoints ensures that if a failure occurs during the load, the process can restart from the last checkpoint. However, even using parallelism, a full load may still take too long. Most commercial utilities (e.g., RedBrick Table Management Utility) use incremental loading during refresh to reduce the volume of data that has to be incorporated into the warehouse. Only the updated tuples are inserted.
However, the load process now is harder to manage.
The incremental load conflicts with ongoing queries,
so it is treated as a sequence of shorter transactions
(which commit periodically, e.g., after every 1000
records or every few seconds), but now this
sequence of transactions has to be
coordinated to ensure consistency of derived data
and indices with the base data.
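
To make the first two classes of cleaning tools concrete, the sketch below (hypothetical records; Python standard library only, not one of the tools named above) applies a simple migration rule of the "replace the string gender by sex" kind and scrubs an inconsistent description by fuzzy matching against a reference list.

# Minimal data-cleaning sketch: a migration rule plus fuzzy-matching scrubbing.
from difflib import SequenceMatcher

records = [
    {"gender": "F", "city": "Hydrabad"},       # hypothetical dirty records
    {"gender": "M", "city": "Hyderabad"},
]

# Data migration: a simple transformation rule, "replace the string gender by sex"
for record in records:
    record["sex"] = record.pop("gender")

# Data scrubbing: fuzzy matching against a reference list of clean values
reference_cities = ["Hyderabad", "Mumbai", "Pune"]

def scrub(value, reference, threshold=0.8):
    best = max(reference, key=lambda ref: SequenceMatcher(None, value, ref).ratio())
    return best if SequenceMatcher(None, value, best).ratio() >= threshold else value

for record in records:
    record["city"] = scrub(record["city"], reference_cities)

print(records)
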
B. Conceptual Model and Front End Tools:
A popular conceptual model that influences the
front-end tools, database design, and the query
engines for OLAP is the multidimensional view of
data in the warehouse. In a multidimensional data
model, there is a set of numeric measures that are
the objects of analysis. Examples of such measures
are sales, budget, revenue, inventory, ROI (return on
investment). Each of the numeric measures depends
on a set of dimensions, which provide the context for
the measure.
For example, the dimensions associated with a sale
amount can be the city, product name, and the date
when the sale was made. The dimensions together
are assumed to uniquely determine the measure.
Thus, the multidimensional data views a measure as
a value in the
multidimensional space of dimensions. Each
dimension is described by a set of attributes. For
example, the Product dimension may consist of four
attributes: the category and the industry of the
product, year of its introduction, and the average
profit margin. For example, the soda Surge belongs
to the category beverage and the food industry, was
introduced in 1996, and may have an average profit
margin of 80%. The attributes of a dimension may be
related via a hierarchy of relationships. In the above
example, the product name is related to its category
and the industry attribute through such a
hierarchical relationship.
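
As a small worked example of this model (hypothetical figures, assuming pandas, neither of which is from the paper), the sketch below stores the sale-amount measure against its city, product and date dimensions and then rolls up along the Product hierarchy from product to category.

# A measure (sale amount) determined by its dimensions, plus a roll-up along a hierarchy.
import pandas as pd

sales = pd.DataFrame({
    "city":    ["Hyderabad", "Hyderabad", "Mumbai"],
    "product": ["Surge", "Sprinkle", "Surge"],
    "date":    ["1996-07-01", "1996-07-01", "1996-07-02"],
    "amount":  [120.0, 80.0, 95.0],
})

# the Product dimension's attributes, related via a hierarchy: product -> category
product_dim = pd.DataFrame({
    "product":  ["Surge", "Sprinkle"],
    "category": ["beverage", "beverage"],
})

# the dimensions together determine the measure in multidimensional space
cube = sales.pivot_table(values="amount", index="city",
                         columns="product", aggfunc="sum", fill_value=0)
print(cube)

# roll up from product to category by joining in the dimension table
rolled_up = sales.merge(product_dim, on="product").groupby(["city", "category"])["amount"].sum()
print(rolled_up)
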
C. Front End Tools
The multidimensional data model grew out of the
view of business data popularized by PC spreadsheet
programs that were extensively used by business
analysts. The spreadsheet is still the most compelling
front-end application for OLAP. The challenge in
supporting a query environment for OLAP can be
crudely summarized as that of supporting
spreadsheet operations efficiently over large, multi-gigabyte databases. Indeed, the Essbase product of Arbor Corporation uses Microsoft Excel as the front-end tool for its multidimensional engine.
V. Advantages:
On-line analytical processing (OLAP) is a powerful data analysis method for multi-dimensional analysis of data warehouses. An OLAM engine may perform multiple data mining tasks, such as concept description, association, classification, prediction, clustering, time-series analysis, etc. Therefore, an OLAM engine is more sophisticated than an OLAP engine, since it usually consists of multiple mining modules which may interact with each other for effective mining. Based on our analysis, there is no fundamental difference between the data cube required for OLAP and that for OLAM, although OLAM analysis may often involve the analysis of a larger number of dimensions with finer granularities, and thus require more powerful data cube construction and accessing tools than OLAP analyses. The attribute-oriented induction method adopts two generalization techniques: (1) attribute removal, which removes attributes that represent low-level data in a hierarchy, and (2) attribute generalization, which generalizes attribute values to their corresponding high-level ones. Such generalization leads to a new, compressed generalized relation with count and/or other aggregate values accumulated. Data warehousing systems use a variety of data extraction and cleaning tools, and load and refresh utilities for populating warehouses; data extraction from "foreign" sources is usually implemented via gateways and standard interfaces. Data cleaning, load, refresh and after-row operations can be performed more efficiently. Data cleaning is a problem reminiscent of heterogeneous data integration, a problem that has been studied for many years, but here the emphasis is on data inconsistencies instead of schema inconsistencies. Data cleaning, as we indicated, is also closely related to data mining, with the objective of suggesting possible inconsistencies.
This architecture gives the user a multidimensional view of data and can provide easy drill-down, rotate and ad hoc analysis of data. It can also support an iterative discovery process and can provide unique descriptions across all levels of data. The OLAP in this type of architecture can empower end users to do their own scientific analysis and offers ease of use, including an easy drill-down facility; users need virtually no knowledge of the underlying tables. The architecture can also improve exception analysis and variance analysis, provides high query performance, keeps local processing at the sources unaffected, and can operate when sources are unavailable. It can also query data not stored in a DBMS through extra information kept at the warehouse. The use of DW and OLAP systems for scientific purposes raises several new challenges to the traditional technology. The methods that we used include the efficient computation of data cubes by integration of MOLAP and ROLAP techniques, the integration of data cube methods with dimension relevance analysis and data dispersion analysis for concept description, and data cube based multi-level association, classification, prediction and clustering
techniques. We describe back end tools for extracting, cleaning and loading data into a scientific data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; and tools for metadata management and for managing the warehouse.
CONCLUSION:
Data warehouses for scientific purposes pose several great challenges to existing data warehouse technology. This paper provides an overview of scientific data warehousing and OLAP technologies, with an emphasis on their data warehousing requirements. Data warehousing using a multidimensional view and on-line analytical processing have become very popular in both business and science in recent years and are essential elements of decision support, analysis and interpretation of data. The methods that we used include the efficient computation of data cubes by integration of MOLAP and ROLAP techniques, the integration of data cube methods with dimension relevance analysis and data dispersion analysis for concept description, and data cube based multi-level association, classification, prediction and clustering techniques. Here we describe back end tools for extracting, cleaning and loading data into a scientific data warehouse, and multidimensional data models typical of OLAP.
ACKNOWLEDGMENT
Part of the work presented here resulted from work done by a research scholar of JNTU, Hyderabad. Special thanks to Dr. Madhan Kumar Srinivas, member of Research and Development, Infosys, Mysore, for his interest in this work. Thanks to Shri Manohar Reddy, Chairman of the TRINITY Group of Institutions, for his constant encouragement in the preparation of this work. Special thanks to Shri Rajeshwar Reddy, Chairman, CVSR College of Engineering, Venkatapur (V), Ghatkesar, for providing sufficient funding and for extending the existing facilities for execution of the proposed problem.
REFERENCES:
[1]. Microsoft Corporation. OLE DB for OLAP Version 1.0 Specification. Microsoft Technical Document, 1998.
[2]. The OLAP Report. Database Explosion. www.olapreport.com/DatabaseExplosion.htm, February 18, 2000.
[3]. T. B. Pedersen and C. S. Jensen. Research Issues in Clinical Data Warehousing. In Proceedings of the Tenth International Conference on Statistical and Scientific Database Management, pp. 43–52, 1998.
[4]. T. B. Pedersen, C. S. Jensen, and C. E. Dyreson. Supporting Imprecision in Multidimensional Databases Using Granularities. In Proceedings of the Eleventh International Conference on Statistical and Scientific Database Management, pp. 90–101, 1999.
[5]. T. B. Pedersen, C. S. Jensen, and C. E. Dyreson. Extending Practical Pre-Aggregation in On-Line Analytical Processing. In Proceedings of the Twenty-fifth International Conference on Very Large Data Bases, pp. 663–674, 1999.
[6]. T. B. Pedersen and C. S. Jensen. Multidimensional Data Modeling for Complex Data. In Proceedings of the Fifteenth International Conference on Data Engineering, 1999. Extended version available as TimeCenter Technical Report TR-37.
[7]. The OLAP Council. http://www.olapcouncil.org
[8]. Codd, E. F., S. B. Codd, and C. T. Salley. "Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate." Available from Arbor Software's web site, http://www.arborsoft.com/OLAP.html.
[9]. Kimball, R. The Data Warehouse Toolkit. John Wiley, 1996.
[10]. Barclay, T., R. Barnes, J. Gray, and P. Sundaresan. "Loading Databases Using Dataflow Parallelism." SIGMOD Record, Vol. 23, No. 4, Dec. 1994.
[11]. O'Neil, P., and D. Quass. "Improved Query Performance with Variant Indices." Proc. of SIGMOD Conf., 1997.
[12]. Harinarayan, V., A. Rajaraman, and J. D. Ullman. "Implementing Data Cubes Efficiently." Proc. of SIGMOD Conf., 1996.
[13]. Chaudhuri, S., R. Krishnamurthy, S. Potamianos, and K. Shim. "Optimizing Queries with Materialized Views." Intl. Conference on Data Engineering, 1995.
[14]. Widom, J. "Research Problems in Data Warehousing." Proc. 4th Intl. CIKM Conf., 1995.
[15]. R. G. G. Cattell et al. (Eds.). The Object Database Standard: ODMG 2.0. Morgan Kaufmann, 1997.
[16]. E. Thomsen. OLAP Solutions. Wiley, 1997.
[17]. R. Winter. Databases: Back in the OLAP Game. Intelligent Enterprise Magazine, 1(4):60–64, 1998.
[18]. Wu, M.-C., and A. P. Buchmann. "Research Issues in Data Warehousing." Submitted for publication.
[19]. Levy, A., A. Mendelzon, and Y. Sagiv. "Answering Queries Using Views." Proc. of PODS, 1995.
[20]. Seshadri, P., H. Pirahesh, and T. Leung. "Complex Query Decorrelation." Intl. Conference on Data Engineering, 1996.
[21]. Gupta, A., V. Harinarayan, and D. Quass. "Aggregate-Query Processing in Data Warehouse Environments." Proc. of VLDB, 1995.