TECH2TECH APPLIED SOLUTIONS #1
Cubes by design
ROLAP and HOLAP solutions using SAS and Teradata.
by Michelle Wilkie and Arlene Zaima
One of the best-kept secrets in SAS®’s
business intelligence (BI) offering
is its relational implementation of online
analytical processing (OLAP). SAS is well
known for its powerful analytics, which typically require data to be housed in a separate SAS data set or, in the case of OLAP,
inside a cube. SAS also provides the option
to implement a relational OLAP (ROLAP),
where the data stays in the data warehouse.
The benefits can be viewed by examining
the three most common OLAP techniques:
multi-dimensional OLAP (MOLAP), hybrid
OLAP (HOLAP) and ROLAP. (See figure 1.)
In the first method, MOLAP, data is
extracted from the data warehouse and
aggregated into a data structure, commonly
referred to as a cube, for analysis. Since
the data is pre-aggregated, the response is
quickly returned to the end users. The cost
of this technique relates to the overhead
of the tasks the BI administrators must
perform. First, the data resides in both the
warehouse and the cube. This means that
the data must be updated and maintained
in two locations. Second, since the data
must go through an aggregation process
when a MOLAP cube is built or updated,
additional overhead is incurred. As businesses expand their analysis to include more
dimensions or deeper levels of analysis, the
cost and overhead of moving and replicating data into an external cube become a
challenge for the BI administrator and IT.
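To make the MOLAP trade-off concrete, here is a minimal Python sketch of pre-aggregation. It is an illustration only, not how SAS builds a cube: the rows, the dimension names and the `build_cube` helper are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows as they might sit in the warehouse:
# (region, product, sales amount)
WAREHOUSE_ROWS = [
    ("East", "Widget", 100),
    ("East", "Gadget", 50),
    ("West", "Widget", 70),
    ("West", "Gadget", 30),
]
DIMENSIONS = ("region", "product")


def build_cube(rows, dimensions):
    """Pre-aggregate the measure for every combination of dimensions."""
    cube = defaultdict(int)
    n = len(dimensions)
    for row in rows:
        *keys, measure = row
        # Each subset of dimensions gets its own aggregate cell;
        # dimensions left out of the subset are rolled up to "ALL".
        for size in range(n + 1):
            for kept in combinations(range(n), size):
                cell = tuple(keys[i] if i in kept else "ALL" for i in range(n))
                cube[cell] += measure
    return dict(cube)


cube = build_cube(WAREHOUSE_ROWS, DIMENSIONS)

# Query time is a lookup; the detail rows are not scanned again.
east_total = cube[("East", "ALL")]
grand_total = cube[("ALL", "ALL")]
```

Four detail rows already produce nine aggregate cells, and every added dimension multiplies that count. The copy also has to be rebuilt whenever the warehouse changes, which is the overhead described above.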
The next technique, HOLAP, addresses
some of the challenges of the MOLAP
implementation. HOLAP is a hybrid
approach in which higher-level aggregations that are commonly accessed are stored
on a server and the more granular information is stored in the data warehouse. This
technique was developed to enable larger
definitions of cubes without affecting the
cube build time. A cube designer can add
details or dimensionality into the cube
without increasing the overhead cost of
the MOLAP cube. This provides the BI
administrator the flexibility to establish
the location of the multi-dimensional data
depending on access frequency, administration and processing overheads.
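The hybrid idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: an in-memory SQLite database stands in for the data warehouse, a dictionary stands in for the server-side aggregations, and the `sales_for` function is a hypothetical name.

```python
import sqlite3

# Stand-in for the data warehouse (hypothetical table and rows).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, store TEXT, amount INTEGER)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "E1", 100), ("East", "E2", 50), ("West", "W1", 70)],
)

# Commonly accessed, higher-level aggregation kept outside the warehouse.
cube_store = dict(
    warehouse.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)


def sales_for(region, store=None):
    """Return (total, source): region totals come from the stored
    aggregation, store-level detail is fetched from the warehouse."""
    if store is None:
        return cube_store[region], "cube"
    row = warehouse.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ? AND store = ?",
        (region, store),
    ).fetchone()
    return row[0], "warehouse"
```

Calling `sales_for("East")` is answered from the stored aggregation, while `sales_for("East", "E2")` goes to the warehouse. Adding more detail to the warehouse side does not enlarge the stored aggregation.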
Figure 1: OLAP techniques. The Teradata system can optimize online analytical processing (OLAP) implementations regardless of which technique is used.
In the third OLAP option, ROLAP, the
data stays in the data warehouse and only
the metadata is stored outside the database.
Each request is converted to SQL and sent
to the data warehouse, where the results are
retrieved and returned to the analyst. The
ROLAP solution resolves the overhead problem of maintaining data in multiple locations,
as well as the additional processing involved
in building or updating the cube; however, if
the ROLAP is not designed and implemented
properly, the response time may be slow.
PAGE 1 | Teradata Magazine | September 2008 | ©2008 Teradata Corporation | AR-5728
A SAS ROLAP and HOLAP implementation can be optimally built with
a Teradata Database to maximize performance. OLAP query processes are
optimized through aggregate join indexes
(AJIs), a Teradata Database feature. An
AJI is a join index that specifies SUM or
COUNT aggregate operations across one
or more tables. AJIs require no user or BI
administrator maintenance and are used
automatically by the Teradata Optimizer to
improve the performance of ROLAP requests.
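The Python sketch below is a rough analogy for a summary that is kept current without manual maintenance. It is not Teradata syntax or behavior: SQLite has no join indexes, so a trigger-maintained table (hypothetical name `sales_by_region`) stands in for the AJI, and nothing here imitates the Teradata Optimizer's automatic query rewriting.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);

-- Stand-in for an aggregate structure holding SUM and COUNT per region.
CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total INTEGER, n INTEGER);

-- The database keeps the summary current on every insert,
-- so nobody has to rebuild it by hand.
CREATE TRIGGER keep_summary AFTER INSERT ON sales
BEGIN
    INSERT OR IGNORE INTO sales_by_region VALUES (NEW.region, 0, 0);
    UPDATE sales_by_region
       SET total = total + NEW.amount, n = n + 1
     WHERE region = NEW.region;
END;
""")

db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100), ("East", 50), ("West", 70)],
)

# The same answer, computed from detail rows and read from the summary.
from_detail = db.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region ORDER BY region"
).fetchall()
from_summary = db.execute(
    "SELECT region, total, n FROM sales_by_region ORDER BY region"
).fetchall()
```

Both queries return the same rows, but the second reads one row per region instead of scanning the detail table.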
SAS cubes
MOLAP, ROLAP and HOLAP cubes
are all supported in SAS 9.1.3. A cube
designer can define a cube using SAS
OLAP Cube Studio (an easy-to-use user
interface) or using code (PROC OLAP) that
is then built by the SAS Workspace Server.
A SAS cube comprises three parts:
metadata, navigation files and the physical
data or aggregation tables. (See figure 2.)
The first two components do not differ
across the OLAP techniques.
The metadata for a cube created in
the SAS Enterprise Intelligence Platform
defines information such as location of
data, cube structure, cube-based security permissions and calculated measure
definitions. The navigation files describe
how the input data maps to the structure
of the cube; for example, how members
relate to each other, and the formats, member
properties and captions for each member.
The physical data is dependent
on which OLAP technique the cube
designer specifies when building the
cube structure:
> MOLAP. Relevant SAS proprietary
highly indexed aggregation tables are
created and stored within the physical cube.
> ROLAP. All data resides in the relational database management system
(RDBMS) where relational tables
are optimized for low-level dimensional requests, and aggregate indexes
are created for higher-level OLAP
requests. The Teradata system will determine the optimal database structure to use.
> HOLAP. A mix of the SAS proprietary aggregation table and relational
tables will be used. This is typically dependent on the granularity and
cardinalities that are present within the cubes.
Figure 2: SAS cube components. The metadata and navigation files components do not
differ regardless of online analytical processing (OLAP) technique. However, aggregations
define the physical data structure, which is unique to each technique.
SAS OLAP Server
The SAS OLAP Server has a dual role:
> Security validation
• Authentication of the user against the SAS Metadata Server
• Authorization and validation of what the user is allowed to see
> Query engine
• Handles the multi-dimensional expressions (MDXs) passed from SAS BI clients
• Retrieves the relevant data that answers the MDX query
• Sends that data back to the clients
Which OLAP technique the cube is based on will determine how the MDX
query is handled and translated into the appropriate query that will be passed either
to an underlying database or internally. In the MOLAP-based cube, the SAS OLAP
Server spawns multiple threads internally to answer the queries from the relevant
cube aggregation tables. For a ROLAP-based cube, MDX is translated into SQL
queries, which are passed down to the RDBMS to handle and optimize.
For the most optimized performance at query time, a SAS cube requires
aggregation tables that best meet the query result set. SAS provides application
response measurement (ARM) logs that help cube designers or administrators
tune a cube based on end-user interactions or queries that have been submitted
against that cube.
Teradata Database considerations
SAS OLAP cubes support three types of input data: star schema, detail data or
summarized tables. Star schema input data sources will typically give SAS OLAP
the best build performance; however, the physical database design of any data
warehouse should reflect the customer’s business, independent of any tool or
application requirements.
Teradata recommends an application-agnostic data model such as third normal
form, adhering to the best practices and methodology that provide an enterprise
view of the business. To implement SAS ROLAP on a normalized data model or
snowflake schema, a semantic layer must be built on top of the tables to represent
a star.
The optimal solution is building aggregates on top of a normalized model
or snowflake schema and using a view semantic layer to represent a star. It is
crucial that the normalized model or
snowflake schema be “cleansed,” meaning
there are no NULLs, data transformations are complete and data is ready for
reporting. If not, then it may be necessary to build a physical semantic layer,
as the Teradata aggregate approach
described above will not work on
“uncleansed” normalized data.
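As a small illustration of a view semantic layer, the Python sketch below flattens a snowflaked dimension into a single view so that a reporting tool sees a star. The table and view names are hypothetical, and SQLite stands in for the data warehouse; nothing here is specific to SAS or Teradata.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Snowflaked dimension: product -> category (hypothetical tables).
CREATE TABLE category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id  INTEGER NOT NULL REFERENCES category
);
CREATE TABLE sales_fact (
    product_id INTEGER NOT NULL REFERENCES product,
    amount     INTEGER NOT NULL
);

-- Semantic layer: one flat dimension view, so the fact table
-- joins to a single "star" dimension instead of a chain of tables.
CREATE VIEW product_dim AS
SELECT p.product_id, p.product_name, c.category_name
  FROM product p
  JOIN category c ON c.category_id = p.category_id;

INSERT INTO category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO product VALUES (10, 'Widget', 1), (11, 'Gadget', 1), (20, 'App', 2);
INSERT INTO sales_fact VALUES (10, 100), (11, 50), (20, 70);
""")

by_category = db.execute("""
    SELECT d.category_name, SUM(f.amount)
      FROM sales_fact f
      JOIN product_dim d ON d.product_id = f.product_id
     GROUP BY d.category_name
     ORDER BY d.category_name
""").fetchall()
```

Because the view only renames and joins, no data is copied; the NOT NULL constraints hint at why the underlying tables need to be cleansed for this approach to work.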
If a physical semantic layer is required,
it is recommended to implement a
snowflake schema that is populated by
INSERT/SELECTs from the normalized
model in the Teradata Database. The
INSERT/SELECTs would be defined so
that they perform the data-cleansing
tasks to result in a snowflake schema that
is ready for reporting. Views would then
be created to present the star schema for
SAS. Teradata AJIs can then be built on
top of the snowflake schema to increase
OLAP performance.
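A cleansing INSERT/SELECT of the kind described here might look like the following Python sketch. It assumes hypothetical table names (`src_customer`, `rpt_customer`) and a simple cleansing rule, and it uses SQLite in place of the Teradata Database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Uncleansed source table: inconsistent case, stray spaces, a NULL.
CREATE TABLE src_customer (customer_id INTEGER, region TEXT);
INSERT INTO src_customer VALUES (1, ' east '), (2, NULL), (3, 'WEST');

-- Reporting-ready table: NULLs are not allowed.
CREATE TABLE rpt_customer (
    customer_id INTEGER PRIMARY KEY,
    region      TEXT NOT NULL
);

-- The INSERT/SELECT performs the cleansing on the way in:
-- trim and upper-case the region, and replace NULL with a default.
INSERT INTO rpt_customer (customer_id, region)
SELECT customer_id, COALESCE(UPPER(TRIM(region)), 'UNKNOWN')
  FROM src_customer;
""")

rows = db.execute(
    "SELECT customer_id, region FROM rpt_customer ORDER BY customer_id"
).fetchall()
```

Views presenting the star would then be defined over tables like `rpt_customer` rather than over the raw source.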
A ROLAP solution
The advantages of a ROLAP solution include:
> Only metadata and navigation files are
created, resulting in fast build times.
> Data management remains within the
RDBMS, not within the cube.
A ROLAP-based cube lets the RDBMS
handle the SQL and optimization, which is
dependent on implementing an AJI feature.
This is the preferred method.
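The translation step can be pictured with a short Python sketch. It does not parse MDX and does not reproduce what the SAS OLAP Server generates; the `to_sql` helper, the request shape (one measure plus a list of levels) and the `sales` table are assumptions made for illustration, with SQLite standing in for the RDBMS.

```python
import sqlite3


def to_sql(measure, levels, table="sales"):
    """Turn a simple cube request into a GROUP BY query.

    Identifiers are trusted constants in this sketch; a real
    translator would validate them against the cube metadata.
    """
    cols = ", ".join(levels)
    return f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols} ORDER BY {cols}"


# Stand-in for the RDBMS that holds the data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 100), ("East", "Gadget", 50), ("West", "Widget", 70)],
)

# Only the generated SQL crosses over to the database; no data is
# copied into a cube beforehand.
sql = to_sql("amount", ["region"])
result = db.execute(sql).fetchall()
```

The database decides how to answer the generated query, which is where aggregate structures on the database side can help.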
When defining the SAS cube structure,
the cube designer needs to be aware of the
second-to-last window in the SAS Cube
Designer Wizard, where the box “Do not
create an NWAY” must be checked. The NWAY is the fully
summarized table composed of all crossings
of the levels defined in the cube; checking the box is equivalent to specifying the PROC OLAP option NO_NWAY.
Better, faster analysis
SAS and Teradata naturally complement
each other with a powerful and flexible
solution for BI administrators. When organizations consider an OLAP technique, the
choices are extended with an accelerated SAS
ROLAP solution with the Teradata Database.
This combined solution enables businesses to analyze data at the “speed of
thought” with the breadth and depth of
analysis only provided by an integrated solution. Now you can make the ideal choice to
best meet your expanding business needs.
Michelle Wilkie, a product manager with SAS
Institute Inc., supports research and development teams developing OLAP products.
Arlene Zaima, a strategic intelligence program
manager at Teradata, has more than 10 years
of experience in advanced analytics.
Online
For more information, read
“An added dimension” on
TeradataMagazine.com and
visit support.sas.com for
more information on the SAS
OLAP Server.