Course Name: Business Intelligence
Year: 2009
Data Warehouses, Online Analytical Processing, and Metadata
11th Meeting
Source of this Material
Loshin, David (2003). Business Intelligence: The Savvy Manager’s Guide. Chapter 6.
Bina Nusantara University
The Business Case
There is a significant difference between the traditional use of databases for
business purposes and the use of databases for analytical purposes. The
traditional use revolves around transaction processing as the means by which a
business’s operation is modeled. However, the representation of
information in this framework is not suitable for analytical purposes.
The BI community has developed a different kind of data model that more efficiently
represents data that is to drive analytic applications and decision support,
called a dimensional model. By creating a centralized data repository using
this kind of data model and aggregating data sets from all areas of the
corporate enterprise in this repository, a data warehouse can be created that
can supply data to the individual analytic applications.
Data Models
A data model is a discrete structured data representation of a real-world set of
entities related to one another. There is a significant difference between how
we use data in an operational/tactical manner (i.e., to “run the business”) and
the ways we use data in a strategic manner (i.e., “improve the business”). The
traditional modeling technique for operational systems revolves around the
entity-relationship model.
• Entity-Relationship Models
In a relational database, information is modeled by representing entities in
separate tables and relating those entities, within a business process context,
through some form of cross-table linkage. One essential goal of the
entity-relationship model is to ease the development of transaction processing
by providing a reasonable scheme for mapping a business process to a grouped
sequence of table operations to be executed as a single unit of work.
Another essential goal of the relational model is the identification and elimination of
redundancy within a database. This process, called normalization, analyzes tables to find
instances of replicated data within one table that can be extracted into a separate
table and linked relationally through a foreign key (see Figures 11-1 and 11-2).
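To illustrate the normalization step described above, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and do not come from the source material. The customer's city, which is repeated on every flat order row, is extracted into a separate customer table and linked back through a foreign key.

```python
import sqlite3

# Hypothetical unnormalized rows: the customer's city repeats on every order.
orders_flat = [
    (1, "Acme", "Jakarta", 120.0),
    (2, "Acme", "Jakarta", 75.5),
    (3, "Globex", "Bandung", 42.0),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT UNIQUE NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
""")

for order_id, name, city, amount in orders_flat:
    # Store each customer once; later orders reuse the existing row.
    conn.execute(
        "INSERT OR IGNORE INTO customer (name, city) VALUES (?, ?)", (name, city)
    )
    (customer_id,) = conn.execute(
        "SELECT customer_id FROM customer WHERE name = ?", (name,)
    ).fetchone()
    # The order keeps only the foreign key, not the replicated customer data.
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, amount)
    )

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
acme_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
    WHERE c.name = 'Acme'
""").fetchone()[0]
```

After loading, the three flat rows are represented by two customer rows and three order rows, and the original information is recovered by joining on customer_id.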
• Dimensional Models
Dimensional modeling captures the basic unit of representation as a single
multikeyed entry in a slender fact table, with each key exploiting the relational model
to refer to the different dimensions associated with those facts. A maintained table of
facts, each of which is related to a set of dimensions, is a much more efficient
representation for data in a data warehouse.
• Fact Tables and Star Schemas
The representation of a dimensional model is straightforward. A fact table contains
records that refer to observable objects, usually within a business context (Figure 11-3).
The fact table is related to dimensions in a star schema. Each entry in a
dimension represents a description of the individual entities within that dimension.
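As a rough illustration of a fact table related to its dimensions in a star schema, the sketch below builds a tiny example with Python's sqlite3 module. The sales subject area, the three dimensions, and all table and column names are assumptions chosen for the example, not taken from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: each row describes one entity within the dimension.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, month TEXT);

    -- Fact table: one slender, multikeyed row per fact, plus a measure.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        units_sold  INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_store   VALUES (1, 'Jakarta'), (2, 'Bandung');
    INSERT INTO dim_date    VALUES (1, '2009-01'), (2, '2009-02');
    INSERT INTO fact_sales  VALUES
        (1, 1, 1, 10), (1, 2, 1, 4), (2, 1, 2, 7), (1, 1, 2, 3);
""")

# The same facts can be summarized along any dimension with the same join shape.
units_by_city = dict(conn.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.city
""").fetchall())

units_by_product = dict(conn.execute("""
    SELECT p.product_name, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
""").fetchall())
```

Both queries have the same structure: join the fact table to one dimension and group by a descriptive attribute. No dimension is treated differently from the others.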
Figure 11-1
Figure 11-2
• Benefits of the Dimensional Model for BI
Using a dimensional model for managing data in a data warehouse has a number of
benefits.
✓ The framework is simple and predictable.
✓ No matter what the dimensional breakdown, there is no inherent bias lent to any
individual dimension.
✓ The dimensional model is easily extensible.
Figure 11-3
The Data Warehouse
A data warehouse is the primary source of information that feeds the analytical
processing within an organization.
• A data warehouse is a centralized repository of information.
• A data warehouse is arranged around the relevant subject areas important
to the corporation as a whole.
• A data warehouse is a queryable source of data for the enterprise.
• A data warehouse is used for analysis and not for transaction processing.
• The data in a data warehouse is nonvolatile.
• A data warehouse is the target location for integrating data from multiple
sources, both internal and external to an enterprise.
The Data Mart
A data mart is a subject-oriented data repository, similar in structure to the
enterprise data warehouse, but it holds the data necessary for the decision
support and BI needs of a specific department or group within the organization.
A data mart could be constructed solely for the analytical purposes of the
specific group or could be derived from an existing data warehouse. Data marts
are also built using the star join structure.
Online Analytical Processing
Online analytical processing tools provide a means for presenting data sourced
from a data warehouse or data mart in a way that allows the data consumer to
view comparative metrics across multiple dimensions.
The dimensions of data to be analyzed in an OLAP environment are arranged
in a cube structure (actually, a hypercube), where summaries of any dimension
can be seen in the context of other dimensions.
Because of the cube structure, there is an ability to rotate the perception of the
data to provide different views into the data using alternate base dimensions.
The value of an OLAP tool is derived from the ability to quickly analyze the data
from multiple points of view, and so OLAP tools are designed to precalculate
the aggregations and store them directly in the OLAP databases.
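The idea of precalculating aggregations so that any combination of dimensions can be viewed quickly can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how any particular OLAP product works. The dimension names, the sample facts, and the helpers build_cube and lookup are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "city", "month")

# Hypothetical facts; "units" is the measure being summarized.
facts = [
    {"product": "Widget", "city": "Jakarta", "month": "2009-01", "units": 10},
    {"product": "Widget", "city": "Bandung", "month": "2009-01", "units": 4},
    {"product": "Gadget", "city": "Jakarta", "month": "2009-02", "units": 7},
    {"product": "Widget", "city": "Jakarta", "month": "2009-02", "units": 3},
]


def build_cube(rows, dimensions, measure):
    """Precompute the sum of `measure` for every subset of the dimensions."""
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return dict(cube)


def lookup(cube, **filters):
    """Read a stored summary; dimensions left out are aggregated over."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    return cube.get((dims, tuple(filters[d] for d in dims)), 0)


cube = build_cube(facts, DIMENSIONS, "units")

grand_total = lookup(cube)
jakarta_units = lookup(cube, city="Jakarta")
widget_january = lookup(cube, product="Widget", month="2009-01")
```

Each lookup is a single dictionary read because the summaries were stored ahead of time. Changing which dimensions appear in the filter corresponds to rotating the view of the cube. The cost is storage: the number of stored summaries grows with the number of dimension subsets.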
Metadata
The standard definition of metadata is “data about the data”. Essentially,
metadata is a shareable master key to all the information that is feeding the
business analytics, from the extraction and population of the central repository
to the provisioning of data out of the warehouse and onto the screens of the
business clients.
• The Importance of Metadata
The management of metadata is probably one of the most critical tasks associated
with a successful BI program, for a number of reasons.
✓ Metadata encapsulates both the logical and physical business knowledge.
✓ Metadata captures the structure and meaning of the data that is being fed
into the warehouse.
✓ The recording of operational metadata provides a road map.
✓ One can capture differences associated with how data is manipulated.
✓ Metadata provides the means for tracing the evolution of information.
• Technical Metadata
Technical metadata characterizes the structure of data, the way that data moves, and
how it is transformed as it moves from one location to another. This may incorporate
some or all of the following.
✓ Connectivity Metadata
✓ Table Information
✓ Record Structure Information
✓ Record Manipulation Metadata
✓ Index Metadata
✓ Data Practitioners
✓ Security and Access Metadata
✓ Data Model Metadata
✓ Physical Features Metadata
✓ Reference Metadata
✓ Management Metadata
✓ Transformation Metadata
✓ Process Metadata
✓ Supplied Data Metadata
• Business Metadata
Business metadata incorporates much of the same information as technical
metadata, as well as:
✓ Metadata that describes the structure of data as perceived by business
clients
✓ Descriptions of the methods for accessing data for client analytical
applications
✓ Business meaning for tables and their attributes
✓ Data ownership characteristics and responsibilities
✓ Data domains and mapping between those domains, for validation
✓ Aggregation and summarization directives
✓ Reporting directives
✓ Security and access policies
✓ Business rules that describe constraints or directives associated with data
within a record or between records as joined through a join condition
• The Metadata Repository
The metadata repository is the primary source of knowledge about the inner workings
of the BI environment, so it is important to build and maintain one that is available
to all knowledge workers involved in the BI program. Whether the metadata repository is
physically centralized or distributed across multiple systems, and however it is
accessed, it is important to provide a mechanism for publishing metadata.
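One way to picture a repository that holds both technical and business metadata and offers a publishing mechanism is the small in-memory sketch below. It is only an illustration. The classes ColumnMetadata, TableMetadata, and MetadataRepository, their fields, and the sample entries are all hypothetical and not drawn from the source.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    name: str
    data_type: str         # technical: record structure information
    business_meaning: str  # business: meaning of the attribute


@dataclass
class TableMetadata:
    name: str
    source_system: str     # technical: where the data is extracted from
    owner: str             # business: data ownership and responsibility
    columns: list = field(default_factory=list)


class MetadataRepository:
    """A toy in-memory repository; a real one would be persistent and shared."""

    def __init__(self):
        self._tables = {}

    def publish(self, table):
        # Publishing makes the entry visible to every user of the repository.
        self._tables[table.name] = table

    def describe(self, table_name):
        # Return the business meaning of each column of a published table.
        table = self._tables[table_name]
        return {c.name: c.business_meaning for c in table.columns}


repo = MetadataRepository()
repo.publish(
    TableMetadata(
        name="fact_sales",
        source_system="order_entry",
        owner="Sales Operations",
        columns=[
            ColumnMetadata("store_key", "INTEGER", "Store where the sale occurred"),
            ColumnMetadata("units_sold", "INTEGER", "Number of units in the sale"),
        ],
    )
)

sales_glossary = repo.describe("fact_sales")
```

Here a knowledge worker can ask the repository what each column means without knowing where or how the table is physically stored.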
Management Issues
The topics in this chapter raise several significant management issues.
• Dueling Opinions
There are basically two different schools of thought about how to build a data
warehouse and a BI program, and for some reason there seems to be an almost
religious adherence by practitioners of these different schools.
• The Technology Trap
There are many interesting technologies associated with data warehousing, but too
often technologists drive these projects. It is important to keep in mind that the coolest
way to do something is not necessarily the best way to do it.
• The Vendor Trap
Be aware that there are many vendors producing canned solutions and products
under the guise of data warehouses, data marts, metadata repositories, and OLAP
environments. There are many examples of high-cost software products that are too
complicated for the customer to use without additional investment in training and
consulting, and they ultimately end up as “shelfware.”
End of Slide