Download Data Warehousing

yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia, lookup

Module 9 – Data Warehousing, Data Marts & Bridging the Information Gap
In this chapter we will look at the concepts of data warehousing, data marts and data mining.
As companies have created both flat files and databases over the years, there have been many
instances where data hasn’t been analyzed, created and stored in a cohesive manner. Many
companies have created databases without using the E/R and normalization process. Data from
flat files have been simply moved into a database format with no redesign. Thus, much of the
data stored in current databases does not give us the results we expect. One way to deal with this
data problem is to create data warehouses.
DATA WAREHOUSE – a data warehouse is a subject-oreinted, integrated, time-variant,
nonupdatable collection of data used in support of management decision-making.
1. Subject-oreinted organized around key subjects of the enterprise (customers, patients,
students, products, etc.)
2. Integrated – data are defined using consistent naming conventions, formats, encoding
structures and related characteristics gathered from several systems
3. Time-variant where data in the warehouse contain a time dimension so that they may also
be used to study trends and changes
4. Nonupdateable means that data loaded into the warehouse and refreshed from operational
sysems ,but it cannot be updaed by end users
The need for data warehouses include: 1) a company requires an integrated, company-wide view
of high-quality information, and 2) information systems must separate informational from
operational systems to improve performance in managing company data
1. Data are extracted from the various internal and external source system files and
databases. In a large organization, there ay be dozens or even hundreds of such files and
2. The data from the various source systems are transformed and integrated before being
loaded into the data marts. The datawarehouse is a set of data marts
3. The data warehouse is a set of physically distinct databases organized for decision
support – it contains detailed and summary data
4. Users access the data warehouse by means of a variety of query languagaes and analytical
These may be either logical of physical. Data marts may be certain data that pertain to a
particular area (customers, patients, etc.) pulled from a data warehouse. Logical data marts are
views of the main data waqrehouse.
1. Status versus Event data – status data is a before and after image; transactions are
business activities that cause one or more business events to occur at a database level – an
event results in one or more database actions (create, update or delete)
2. Transient versus Periodic data – transient data are data in which changes to existing
records are written over previous recors, thus destroying the previous data conent;
periodic data are data that are never physically altered or deleted once they have been
added to the store
Star schema is a simple database design in which dimensional data are separated from fact or
event data – also called a dimensional model.
Two types of tables are in a star schema – Fact tables contain factual or quantitative data that are
numerical, continuously valued, and addivite; dimensional tables hold descriptive data
Essential Rules of Dimensional Modeling:
1. Use atomic facts
2. Create single-process fact tables
3. Include a dae dimension for every fact table
4. Enforce consistent grain
5. Disallow null keys in fact tables
6. Honor hierarchies
7. Decode dimension tables
8. Use surrogate keys
9. Conform dimensions
10. Balance requirements with actual data
OLAP Tools (Online analytical processing)is the use of a set of query and reporting tools that
provides users with multimensional views of their data. Types of activities include Slicing a
Cube, Drilling Down, and Summarizing more than 3 dimensions.
Data Mining allows users to look for patterns and trends in data. It is referred to as knowledge
discovery. The goals of data mining are:
1. Explanatory – to explain some observed event or condition
2. Confirmatory – to confirm a hypothesis
3. Exploratory – to analyze data for new or unexpected relationships
Data mining techniques include: regression, decision tree induction, clustering and signal
processing, affinity, sequence association, case-based reasoning, rule discovery, fractals and
neural nets.