Download Chapter 1 - azharunisel

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Clusterpoint wikipedia , lookup

Big data wikipedia , lookup

Data Protection Act, 2012 wikipedia , lookup

Data model wikipedia , lookup

Data center wikipedia , lookup

Database model wikipedia , lookup

Forecasting wikipedia , lookup

Data analysis wikipedia , lookup

3D optical data storage wikipedia , lookup

Information privacy law wikipedia , lookup

Data vault modeling wikipedia , lookup

Business intelligence wikipedia , lookup

Transcript
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
Chapter 1
Introduction to Data Warehouse
1.1
What is Data Warehouse
Figure 1: Data warehouse
There is a lot of confusion concerning the terms data mining and data warehousing (also
referred to as business intelligence in the marketplace today). Many IT professionals use the
two terms interchangeably, with little hesitation or regard for the differences between the
two types of applications. While the goals of both are related, and often overlap; data
mining and data warehousing are dedicated to furnishing different types of analytics, for
different types of users and therefore merit their own space.
Data warehouse focus primarily on the storage of data to be used as the information
basis for tactical and strategic decision making. Such decision making typically
requires extensive data manipulation to extract information from data. In short, the data
warehouse data are the source for decision support systems. Data warehouse databases are
far less time-critical than transaction databases. Data warehouse databases typically store
complex data derived from many sources. To make it easier to store and retrieve such
complex data, the data warehouse database structure ends to be quite different from
Page 1
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
that of the transaction database. Some of the applications data warehousing can be used for
are:








Decision support
Trend analysis
Financial forecasting
Churn Prediction for Telecom subscribers, Credit Card users etc.
Insurance fraud analysis
Call record analysis
Logistics and Inventory management
Agriculture
Definition of Data Warehouse
Two of the pioneers in the field were Bill Inmon and Ralph Kimball
i)
Bill Inmon, the acknowledged father of the data warehouse, defines it as an
integrated, subject oriented, time-variant, non-volatile database that
provides support for decision making.
Bill Inmon is universally recognized as the "father of the data warehouse." He has over 26 years
of database technology management experience and data warehouse design expertise, and has
published 36 books and more than 350 articles in major computer journals. His books have been
translated into nine languages. He is known globally for his seminars on developing data
warehouses and has been a keynote speaker for every major computing association. Before
founding Pine Cone Systems, Bill was a co-founder of Prism Solutions, Inc.
ii)
A data warehouse is a copy of transaction data specifically structured for
querying, analysis and reporting – Ralph Kimball
Ralph Kimball was co-inventor of the Xerox Star workstation, the first commercial product to use
mice, icons, and windows. He was vice president of applications at Metaphor Computer Systems,
and founder and CEO of Red Brick Systems. He has a Ph.D. from Stanford in electrical
engineering, specializing in man-machine systems. Ralph is a leading proponent of the
dimensional approach to designing large data warehouses. He currently teaches data
warehousing design skills to IT groups, and helps selected clients with specific data warehouse
designs. Ralph is a columnist for Intelligent Enterprise magazine and has a relationship with
Sagent Technology, Inc., a data warehouse tool vendor. His book "The Data Warehouse Toolkit"
is widely recognized as the seminal work on the subject.
Components of data warehouse
A data warehouse has three main components:
(Source : Data warehouse design, http://www.essentialstrategies.com/services/warehouse.htm)
- "central data warehouse" - is a database organized according to the
corporate data model.
Page 2
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
- "data marts" — extracts from the central data warehouse that are organized
according to the particular retrieval requirements of individual users.
- "legacy systems" - where an enterprise's data are currently kept consist of
ODS of OLTP
Legacy Systems
Finance
Data Mart
Operational
Data Store
Sales
Data Mart
Marketing
Data Mart
Accounting
Data Mart
Operational
Data Store
Operational
Data Store
Organizational
Data
Warehouse
Operational
Data Store
Figure 2: Data warehouse components.
1.2
The needs for Data Warehouse
Today’s business needs data warehouse because:
i)
Decisions need to be made quickly and correctly, using all available data.
ii) Users are business domain experts, not computer professionals.
iii) The amount of data doubles every 11 months (source SAS forum 2008), which
affects response time and the sheer ability to comprehend its content.
iv) Competition is heating up in the areas of business intelligence and added
information value.
v) Unlocking data resides.
-
Data warehouses can: provide immediate information delivery, data integration
from across the organization, provide a clear view of historical trends, provide
new tools for looking at data, and free users from IS department capacity
constraints.
-
The value from data warehouses comes from the reduced cost of system
maintenance (assuming that the data warehouse is replacing another system(s))
and in the added revenue from the new business processes resulting from the
additional information supplied by the warehouse.
Page 3
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
SAMPLE OF DATA WAREHOUSE SOLUTION
Figure 3: How data warehouse solve problems. (Source : Database Systems: Design, Implementation, and Management, Rob and Coronel, 2007)
Page 4
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
1.3
Characteristic of Data Warehouse
Subject Oriented:
Data that gives information about a particular subject instead of about a company's ongoing
operations.
Figure 4: Application Oriention vs Subject Orientation. (Source : Modern Data Warehousing, Mining
and Visualization, Marakas, 2002)
Subject is equivalent to entity such as employees, customers, products, vendors etc.
Application orientation consists of data group/keep by database connected to company’s
system@applications. In application orientation, data of products is scattered (in finance,
inventory, asset tracking, sales and marketing also in materials planning). While in subject
orientation data is centralize under one database/storage.
A single-subject data warehouse is typically referred to as a data mart, while data
warehouses are generally enterprise in scope.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a
coherent whole. This means there should be consistency among naming conventions,
measurements of variables, encoding structures, physical attributes, and other salient data
characteristics. An example of this integration is the treatment of codes such as gender
codes. Within a single corporation, various applications may represent gender codes in
different ways: male vs. female, m vs. f, and 1 vs. 0, etc. In the Data Warehouse, gender is
always represented in a consistent way, regardless of the many ways by which it may be
encoded and stored in the source data. As the data is moved to the Warehouse, it is
transformed into a consistent representation as required.
Page 5
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
Figure 5: Cleansing process assure integrated data in data warehouse (Source : Modern Data
Warehousing, Mining and Visualization, Marakas, 2002)
Time-variant:
All data in the data warehouse is identified with a particular time period. All data in Data
Warehouse is accurate as of some moment in time, providing an historical perspective. This
differs from the operational environment in which data is intended to be accurate as of the
moment of access. The data in the Data Warehouse is, in effect, a series of snapshots. Once
the data is loaded into the enterprise data store and data marts, it cannot be updated. It is
refreshed on a periodic basis, as determined by the business need. The operational data
store, if included in the Warehouse architecture, may be updated.
Non-volatile
Data is stable in a data warehouse. More data is added but data is never removed. This
enables management to gain a consistent picture of the business. Data in the Warehouse is
static, not dynamic. The only operations that occur in Data Warehouse applications are the
initial loading of data, access of data, and refresh of data. For these reasons, the physical
design of a Data Warehouse optimizes the access of data, rather than focusing on the
requirements of data update and delete processing.
(Source: "What is a Data Warehouse?" W.H. Inmon, Prism, Volume 1, Number 1, 1995).
Page 6
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
Other information about data warehouse
i)
Data warehouse is an environment, not a product.
ii)
The data in a data warehouse is static, not dynamic.
iii) The data warehouse is setup to be the central repository of all organizational
data deemed useful for reporting or exploration
iv) An organization that spends more on their data warehouse will not, necessarily,
be happier with their data ware house. It is dependent upon the design.
v)
Data warehouses will have users who have very set requirements for the system
(farmers) and users who have various needs that they cannot define for the
system (explorers).
vi) It is a database designed for analytical tasks, using data from multiple
applications – data sources.
vii) It supports a relative small number of users with relatively long interactions.
Figure 6: Data flow in Data warehouse
viii) Its usage is read-intensive – summarized.
ix) Its content is periodically uploaded (only additions, no deletion and editing)
x)
Its contains current and historical data to provide a historical perspective of
information
xi) It contains a few large tables – large volume.
xii) Each query frequently results in a large result set and involves frequent full table
scan and multiple joins.
Page 7
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
xiii) Among Data warehouse and data mining software
 Alacrity, Inc. : Integrated Data Intelligence Software
 AlphaBlox : Data Analysis Software
 AltaPlan : OLAP Links
 Aonix : Object Oriented Modeling Tool and Cleansing Software
 Attar Software : Data Mining / Neural Nets
 Brio Technologies : Brio Web Warehouse and Decision Support Suite
 Bull : Data Warehousing Solutions
 Business Objects, Incorporated : WebIntelligence for enterprise decision support
 Cognos, Incorporated : Data Warehousing Software Tool Suite
 DataFlux : Data Quality and Integration Software
 DataMirror : Data Integration, Data Protection, Data Audit Solutions
 Decision Point Applications : Packaged Data Warehouse Solutions
 Decision Technology : DecisionCentric® Server
 Dimensional Insight, Inc. : Reporting and analysis software
 Evolutionary Technologies, Inc. : ETI*EXTRACT(r) Tool Suite for Data Warehousing and
Data Migration
 FileTek : Software for managing massive amounts of atomic data
 First Logic : Customer Data Management Software
 Hyperion : ESSBASE - High Speed OLAP Processor
 IBM : DataGuide
 Information Builders : Data management software Data Warehousing, Decision Support,
Middleware, Data Access, ...
 Intelligent Solutions, Inc. : Claudia Imhof / Data Warehousing and Data Modeling
 IRI : CoSORT ETL Software
 Kalido : Software for adaptive enterprise data warehousing and master data management
 Kenan Systems Corporation : Market Analysis Software
 Megaputer Intelligence : Data Mining and Warehousing
 Micro Strategy : Relational OLAP (ROLAP) Software and Services
 Nautilus Systems, Inc. : Data Warehousing, Data Mining, and Data Visualization software
 netcarve Technologies GmbH : Data Warehousing and Data Mining Solutions
 NetScheme Solutions Inc.
 Pilot Software : Customer and Market Data Analysis Software
 Poinpoint Solutions Inc. : Data Warehousing solutions for the insurance industry
 Princeton Softech : Database Management and Data Warehouse Software
 Query Object Systems Corp : Business Solution Components
 Ralph Kimball Associates : A pioneer and leader in the Data Warehouse field
 Redbrick Systems : Multidimensional database software
 Retek : Data Warehousing for Retail Industry
 Rocket Software : Business Intelligence
 Rulequest Research : Data Mining Tools
 Salford Systems : CART software for tree-structure, non-parametric data analysis
 SAS Institute : Data Warehouse and Data Mining Software
 Seagate Software : Crystal Reports
 Silvon Software : Supply Chain Data Warehousing
 SolutionsIQ : Data Warehousing Solutions
 Speedware Corp :Business Intelligence Software
 Sybase, Inc. : Data Warehousing Database Software
 Teleran : Data Warehousing and eCommerce Solutions
 Teradata / NCR : Database machine
 Thinking Machines Corporation : Data mining software for loyalty management systems
 Trillium Software : Data Cleansing and Data Reengineering
 Universal Data Solutions, LLC : Len Silverston - Data Modeling and Data Warehouse
Page 8
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
1.4
Data Mining
Figure 7: Data mining in Data warehouse environment
i)
The development of data mining functions and products is the result of years
from many disciplines including databases, information retrieval, statistics,
algorithms, machine learning and multimedia.
ii) Data warehouse contains useful source of data for explorer (data miner).
iii) Data mining can take place without a data warehouse and same goes to data
warehouse, but a data warehouse offers data miners many advantages.
A
combination of data warehouse & data mining offers optimum analysis for
business.
iv) Data mining is one of data warehouse end user tools besides reporting, query,
application development, Executive Information System (EIS) and Online
Analytical Processing (OLAP) tools.
v) There is no one data mining approach but rather a set of techniques that often
can be used in combination with each other to extract the most insight from a set
of data.
vi) Literature of data mining also uses Knowledge Discovery in Databases (KDD),
information discovery, information harvesting, data archeology, data exploration
and data pattern processing to represent the knowledge discovery.
Page 9
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
Figure 8: Data mining application screen shot
Extra notes:
A transaction (production) databases reflects the daily operations of an organization. As the
name implies, a transaction database records transactions such as the sale of a product, the
enrolment of a student, or the opening of a checking account. Such transactions are time-critical
and must be recorded accurately and immediately. Given their critical role in the day-to-day
operations – and given the fact that transaction database data are usually the data source
for data warehouse databases – much of the focus on the design of transaction databases.
Data warehouse databases focus primarily on the storage of data to be used as the
information basis for tactical and strategic decision making. Such decision making typically
requires extensive data manipulation to extract information from data. In short, the data
warehouse data are the source for decision support systems (DSS) or executive information
system (EIS). Data warehouse databases are far less time-critical than transaction databases;
the former record the historical data record that will be used to formulate pricing decisions, sales
forecasts, market positioning, and so on. Data warehouse databases typically store complex
data derived from many sources. To make it easier to store and retrieve such complex data,
the data warehouse database structure ends to be quite different from that of the transaction
database.
Page 10
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
Table 1. Transaction Processing vs Data Warehouse
Characteristics
Transaction Processing
Data Warehouse
Volatility
Dynamic
Static
Currentness
Current
Historical
Time dimension
Implicity “now”
Explicit, visible
Granularity
Primitive, detailed
Detailed and derived summaries
Updates
Continues, random
Periodic, scheduled
Tasks
Repetitive
Unpredictable
Flexibility
Low
High
Performance
High performance mandatory
Lower performance often
acceptable
Source : Decision support and data warehouse systems by Efrem G Mallach – Mc Graw Hill, 2000
An operational data store (ODS) stores data for a specific application.
It feeds the data
warehouse a stream of desired raw data.
A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to
support a small group of users (rather than the entire firm). A single-subject data warehouse is
typically referred to as a data mart, while data warehouses are generally enterprise in scope. Also,
data warehouses can be volatile. Due to the large amount of storage required for a data
warehouse, (multi-terabyte data warehouses are not uncommon), only a certain number of
periods of history are kept in the warehouse. For instance, if three years of data are decided on
and loaded into the warehouse, every month the oldest month will be "rolled off" the database,
and the newest month added.
Data warehouse vs datamart
Data Marts
Organizational Data
Warehouse
Corporate
Highly granular data
Normalized design
Robust historical data
Large data volume
Data Model driven data
Versatile
General purpose DBMS
technologies
Departmentalized
Summarized, aggregated
data
Star join design
Limited historical data
Limited data volume
Requirements driven data
Focused on departmental
needs
Multi-dimensional DBMS
technologies
DW users
Every corporation has two types of DW users – Farmers and Explorers
Page 11
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
Farmers know what they want before they set out to find it. They submit small queries and
retrieve small nuggets of information.
Explorers are quite unpredictable.
They often submit large queries.
Sometimes they find
nothing, sometimes they find priceless nuggets.
Cost justification for the DW is usually done on the basis of the results obtained by farmers
since explorers are unpredictable.
Source System Identification : In order to build the data warehouse, the appropriate data must be
located. Typically, this will involve both the current OLTP (On-Line Transaction Processing) system
where the "day-to-day" information about the business resides, and historical data for prior periods,
which may be contained in some form of "legacy" system. Often these legacy systems are not
relational databases, so much effort is required to extract the appropriate data.
Data Warehouse Design and Creation: This describes the process of designing the warehouse,
with care taken to ensure that the design supports the types of queries the warehouse will be used for.
This is an involved effort that requires both an understanding of the database schema to be created,
and a great deal of interaction with the user community. The design is often an iterative process and it
must be modified a number of times before the model can be stabilized. Great care must be taken at
this stage, because once the model is populated with large amounts of data, some of which may be
very difficult to recreate, the model can not easily be changed.
Data Acquisition: This is the process of moving company data from the source systems into the
warehouse. It is often the most time-consuming and costly effort in the data warehousing project, and
is performed with software products known as ETL (Extract/Transform/Load) tools. There are currently
over 50 ETL tools on the market. The data acquisition phase can cost millions of dollars and take
months or even years to complete. Data acquisition is then an ongoing, scheduled process, which is
executed to keep the warehouse current to a pre-determined period in time, (i.e. the warehouse is
refreshed monthly).
Changed Data Capture: The periodic update of the warehouse from the transactional system(s) is
complicated by the difficulty of identifying which records in the source have changed since the last
update. This effort is referred to as "changed data capture". Changed data capture is a field of
endeavor in itself, and many products are on the market to address it. Some of the technologies that
are used in this area are Replication servers, Publish/Subscribe, Triggers and Stored Procedures, and
Database Log Analysis.
Data Cleansing: This is typically performed in conjunction with data acquisition (it can be part of the
"T" in "ETL"). A data warehouse that contains incorrect data is not only useless, but also very
dangerous. The whole idea behind a data warehouse is to enable decision-making. If a high level
decision is made based on incorrect data in the warehouse, the company could suffer severe
consequences, or even complete failure. Data cleansing is a complicated process that validates and, if
necessary, corrects the data before it is inserted into the warehouse. For example, the company could
have three "Customer Name" entries in its various source systems, one entered as "IBM", one as
"I.B.M.", and one as "International Business Machines". Obviously, these are all the same customer.
Someone in the organization must make a decision as to which is correct, and then the data cleansing
tool will change the others to match the rule. This process is also referred to as "data scrubbing" or
"data quality assurance". It can be an extremely complex process, especially if some of the warehouse
inputs are from older mainframe file systems (commonly referred to as "flat files" or "sequential files").
Data Aggregation: is process is often performed during the "T" phase of ETL, if it is performed at all.
Data warehouses can be designed to store data at the detail level (each individual transaction), at
some aggregate level (summary data), or a combination of both. The advantage of summarized data
is that typical queries against the warehouse run faster. The disadvantage is that information, which
may be needed to answer a query, is lost during aggregation. The tradeoff must be carefully weighed,
because the decision can not be undone without rebuilding and repopulating the warehouse. The
Page 12
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
safest decision is to build the warehouse with a high level of detail, but the cost in storage can be
extreme.
Now that the warehouse has been built and populated, it becomes possible to extract meaningful
information from it that will provide a competitive advantage and a return on investment. This is done
with tools that fall within the general rubric of "Business Intelligence".
Business Intelligence (BI): A very broad field indeed, it contains technologies such as Decision
Support Systems (DSS), Executive Information Systems (EIS), On-Line Analytical Processing (OLAP),
Relational OLAP (ROLAP), Multi-Dimensional OLAP (MOLAP), Hybrid OLAP (HOLAP, a combination of
MOLAP and ROLAP), and more. BI can be broken down into four broad fields:
Multi-dimensional Analysis Tools: Tools that allow the user to look at the data from a number of
different "angles". These tools often use a multi-dimensional database referred to as a "cube".
Query tools: Tools that allow the user to issue SQL (Structured Query Language) queries against the
warehouse and get a result set back.
Data Mining Tools: Tools that automatically search for patterns in data. These tools are usually
driven by complex statistical formulas. The easiest way to distinguish data mining from the various
forms of OLAP is that OLAP can only answer questions you know to ask, data mining answers
questions you didn't necessarily know to ask.
Data Visualization Tools: Tools that show graphical representations of data, including complex
three-dimensional data pictures. The theory is that the user can "see" trends more effectively in this
manner than when looking at complex statistical graphs. Some vendors are making progress in this
area using the Virtual Reality Modeling Language (VRML).
Metadata Management: Throughout the entire process of identifying, acquiring, and querying the
data, metadata management takes place. Metadata is defined as "data about data". An example is a
column in a table. The datatype (for instance a string or integer) of the column is one piece of
metadata. The name of the column is another. The actual value in the column for a particular row is
not metadata - it is data. Metadata is stored in a Metadata Repository and provides extremely useful
information to all of the tools mentioned previously. Metadata management has developed into an
exacting science that can provide huge returns to an organization. It can assist companies in
analyzing the impact of changes to database tables, tracking owners of individual data elements
("data stewards"), and much more. It is also required to build the warehouse, since the ETL tool needs
to know the metadata attributes of the sources and targets in order to "map" the data properly. The BI
tools need the metadata for similar reasons.
Summary:
Data Warehousing is a complex field, with many vendors vying for market awareness. The complexity
of the technology and the interactions between the various tools, and the high price points for the
products require companies to perform careful technology evaluation before embarking on a
warehousing project. However, the potential for enormous returns on investment and competitive
advantage make data warehousing difficult to ignore.
Page 13
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
Exercise:
A. Multiple choice questions.
1. Which of the following statements is NOT TRUE about datamarts?
2.
A.
It is a smaller, more content version of data warehouse.
B.
A data mart is a subset of a data warehouse.
C.
It is a representation of an object.
D.
It holds integrated data of an organization.
The followings are the characteristics of the operational data store, EXCEPT:
A.
Data are from the day-to-day business operations.
B.
It holds historical data.
C.
Data are highly and immediately available.
D.
None of the above
3. Which of the following activities would not normally be associated with a data warehouse?
A.
Loading
B.
Updating
C.
Accessing
D.
None of the above.
4. Which of the following is not a characteristic of a data warehouse?
A.
Data integrated
B.
Volatile
C.
Subject oriented
D.
Time variant
5. Which of the following is a synonym for data mining?
A.
Knowledge data discovery
B.
Information analysis
C.
Coal mining
D.
None of the above.
Page 14
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
DATA WAREHOUSE & DATA MINING
Chapter 1. Introduction to Data Warehouse
B. True or False questions.
1. Once data enter the data warehouse, it is never removed.
2. According to Inmon, the data warehouse and operational environments are separated.
3. Data marts can serve as a test vehicles for companies exploring the potential benefits of data
warehouses.
C.
Short questions.
Question 1
a)
Main characteristics of data warehouse are Subject Oriented, Nonvolatile, Time
variant and Integrated. Explain each of them. (4 marks)
Question 2
a)
What is Data Warehousing? (3 marks)
b)
List FOUR (4) benefits of having data warehouse in an organization? (4 marks)
c)
List and explain any FOUR (4) characteristic of data warehouse. (8 marks)
Question 3
Discuss the differences between transaction databases and data warehouse databases.
Page 15