CO7107 DATA
WAREHOUSING AND MULTIDIMENSIONAL DATABASES
Assessment
Andrew Jones
0817804 J01209
Table of Contents
Table of Figures
Introduction
Section 1
How does it work
Section 2
Section 3
Outline evolution of databases and discuss how they have tried to address problems
Conclusion
Table of Figures
Figure 1 DBMS as an interface
Figure 2 Physical and Logical Databases (Microsoft Corporation, 2015)
Figure 3 Composition of a DBMS
Figure 4 Gartner Magic Quadrant of DBMS providers
Figure 5 five DBMS models
Figure 6 DBMS Rankings by popularity (DB-engines, 2015)
Figure 7 DBMS Model by popularity (DB-engines, 2015)
Figure 8 Commercial Industrial Sectors
Figure 9 consistency in reporting
Figure 10 the mathematics of ER Model
“What is our business? And what business are we in?” Peter Drucker.
Introduction
In this report, the nature of a DBMS (Database Management System) will be considered. In addition,
how it works will be investigated and an exploration of the market will be undertaken to ascertain
what is considered to be leading in the field. An organisation’s needs for data, how this may have
changed over the years and how DBMS has evolved to cope with these changes will be considered.
PICK OUT KEY POINTS RE. THESE FOR CONCLUSION
Section 1
A database (db) is defined as: “A structured set of data held in computer storage and typically
accessed or manipulated by means of specialized software.” (database, n., 2015)
This specialized software is the database management system (DBMS), the term used to describe
the application suite which administers and manages the database. Within the DBMS, software tools
are principally employed for the storing, indexing, modification, retrieval, transitioning and
querying of data.
Gartner defines a DBMS as a complete software system used to define, create, manage, update and
query a database, by which we mean an organized collection of data that may be structured in
multiple formats and stored in some form of storage medium.
The DBMS market is defined by relational and nonrelational database management products that
are suitable for a broad range of enterprise-level transactional applications. (Computer Weekly,
2015)
How does it work
The DBMS is an interface between the database and the entities that require access to the
database: users, applications and services.
Figure 1 DBMS as an interface
Advantages of the DBMS
• Data sharing
• Data security
• Data integration
• Referential integrity
• Minimized data inconsistency
• Improved data access
• Improved decision making
• Increased end-user productivity
Disadvantages of a DBMS
• Increased costs
• Management complexity
• Maintaining currency
• Frequent upgrade/replacement cycles
Table 1 Characteristics of a DBMS: the advantages and disadvantages (My Reading Room, 2012)
A DBMS is composed of three principal elements:
1. Physical database management
The storage and retrieval of the logical database across multiple physical files in the storage
medium is transparent to the user.
Figure 2 Physical and Logical Databases (Microsoft Corporation, 2015)
2. Database engine
The Database Engine is the principal service for storing, processing, and securing data; it is this
package element that is responsible for the: create, read, update and delete (CRUD) functionality of
the DBMS.
3. Database schema
The db schema is a logical data storage structure. The schema is composed of a number of schema
objects, known as a collection. Some examples of schema objects are:
• tables
• views
• indexes
• database links
• procedures
• functions
• packages
Figure 3 Composition of a DBMS
[Diagram: physical database, database engine, database schema]
The DBMS manages a single logical presentation of the data, safeguarding concurrency, data
integrity, security and backup.
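The schema objects and the CRUD operations described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in for a DBMS; the table, index and view names (customer, idx_customer_city, leicester_customers) and the sample data are assumptions made for illustration and do not come from any product discussed in this report.

```python
import sqlite3

# In-memory SQLite database used as a small stand-in DBMS.
# All object names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema objects: a table, an index and a view.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE INDEX idx_customer_city ON customer (city)")
cur.execute(
    "CREATE VIEW leicester_customers AS "
    "SELECT name FROM customer WHERE city = 'Leicester'"
)

# Create
cur.execute("INSERT INTO customer (name, city) VALUES (?, ?)", ("Ada", "Leicester"))
cur.execute("INSERT INTO customer (name, city) VALUES (?, ?)", ("Grace", "London"))

# Read (through the view)
rows_before = cur.execute("SELECT name FROM leicester_customers").fetchall()

# Update
cur.execute("UPDATE customer SET city = 'Leicester' WHERE name = 'Grace'")

# Delete
cur.execute("DELETE FROM customer WHERE name = 'Ada'")
conn.commit()

rows_after = cur.execute("SELECT name FROM leicester_customers").fetchall()

# The engine's own catalogue lists the schema objects that were created.
schema_objects = cur.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
```

The application only ever issues logical statements; where and how the rows are physically stored is left to the engine, which is the separation the three elements above describe.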
Of the DBMS vendors on the market, those considered to be leading are Oracle, Microsoft, IBM and
SAP. These organisations are so dominant that little of the market is left for rival companies other
than those serving very specific niches, such as Neo Technology with its graph product.
Hybrid capabilities extend the functionality of the RDBMS to more use cases, making a stronger case
for their ubiquity. (Mullins, Key criteria for deciding if a relational DBMS meets your IT needs, 2015)
“Relational databases, or RDBMSes, became the norm in IT more than 30 years ago as low-cost
servers became powerful enough to make them widely practical and relatively affordable.” (Mullins,
Evaluating the different types of DBMS products , 2015)
Figure 4 Gartner Magic Quadrant of DBMS providers
There are broadly five DBMS models to consider besides the RDBMS: key-value, document, column
store, graph and in-memory. Key-value DBMS models utilize “a key-value pair (KVP) which is a set of
two linked data items: a key, which is a unique identifier for some item of data, and the value,
which is either the data that is identified or a pointer to the location of that data”. (Rouse,
2008). Another model is the document model, in which every record is viewed as a document.
(MongoDB, 2015). The column model stores data in columns instead of rows. (Rouse, 2010). The graph
model applies the mathematics of graph theory. This model differs significantly from the other
models in that it allows for ACID compliance, in common with RDBMS. ACID stands for atomicity,
consistency, isolation and durability, and it is a cornerstone of transaction processing.
Compliance with ACID ensures that transactions with the database are processed dependably, in that
they are either successfully committed or completely rolled back.
Figure 5 five DBMS models
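The "committed or completely rolled back" behaviour of an ACID transaction can be sketched as follows. This is an illustrative example only: it uses Python's built-in sqlite3 module, and the account table, the CHECK constraint and the transfer function are hypothetical names introduced for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a balance may never go below zero.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100)")
conn.execute("INSERT INTO account VALUES ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = transfer(conn, "a", "b", 60)   # succeeds: a=40, b=60
second = transfer(conn, "a", "b", 60)  # would overdraw 'a', so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the second call the credit to 'b' has already been applied when the debit fails, yet the rollback undoes it, so the database never exposes a half-completed transfer.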
Figure 6 DBMS Rankings by popularity (DB-engines, 2015)
[Bar chart “DBMS ranked by popularity”; axis from 0.00% to 35.00%; systems shown: Redis, SQLite,
Cassandra, Microsoft Access, DB2, PostgreSQL, MongoDB, Microsoft SQL Server, MySQL, Oracle]
Figure 7 DBMS Model by popularity (DB-engines, 2015)
[Chart “DBMS Model by popularity”; categories: Relational DBMS, Document store, Wide column store,
Key-value store; values shown: 1.91%, 2.11%, 5.41%, 90.57%]
The other models employ a BASE methodology: Basically Available, Soft state, Eventual consistency.
The BASE model does not provide the guarantees in data transactions that ACID offers, but reflects
the complexities of the data managed by these DBMS.
The final form of DBMS is an emergent one called the in-memory model. The data in these databases
is “stored in main memory to facilitate faster response times”. (Rouse, in-memory database, 2012)
Section 2
Businesses run on data (the amount of a sale, the cost of a purchase, the wages of staff), but
they survive on cash flow and strategic planning. Typically one of the first strategic Information
Technology and Information Systems decisions an organisation makes is the choice of its accounting
system, which is its first serious venture into Business Intelligence (BI). The business can now
start to gather, report and plan consistently on the organisation’s Key Performance Indicators (KPIs).
Businesses do not start big. They start small and grow.
Figure 8 Commercial Industrial Sectors
For a small business, information needs are initially very basic and can be addressed
predominantly using lists. As the business grows and its management models start to formalise,
a requirement emerges for the enhanced management of data and information sources. During this
growth, the business starts to develop formal reporting structures for its operational data,
tactical information and strategic management information.
Sources of this data and information include:
• Accounting information
  o sales order processing
  o purchase order processing
  o profit and loss accounts
  o balance sheet
• Inventory management
• Production data
Over time the organisation will have generated empirical data to work with; consequently it can
begin to forecast, complete empirical comparisons and plan for future needs. If the business
itself continues to grow, it may require other functions to perform its operational and strategic
activities. Such functions are:
• Payroll
• Marketing
• HRM
• CAD/CAM
IT will also extend into other areas of the business. Unless planned for, databases and technology
infrastructure developed locally or by workgroups will prove difficult and costly to integrate into
an overall strategy. There may also be data model alignment issues as a result of mergers or
acquisitions.
As the business grows into the large phase, data security issues will rise to prominence. The changes
in data value are now very real and the threats to that data are serious. Breaches in security are
costly to the organisation.
In this large phase of business development the physical data requirements will also change.
There will be many more computers, tablets, phones, scanners, copiers, printers and cameras to
manage. In addition there will be the large-scale server, mainframe and networking installations
required to service the demands of the business. In short, it amounts to a great deal of
infrastructure.
Business Intelligence (BI) is an analytical process aimed at enhancing business performance by
giving key decision makers the ability to collect, store, retrieve and analyze data in order to
make better-informed decisions. It provides a consistency in reporting and analysis which may
otherwise be lacking. As the maxim goes, knowledge is power, but the transition from data, through
information, to knowledge is time consuming.
With this in mind there should be only “one version of the truth”: a single set of data that
provides the source for all analysis, planning and decision making.
Data warehousing and BI have become almost interchangeable terms, as definitions vary for what is
a very broad term. BI is not just a decision support system, nor is it a methodology; it does not
require a data warehouse, and it is not the tool set.
Figure 9 consistency in reporting
[Diagram: data entry systems, data warehouse, enterprise reporting system]
Data quality is important in the creation of the data warehouse; as the adage goes, garbage in,
garbage out.
BI can be about monitoring change. It is beneficial for a business organization to be able to react
to change. In addition the business may want to cause change. Small businesses can survive and
prosper on the instincts and intuition of the proprietor; decisions are not necessarily made from
factual evidence. As the business grows, layers of management are introduced and “buy in” from
stakeholders is required. The business formalizes and professionalizes.
Section 3
Outline evolution of databases and discuss how they have tried to address problems
[Diagram of database models: hierarchical database, network database, relational database,
entity–relationship model, object-orientated database, noSQL database, key value, columnar,
document model, named graph]
Change happens at a rate; the rate of this change, n, will affect an organisation’s ability to deal
with the results, positive or negative, of the change. The faster the rate n, the more agile the
organisation needs to be in order to facilitate a successful outcome to the change. The less agile
the organisation, the more the change will meet resistance. This resistance to change can become
inherent as the management structure of the organisation grows.
Flat file
In databases a flat file refers to data
files that contain records with no structured relationships. (Webopedia, 2015)
In SQL for Dummies, an introduction to Structured Query Language, Allen G. Taylor notes that the
advantage of a flat file is that it takes up less space than a structured file. However, it does require
the application to have knowledge of how the data is organized within the file.
More commonly, a flat file database is described as follows:
A flat file database is a database that stores data in a plain text file. Each line of the text file holds
one record, with fields separated by delimiters, such as commas or tabs. (TechTerms.com, 2015)
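The delimited layout described above can be sketched in a few lines. The file content and the field names (id, name, city) are assumptions made for illustration; the sketch also shows the point made earlier, that the application itself must know how the data is organized within the file.

```python
import csv
import io

# Hypothetical flat file content: one record per line, comma-delimited fields.
flat_file = io.StringIO("1,Ada,Leicester\n2,Grace,London\n")

# The file carries no schema, so the application has to supply the field names.
FIELDS = ["id", "name", "city"]

records = [dict(zip(FIELDS, row)) for row in csv.reader(flat_file)]
```

Every value is read back as text; nothing in the file says that id is a number or that two records are related, which is the limitation the later models set out to address.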
Hierarchical database model
A hierarchical database consists of a collection of records that are connected to each other through
links. Each record is a collection of fields (attributes), each of which contains only one data value. A
link is an association between precisely two records.
Two influential database systems that rely on the hierarchical model are IBM’s Information
Management System (IMS) [IBM 1978a, McGee 1977] and MRI’s System 2000 [MRI 1974, 1979]. The
first IMS version was developed in the late 1960s by IBM and by North American Aviation (Rockwell
International) for the Apollo moon-landing program. (Silberschatz, Korth & Sudarshan, 2010)
Network model
A network database consists of a collection of records connected to one another through links. Each
record is a collection of fields (attributes), each of which contains only one data value. A link is an
association between precisely two records. (Silberschatz, Korth & Sudarshan, 2010)
Relational model
The relational model is centred on this idea: the organization of data into collections of
two-dimensional tables called “relations.” (Ullman & Aho, 1994)
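As a rough illustration of a relation as a two-dimensional table, the sketch below models one as a heading (column names) plus a set of rows, with two basic operations over it. The customer data and the helper names select and project are assumptions introduced for the example; a real RDBMS would express the same query in SQL.

```python
# A relation modelled as a heading (column names) plus a set of rows (tuples).
# The data and helper names are illustrative assumptions.
heading = ("id", "name", "city")
customer = {
    (1, "Ada", "Leicester"),
    (2, "Grace", "London"),
    (3, "Edgar", "Leicester"),
}


def select(rows, column, value):
    """Keep only the rows whose given column equals value."""
    i = heading.index(column)
    return {row for row in rows if row[i] == value}


def project(rows, columns):
    """Keep only the named columns of each row."""
    idx = [heading.index(c) for c in columns]
    return {tuple(row[i] for i in idx) for row in rows}


names_in_leicester = project(select(customer, "city", "Leicester"), ["name"])
```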
Entity–relationship model (ER)
The ER model is based on strong mathematical foundations:
1. Set Theory,
2. Mathematical Relations,
3. Modern Algebra,
4. Logic, and
5. Lattice Theory.
Figure 10 the mathematics of ER Model
In the relational model the mathematical relation construct is used to express the “structure of data
values,” while in the ER model the same construct is used to express the “structure of entities.”
The ER model has explicit linkage between entities. The linkage between entities is explicit in the
ER model, while in the relational model it is implicit. In addition, the cardinality information is
explicit in the ER model, and some of the cardinality information is not captured in the relational
model. (Chen, 2015)
Object model
An object-oriented database management system (OODBMS, sometimes shortened to ODBMS for
object database management system) is a database management system (DBMS) that supports the
modelling and creation of data as objects. This includes some kind of support for classes of objects
and the inheritance of class properties and methods by subclasses and their objects. There is
currently no widely agreed-upon standard for what constitutes an OODBMS. (Rouse & Kumar,
object-oriented database management system (OODBMS or ODBMS), 2005)
Document model
Also known as a JSON store or JSON database.
Document stores provide more functionality: the system recognizes the structure of the objects
stored. Objects (or documents) may have a variable number of named attributes of various types
(integers, strings, and possibly nested objects), objects can be grouped into collections, and the
system provides a simple query mechanism to search collections for objects with particular attribute
values. (Cattell, 2010)
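A minimal sketch of these ideas, using plain Python dictionaries rather than any particular document database: documents in one collection carry different attributes, and a simple query searches the collection by attribute value. The find helper and the sample documents are hypothetical.

```python
# A collection of documents with a variable number of named attributes.
# The documents and the find() helper are illustrative assumptions.
collection = [
    {"_id": 1, "name": "Ada", "tags": ["vip"], "address": {"city": "Leicester"}},
    {"_id": 2, "name": "Grace"},
    {"_id": 3, "name": "Edgar", "address": {"city": "Leicester"}},
]


def find(docs, **criteria):
    """Return documents whose top-level attributes equal the given values."""
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


matches = find(collection, name="Grace")
with_address = [doc["_id"] for doc in collection if "address" in doc]
```

Because the store understands the structure of each document, it can answer a query such as "which documents have an address" without the application decoding every record first.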
Graph model
Formally, a graph is just a collection of vertices and edges—or, in less intimidating language, a set of
nodes and the relationships that connect them. Graphs represent entities as nodes and the ways in
which those entities relate to the world as relationships. This general-purpose, expressive structure
allows us to model all kinds of scenarios. (Robinson, Webber & Eifrem, 2013)
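The node and relationship structure can be sketched with ordinary Python collections. The node names, the relationship types (KNOWS, WORKS_FOR) and the neighbours helper are assumptions chosen for illustration and are not taken from any graph product.

```python
# Nodes carry properties; relationships are (source, type, target) triples.
# All names here are illustrative assumptions.
nodes = {
    "ada": {"label": "Person"},
    "grace": {"label": "Person"},
    "acme": {"label": "Company"},
}
relationships = [
    ("ada", "KNOWS", "grace"),
    ("ada", "WORKS_FOR", "acme"),
    ("grace", "WORKS_FOR", "acme"),
]


def neighbours(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [dst for src, rel, dst in relationships
            if src == node and rel == rel_type]


known_by_ada = neighbours("ada", "KNOWS")
acme_staff = [src for src, rel, dst in relationships
              if rel == "WORKS_FOR" and dst == "acme"]
```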
Key value
Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an
attribute name (or "key"), together with its value. (MongoDB, 2015)
Key-value Stores provide a distributed index for object storage, where the objects are typically not
interpreted by the system: they are stored and handed back to the application as BLOBs. However,
these systems usually provide object replication for recovery, partitioning of the data over many
machines, and rudimentary object persistence. (Cattell, 2010)
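The point that values are not interpreted by the system can be shown with a small in-process stand-in for a key-value store. The put and get helpers, the key format and the JSON encoding are assumptions for illustration; distribution, replication and partitioning are deliberately left out of the sketch.

```python
import json

# Hypothetical in-process stand-in for a key-value store.
store = {}


def put(key, value_bytes):
    """Store an opaque value under a key; the store never inspects it."""
    store[key] = value_bytes


def get(key):
    """Hand the value back exactly as it was stored, or None if absent."""
    return store.get(key)


# The application chooses the encoding; the store only ever sees bytes.
put("customer:1", json.dumps({"name": "Ada"}).encode("utf-8"))
blob = get("customer:1")
customer = json.loads(blob.decode("utf-8"))
```

Unlike the document model, a query such as "all customers named Ada" cannot be answered by the store itself, because only the application knows what the bytes mean.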
Columnar
Columnar databases are optimized for queries over large datasets, and store columns of data
together, instead of rows. (MongoDB, 2015)
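The difference between the two layouts can be sketched as follows. The sales rows are invented for the example, and real column stores add compression and on-disk formats that this illustration ignores.

```python
# Row-oriented layout: each record is kept together.
rows = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "south", "amount": 80},
    {"id": 3, "region": "north", "amount": 50},
]

# Column-oriented layout: all values of one column are kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column touches only that column's values.
total_amount = sum(columns["amount"])
```

An analytical query such as a total over amount reads a single list in the columnar layout, whereas the row layout requires visiting every record.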
Don’t forget to comment on how data requirements haven’t changed significantly however as the
organisation grows there are additional functionalities that generate data that need to be included
in the information system TRY TO FIND REF.
Conclusion
On the horizon are a number of other db-related ideas: CloudDB, DBaaP, DBaaS and DBaaA.
In conclusion, the market for DBMS is dominated by four major organisations: Oracle, Microsoft,
IBM and SAP. All of these organisations have successfully read the market requirements for data
management systems, and providers of other systems have been collectively marginalised into
exceptionally narrow market niches.