CO7107 DATA WAREHOUSING AND MULTIDIMENSIONAL DATABASES
Assessment
Andrew Jones 0817804 J01209

Table of Contents
Table of Figures
Introduction
Section 1
    How does it work
Section 2
Section 3
    Outline evolution of databases and discuss how they have tried to address problems
Conclusion

Table of Figures
Figure 1 DBMS as an interface
Figure 2 Physical and Logical Databases (Microsoft Corporation, 2015)
Figure 3 Composition of a DBMS
Figure 4 Gartner Magic Quadrant of DBMS providers
Figure 5 Five DBMS models
Figure 6 DBMS rankings by popularity (DB-engines, 2015)
Figure 7 DBMS model by popularity (DB-engines, 2015)
Figure 8 Commercial Industrial Sectors
Figure 9 Consistency in reporting
Figure 10 The mathematics of the ER model

“What is our business? And what business are we in?” Peter Drucker.

Introduction

This report considers the nature of a DBMS (Database Management System). It investigates how a DBMS works and explores the market to ascertain which products are considered to be leading the field. It then considers an organisation's needs for data, how these may have changed over the years, and how the DBMS has evolved to cope with these changes.

[Draft note: pick out key points regarding these for the conclusion.]

Section 1

A database (db) is “A structured set of data held in computer storage and typically accessed or manipulated by means of specialized software.” (database, n., 2015) This specialized software gives rise to the term database management system (DBMS), which describes the application suite that administers and manages the database. Within the DBMS, software tools are principally employed for the storing, indexing, modification, retrieval, transitioning and querying of data within the system.

Gartner defines a DBMS as a complete software system used to define, create, manage, update and query a database, by which we mean an organized collection of data that may be structured in multiple formats and stored in some form of storage medium.
The DBMS market is defined by relational and nonrelational database management products that are suitable for a broad range of enterprise-level transactional applications. (Computer Weekly, 2015)

How does it work

The DBMS is an interface between the database and the entities that require access to the database.

Figure 1 DBMS as an interface (labels: users, applications, services, DBMS, database)

Table 1 Characteristics of a DBMS: the advantages and disadvantages (My Reading Room, 2012)

Advantages of the DBMS:
- Data sharing
- Data security
- Data integration
- Referential integrity
- Minimized data inconsistency
- Improved data access
- Improved decision making
- Increased end-user productivity

Disadvantages of a DBMS:
- Increased costs
- Management complexity
- Maintaining currency
- Frequent upgrade/replacement cycles

A DBMS is composed of three principal elements:

1. Physical database management
The storage and retrieval of the logical database over multiple physical files in the storage medium is transparent to the user.

Figure 2 Physical and Logical Databases (Microsoft Corporation, 2015)

2. Database engine
The database engine is the principal service for storing, processing and securing data; it is this element of the package that is responsible for the create, read, update and delete (CRUD) functionality of the DBMS.

3. Database schema
The database schema is a logical data storage structure. The schema is composed of a collection of schema objects. Some examples of schema objects are:
- tables
- views
- indexes
- database links
- procedures
- functions
- packages

Figure 3 Composition of a DBMS (physical database, database engine, database schema)

The DBMS manages a single logical presentation of the data, safeguarding concurrency, data integrity, security and backup.

Of the DBMS on the market, those considered to be leading products come from Oracle, Microsoft, IBM and SAP.
These organisations are so dominant that little of the market is left for rival companies other than those serving very specific niches, such as Neo Technology with its graph product. Hybrid capabilities extend the functionality of the RDBMS to more use cases, making a stronger case for their ubiquity. (Mullins, Key criteria for deciding if a relational DBMS meets your IT needs, 2015) “Relational databases, or RDBMSes, became the norm in IT more than 30 years ago as low-cost servers became powerful enough to make them widely practical and relatively affordable.” (Mullins, Evaluating the different types of DBMS products, 2015)

Figure 4 Gartner Magic Quadrant of DBMS providers

There are broadly five DBMS models to consider, excluding RDBMS. The key-value model utilises “a key-value pair (KVP) which is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data”. (Rouse, 2008) Another is the document model, in which every record is viewed as a document. (MongoDB, 2015) The column model stores data in columns instead of rows. (Rouse, 2010) The graph model applies the mathematics of graph theory. This model differs significantly from the other non-relational models in that it allows for ACID compliance, in common with RDBMS.

ACID stands for atomicity, consistency, isolation and durability. It is a cornerstone of transaction processing. Compliance with ACID ensures that transactions with the database are processed dependably, in that they are either successfully committed or completely rolled back.
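The "committed or completely rolled back" behaviour described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `account` table, the names in it and the `transfer` helper are hypothetical, invented purely for this illustration; the sketch assumes sqlite3's default transaction handling, in which using the connection as a context manager commits on success and rolls back on an exception.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic transaction.

    Hypothetical helper for illustration only.
    """
    try:
        # `with conn:` commits if the block succeeds and rolls back
        # if an exception is raised inside it.
        with conn:
            # Credit first, so that a failure in the debit below shows
            # the earlier credit being undone as well.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance, so the whole
        # transaction, including the credit, was rolled back.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


# Assumed example schema: balances may never go below zero.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK constraint
final = balances(conn)
```

After the failed transfer, the balances are those left by the first, successful transfer: the credit applied to the destination account before the error is not kept, which is the atomicity property in practice.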
Figure 5 Five DBMS models (key-value, document, column store, graph, in memory)

Figure 6 DBMS rankings by popularity (DB-engines, 2015). Bar chart on an axis from 0% to 35%; systems shown: Redis, SQLite, Cassandra, Microsoft Access, DB2, PostgreSQL, MongoDB, Microsoft SQL Server, MySQL, Oracle.

Figure 7 DBMS model by popularity (DB-engines, 2015). Chart with four categories (relational DBMS, document store, wide column store, key-value store); the largest share is 90.57% and the others are 5.41%, 2.11% and 1.91%.

The other models employ a BASE approach: Basically Available, Soft state, Eventual consistency. BASE does not provide the guarantees in data transactions that ACID offers, but reflects the complexities of the data managed by these DBMS.

The final form of DBMS is an emergent one called the in-memory model, in which the data is “stored in main memory to facilitate faster response times”. (Rouse, in-memory database, 2012)

Section 2

Businesses run on data, such as the amount of a sale, the cost of a purchase and the wages of staff, but they survive on cash flow and strategic planning. Typically, one of the first strategic Information Technology and Information Systems decisions an organisation makes is its choice of accounting system, which is its first serious venture into Business Intelligence (BI). The business can then start to gather, report and plan consistently on the organisation's Key Performance Indicators (KPIs). Businesses do not start big; they start small and grow.

Figure 8 Commercial Industrial Sectors

For a small business, information needs are initially very basic and can be met largely with lists. As the business grows and its management models start to formalise, a requirement emerges for the enhanced management of data and information sources. During this growth, the business starts to develop formal reporting structures for its operational data, its tactical information and its strategic management information.
Sources of this data and information include:
- accounting information
    o sales order processing
    o purchase order processing
    o profit and loss accounts
    o balance sheet
- inventory management
- production data

Over time the organisation will have generated empirical data to work with; consequently, it can begin to forecast, make empirical comparisons and plan for future needs. If the business continues to grow, it may require other functions to perform its operational and strategic activities, such as:
- Payroll
- Marketing
- HRM
- CAD/CAM

IT will also extend into other areas of the business. Unless planned for, databases and technology infrastructure developed locally or by workgroups will prove difficult and costly to integrate into an overall strategy. There may also be data model alignment issues as a result of mergers or acquisitions.

As the business grows into the large phase, data security issues rise to prominence. The changes in data value are now very real and the threats to the data very serious. Breaches in security are costly to the organisation.

In this large phase of business development the physical data requirements also change. There will be many more computers, tablets, phones, scanners, copiers, printers and cameras to manage. In addition, there will be the large-scale server, mainframe and networking installations required to service the demands of the business. Fundamentally, this amounts to a lot of infrastructure.

Business Intelligence (BI) is a data analysis process aimed at enhancing business performance by giving key decision makers the ability to collect, store, retrieve and analyse data in order to make better-informed decisions. It provides a consistency in reporting and analysis that may otherwise be lacking. As the maxim goes, knowledge is power, but the transition from data, through information, to knowledge is time consuming.
With this in mind there should be only “one version of the truth”: a single set of data that provides the source for all analysis, planning and decision making. Data warehousing and BI have become almost interchangeable terms, as definitions vary for what is a very broad term. BI is not just a decision support system, nor is it a methodology; it does not require a data warehouse, and it is not the tool set.

Figure 9 Consistency in reporting (data entry systems, data warehouse, enterprise reporting system)

Data quality is important in the creation of the data warehouse; as the adage goes, garbage in, garbage out. BI can be about monitoring change. It is beneficial for a business to be able to react to change, and the business may also want to cause change. Small businesses can survive and prosper on the instincts and intuition of the proprietor; decisions are not necessarily made from factual evidence. As the business grows, layers of management are introduced and “buy in” from stakeholders is required. The business formalises and professionalises.

Section 3

Outline evolution of databases and discuss how they have tried to address problems

[Diagram of database models: hierarchical database, network database, relational database, entity–relationship model, object orientated database, noSQL database, document model, key value, columnar, named graph]

Change happens at a rate; the rate of this change, n, will affect an organisation's ability to deal with the results of the change, whether positive or negative. The higher n is, the more agile the organisation needs to be to bring the change to a successful outcome. The less agile the organisation, the more resistance the change will meet. This resistance to change can become inherent as the management structure of the organisation grows.

Flat file
In databases, a flat file refers to data files that contain records with no structured relationships.
(Webopedia, 2015) In SQL for Dummies, an introduction to Structured Query Language, Allen G. Taylor notes that the advantage of a flat file is that it takes up less space than a structured file; however, it requires the application to know how the data is organised within the file. A more widely used definition is that a flat file database is a database that stores data in a plain text file. Each line of the text file holds one record, with fields separated by delimiters, such as commas or tabs. (TechTerms.com, 2015)

Hierarchical database model
A hierarchical database consists of a collection of records that are connected to each other through links. Each record is a collection of fields (attributes), each of which contains only one data value. A link is an association between precisely two records. Unlike the network model, the records are organised as collections of trees rather than arbitrary graphs. Two influential database systems that rely on the hierarchical model are IBM's Information Management System (IMS) [IBM 1978a, McGee 1977] and MRI's System 2000 [MRI 1974, 1979]. The first IMS version was developed in the late 1960s by IBM and by North American Aviation (Rockwell International) for the Apollo moon-landing program. (Silberschatz, Korth & Sudarshan, 2010)

Network model
A network database consists of a collection of records connected to one another through links. Each record is a collection of fields (attributes), each of which contains only one data value. A link is an association between precisely two records. (Silberschatz, Korth & Sudarshan, 2010)

Relational model
The relational model is centred on this idea: the organization of data into collections of two-dimensional tables called “relations.” (Ullman & Aho, 1994)

Entity–relationship model (ER)
The ER model is based on strong mathematical foundations:
1. set theory,
2. mathematical relations,
3. modern algebra,
4. logic, and
5. lattice theory.

Figure 10 The mathematics of the ER model
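The idea of a "mathematical relation", which underlies both the relational tables and the ER foundations listed above, can be sketched in a few lines of Python. This is an illustrative sketch only and is not taken from the cited sources: the `staff` and `departments` sets and the `works_in` relation are hypothetical names invented for the example.

```python
from itertools import product

# Hypothetical domains (sets of permitted values), for illustration only.
staff = {"Ann", "Raj"}
departments = {"Sales", "Payroll"}

# The Cartesian product holds every possible (staff, department) pairing.
all_pairs = set(product(staff, departments))

# A mathematical relation is any subset of that product; here it records
# who actually works where.
works_in = {("Ann", "Sales"), ("Raj", "Payroll")}
is_relation = works_in <= all_pairs

# Read as a table, each tuple of the relation is one row of data values.
rows = sorted(works_in)

# Read as a link between entities, the same set exposes cardinality:
# how many departments each member of staff is connected to.
departments_per_staff = {
    person: sum(1 for (p, _) in works_in if p == person) for person in staff
}
```

The same set of tuples can therefore be read either as rows of values or as links between entities, which is a useful way to picture why one mathematical construct can serve both models.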
In the relational model the mathematical relation construct is used to express the “structure of data values,” while in the ER model the same construct is used to express the “structure of entities.” The ER model also has explicit linkage between entities, whereas in the relational model the linkage is implicit. In addition, cardinality information is explicit in the ER model, while some of it is not captured in the relational model. (Chen, 2015)

Object model
An object-oriented database management system (OODBMS, sometimes shortened to ODBMS for object database management system) is a database management system (DBMS) that supports the modelling and creation of data as objects. This includes some kind of support for classes of objects and the inheritance of class properties and methods by subclasses and their objects. There is currently no widely agreed-upon standard for what constitutes an OODBMS. (Rouse & Kumar, object-oriented database management system (OODBMS or ODBMS), 2005)

Document model
Also known as a JSON store or JSON database. Document stores provide more functionality than key-value stores: the system recognizes the structure of the objects stored. Objects (or documents) may have a variable number of named attributes of various types (integers, strings, and possibly nested objects), objects can be grouped into collections, and the system provides a simple query mechanism to search collections for objects with particular attribute values. (Cattell, 2010)

Graph model
Formally, a graph is just a collection of vertices and edges or, in less intimidating language, a set of nodes and the relationships that connect them. Graphs represent entities as nodes and the ways in which those entities relate to the world as relationships. This general-purpose, expressive structure allows us to model all kinds of scenarios. (Robinson, Webber & Eifrem, 2013)

Key value
Key-value stores are the simplest NoSQL databases.
Every single item in the database is stored as an attribute name (or "key"), together with its value. (MongoDB, 2015) Key-value stores provide a distributed index for object storage, where the objects are typically not interpreted by the system: they are stored and handed back to the application as BLOBs. However, these systems usually provide object replication for recovery, partitioning of the data over many machines, and rudimentary object persistence. (Cattell, 2010)

Columnar
Columnar stores are optimized for queries over large datasets, and store columns of data together, instead of rows. (MongoDB, 2015)

[Draft note: comment on how data requirements have not changed significantly; however, as the organisation grows, additional functions generate data that need to be included in the information system. Reference to be found.]

Conclusion

On the horizon are a number of other database-related ideas: CloudDB, DBaaP, DBaaS, DBaaA.

In conclusion, the market for DBMS is dominated by four major organisations: Oracle, Microsoft, IBM and SAP. All of these organisations have successfully read the market requirements for data management systems, and rival systems and their users have been collectively marginalised into exceptionally narrow market niches.