BTM 382 Database Management
Chapter 2: Data models
Chapter 12.12-13: CAP and Hadoop
Chitu Okoli
Associate Professor in Business Technology Management
John Molson School of Business, Concordia University, Montréal

Models and data models

What is a model?
• A model is a simplified way to describe or explain a complex reality
• A model helps people communicate and work simply yet effectively when talking about and manipulating complex real-world phenomena

Scientific models
Sources:
http://www.redorbit.com/education/reference_library/space_1/universe/2574692/geocentric_model/
http://hendrianusthe.wordpress.com/2012/06/21/heliocentric-vs-geocentric/

Conceptual models
Sources:
http://info563.malagaclasses.info/strategy-it-2/
http://fivewhys.wordpress.com/2012/05/22/business-model-innovation/

Importance of Data Models
• Communication tool
• Give an overall view of the database
• Organize data for various users
• Serve as an abstraction for creating good databases

The Evolution of Data Models

Obsolete models: Hierarchical and network models

The Relational Model
• Uses key concepts from mathematical relations (tables)
  – “Relational” in “relational model” means “tables” (mathematical relations), not “relationships”
• Table (relation)
  – Matrix consisting of row/column intersections
• Relations have well-defined methods (queries) for combining their data members
  – Selecting (reading) and joining (combining) data are defined on rigorous mathematical principles
• Relational database management system (RDBMS)
  – The relational model was originally too demanding for 1970s computing power
  – As computing power increased, the simplicity of the model prevailed

The Entity Relationship Model
• Very detailed specification of relationships and their properties
• Enhancement of the relational model
  – Relations (tables) become entities
• Entity relationship diagram (ERD)
  – Uses graphic representations to model database components
• Many notation variants exist; we will use the Crow’s Foot notation

The Object-Oriented Data Model (OODM)
• Addresses the “impedance mismatch” problem of the ER model
  – The ER model’s view of data (tables) and programmers’ view of data (objects in OOP) are completely different
  – This mismatch makes database programming painful, especially for very complex data structures
• OODM uses object-oriented programming concepts to store data
  – Objects represent nouns (entities or records)
  – Objects have attributes (properties or fields) with values (data)
  – Objects have methods (operations or functions)
  – Classes group similar objects using a hierarchy and inheritance
• In an OODBMS, data retrieval and storage closely mirror the data structures that programmers use, so programming complex objects is much easier than with the ER model
• More advanced forms support the Extended Relational Data Model, Object/Relational DBMS, and XML data structures

OODBMS vs. RDBMS
https://youtu.be/kORTgvfHl4g

Big Data and NoSQL

Explaining Big Data
https://youtu.be/7D1CQ_LOizA

Big Data
• Volume
  – Huge amounts of data (terabytes and petabytes), especially from the Internet
• Velocity
  – Organizations need to process these huge amounts of data as rapidly as they process smaller databases
• Variety
  – Wide variety of data, much of it unstructured and even changing in structure

Big data’s solutions and RDBMS’s failure
• Scale up: use more powerful servers
  – RDBMS is very computing-intensive
  – More data requires much faster, more capable, and more expensive computers, and even that is not good enough for big data
• Scale out: use many cheap distributed servers
  – RDBMS does not work rapidly with distributed processing
  – Consistency is the biggest problem: guaranteeing consistency (which RDBMS is great at) is slow, too slow for big data

What is NoSQL?
https://www.youtube.com/watch?v=qUV2j3XBRHc

NoSQL Databases to the Big Data rescue
• “NoSQL” means:
  – Non-relational or non-RDBMS
  – Also “Not only SQL”: a few do support SQL
• It is not one model; it is many different models that are not relational
• High scalability
  – Support distributed database architectures
• High availability
  – Rapid performance for big data, including unstructured and sparse data
• Fault tolerance
  – Continue to work even if some servers in the cluster fail
• Geared toward performance rather than transaction consistency
• Many store data in key-value stores

Disadvantages of NoSQL
• Complex programming is required
  – “NoSQL” means you lose the ease of use and structural independence of SQL
  – There is often no relationship support in the database; you have to program relationships in code
• There is often little or no transaction integrity support
  – The data you retrieve at any given moment might be out of date, but it will eventually become correct
  – This is the price to pay for rapid performance in a distributed database

The CAP theorem for distributed databases
• CAP stands for:
  – Consistency: All nodes see the same data
  – Availability: A request always gets a response (success or failure)
  – Partition tolerance: Even if nodes fail or lose contact with each other (a network partition), the system can still function
• A distributed database can guarantee only two of the three CAP characteristics, never all three at the same time
  – However, over time, it might be able to provide all three
• NoSQL databases are distributed, and so the CAP theorem restricts them to providing BASE, not ACID

ACID versus BASE
• A relational database guarantees the ACID properties:
  – Atomicity, Consistency, Isolation, Durability
  – In short, a set of SQL statements (called a transaction) will either all succeed or all fail; there is no halfway success, and the result will not corrupt the database
  – The price to pay: results might be somewhat slower
• NoSQL databases only guarantee the BASE properties:
  – Basically Available, Soft state, Eventual consistency
  – In short, at any given moment, not everything might be consistent, but the database will eventually become consistent
  – In return, these imperfect results are delivered fast

Table 12.8 – Distributed Database Spectrum
Sacrifices availability to ensure consistency and isolation

Historical outline of data models

Which data model should you use?
• Hierarchical or network models
  – Obsolete: no one uses these any longer
• Entity-relationship model
  – Continuation or enhancement of the relational model
  – Suits 90% or more of professional database situations
• Object-oriented database
  – When your data structures are very complex, you need rapid performance, and it makes business sense
  – Source: Barry & Associates, Inc.
  – Data structures are so complex that organizing data as tables causes headaches in programming retrieval and storage
• NoSQL
  – Vast amounts of unstructured data where you need rapid performance
  – Speed is more important than data consistency

Sources
• Most of the slides are adapted from Database Systems: Design, Implementation and Management by Carlos Coronel and Steven Morris, 11th edition (2015), published by Cengage Learning. ISBN-13: 978-1-285-19614-5
• Other sources are noted on the slides themselves
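Returning to the ACID versus BASE comparison: the contrast can be made concrete with a small Python sketch. The first half uses the standard-library sqlite3 module to show atomicity only: a transfer credits one account and then debits another inside a single transaction, and when the debit violates a CHECK constraint, the credit is rolled back too. The second half is a toy, in-memory model of a replicated key-value store with delayed replication, assumed purely for illustration; the names make_bank, transfer, balances, and ToyReplicatedStore are hypothetical and do not correspond to any real NoSQL product's API.

```python
import sqlite3


# --- ACID (atomicity) with a relational database, using the stdlib sqlite3 module ---

def make_bank() -> sqlite3.Connection:
    """Hypothetical example schema: balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the initial rows
        conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    return conn


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Both UPDATEs succeed together, or neither is kept."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit must undo an already-applied change.
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK (balance >= 0) fails
        return False


# --- BASE with a toy replicated key-value store (hypothetical, for illustration) ---

class ToyReplicatedStore:
    """Writes land on replica 0 immediately and reach the others only on sync()."""

    def __init__(self, n_replicas: int = 3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def put(self, key, value):
        # Basically available: acknowledge after writing to a single replica.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def get(self, key, replica: int):
        # Soft state: a lagging replica may return stale or missing data.
        return self.replicas[replica].get(key)

    def sync(self):
        # Eventual consistency: once replication catches up, all replicas agree.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))   # True {'alice': 70, 'bob': 30}
    print(transfer(bank, "alice", "bob", 500), balances(bank))  # False {'alice': 70, 'bob': 30}

    store = ToyReplicatedStore()
    store.put("cart:42", ["book"])
    print(store.get("cart:42", replica=0), store.get("cart:42", replica=2))  # ['book'] None
    store.sync()
    print(store.get("cart:42", replica=2))  # ['book']
```

The failed 500 transfer leaves both balances untouched, which is the "all work, or all fail" guarantee. The toy store makes the opposite trade: put() returns as soon as one replica has the value, so it is fast and available, but a read from another replica is stale until sync() runs. Real distributed databases replicate in far more sophisticated ways; this only illustrates the idea.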