Introduction to Database Systems
Chpt 1
Instructor: Weichao Wang
Database Management Systems Ramakrishnan & Gehrke

http://www.sigmod.org/record/issues/0606/index.html

History
– 60s: C. Bachman (GE), network data model
– Late 60s: IBM IMS, hierarchical data model
– 70: E. Codd, relational model
– 80s: SQL, IBM System R, transactions (J. Gray)
– Late 80s-90s: DB2, Oracle, Informix, Sybase
– 90s: data warehousing (DW), internet, distributed databases
– Now: Big Data
– Turing award and Turing test?

What Is a DBMS?
A database is a very large, integrated collection of data.
It models a real-world enterprise.
– Entities (e.g., students, courses)
– Relationships (e.g., Madonna is taking ITCS6160)
A Database Management System (DBMS) is a software package designed to maintain and utilize databases.

Why not just OS file systems?
Size of the data versus the size of your memory/hard disk.
Query processing: remember your file read/write C programs? Now think about several terabytes of data. You need a separate program for every query.
Consistency: multiple users access the same data.
Recovery: is it on the hard disk now?
All of these can be implemented directly on top of the OS, but then you are just designing your own DB and DBMS.

Why Use a DBMS?
Data independence and efficient access.
Reduced application development time.
Data integrity and security.
Uniform data administration. (not sure about this now)
Concurrent access, recovery from crashes.

Why Study Databases??
Shift from computation to information (application oriented vs data oriented)
– at the “low end”: scramble to webspace
– at the “high end”: scientific applications
Datasets increasing in diversity and volume.
– Digital libraries, interactive video, Human Genome project, EOS project
– ...
need for DBMS exploding
DBMS encompasses most of CS
– OS, languages, theory, AI, multimedia, logic

Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using the given data model.
The relational model of data is the most widely used model today.
– Main concept: relation, basically a table with rows and columns.
– Every relation has a schema, which describes the columns, or fields.

Levels of Abstraction
Many views, single conceptual (logical) schema and physical schema.
– Views describe how users see the data.
– Conceptual schema defines the logical structure.
– Physical schema describes the files and indexes used.
[Figure: View 1, View 2 and View 3 sit on top of the Conceptual Schema, which sits on top of the Physical Schema.]
Schemas are defined using DDL; data is modified/queried using DML.

Example: University Database
Conceptual schema:
– Students(sid: string, name: string, login: string, age: integer, gpa: real)
– Courses(cid: string, cname: string, credits: integer)
– Enrolled(sid: string, cid: string, grade: string)
Physical schema:
– Relations stored as unordered files.
– Index on first column of Students.
External Schema (View):
– Course_info(cid: string, enrollment: integer)
– Each data entry is stored only once; views are created from the stored data.

Data Independence
Applications are insulated from how data is structured and stored.
Logical data independence: protection from changes in the logical structure of data.
Physical data independence: protection from changes in the physical structure of data.
Key is to reduce workload and overhead of end users.
One of the most important benefits of using a DBMS!

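The university example and the idea of logical data independence can be made concrete with a small sketch. The code below uses Python's built-in sqlite3 module, which is an assumption of this example and not something the slides prescribe. The mapping of the slide's types to SQLite types (string to TEXT, real to REAL), the definition of Course_info as a count over Enrolled, the helper names, and all sample rows and course names are likewise illustrative assumptions.

```python
import sqlite3


def build_university_db():
    """Create the three relations and the Course_info view (DDL) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Conceptual schema. Declaring sid as PRIMARY KEY also gives SQLite
        -- an index on the first column of Students.
        CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                               age INTEGER, gpa REAL);
        CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
        CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

        -- External schema: enrollment is derived, never stored a second time.
        -- Assumption: enrollment means the number of Enrolled rows per course.
        CREATE VIEW Course_info AS
            SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
        """
    )
    return conn


def load_sample_rows(conn):
    """Insert a few made-up rows (DML)."""
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
        [("s1", "Madonna", "madonna@cs", 20, 3.5),
         ("s2", "Jones", "jones@cs", 19, 3.2)],
    )
    conn.executemany(
        "INSERT INTO Courses VALUES (?, ?, ?)",
        [("ITCS6160", "Database Systems", 3),
         ("ITCS6114", "Algorithms", 3)],
    )
    conn.executemany(
        "INSERT INTO Enrolled VALUES (?, ?, ?)",
        [("s1", "ITCS6160", "A"), ("s2", "ITCS6160", "B"), ("s2", "ITCS6114", "A")],
    )
    conn.commit()


def course_info(conn):
    """Query the view exactly as if it were a stored table."""
    return conn.execute(
        "SELECT cid, enrollment FROM Course_info ORDER BY cid"
    ).fetchall()


if __name__ == "__main__":
    db = build_university_db()
    load_sample_rows(db)
    print(course_info(db))
```

An application that only calls course_info depends on the view, not on how Enrolled is laid out. If the Enrolled relation were later reorganized, redefining Course_info over the new layout would leave that application unchanged. Note that with this particular view definition, a course with no enrollments does not appear in the result.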
Structure of a DBMS
A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
This is one of several possible architectures; each system has its own variations.
[Figure: layers from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]

Transaction Management: ACID properties
Atomicity: All actions in the Xact happen, or none happen.
Consistency: If each Xact is consistent, and the DB starts consistent, it ends up consistent.
Isolation: Execution of one Xact is isolated from that of other Xacts.
Durability: If a Xact commits, its effects persist.
The Recovery Manager guarantees Atomicity & Durability.

Motivation of concurrency control
Consistency
Isolation
Example
– Two parallel transactions T1 and T2
– Serial execution
– Execution with interleaving actions
– Similar situations in OS and any other resource competitions

Motivation of recovery management
Atomicity:
– Transactions may abort (“Rollback”).
Durability:
– What if DBMS stops running? (Causes?)
Desired Behavior after system restarts:
– T1, T2 & T3 should be durable.
– T4 & T5 should be aborted (effects not seen).
[Figure: timelines of transactions T1 through T5, ending at a crash.]

Databases make these folks happy ...
End users and DBMS vendors
DB application programmers
– E.g. smart webmasters
Database administrator (DBA)
– Designs logical/physical schemas
– Handles security and authorization
– Data availability, crash recovery
– Database tuning as needs evolve
Must understand how a DBMS works!

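The atomicity property from the ACID slide ("all actions in the Xact happen, or none happen") and the rollback behaviour from the recovery slide can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the mechanism of any particular DBMS discussed in the course. The Accounts table, its CHECK constraint, and the function names are hypothetical, introduced only for this example.

```python
import sqlite3


def open_bank():
    """Create a hypothetical Accounts table with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Accounts ("
        " owner TEXT PRIMARY KEY,"
        " balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdraft
    )
    conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True if it committed."""
    try:
        # The connection context manager commits if the block succeeds and
        # rolls back if an exception escapes it.
        with conn:
            # Credit first, so that a failing debit leaves a partial update
            # that the rollback has to undo.
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by owner."""
    return dict(conn.execute("SELECT owner, balance FROM Accounts"))


if __name__ == "__main__":
    bank = open_bank()
    print(transfer(bank, "A", "B", 30), balances(bank))
    print(transfer(bank, "A", "B", 500), balances(bank))
```

In the second transfer the credit to B succeeds but the debit from A would drive the balance negative, so the whole transaction is undone and neither balance changes. This is the same "effects not seen" behaviour the recovery slide asks for in transactions that abort.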
New challenges
Application oriented to data oriented
Unstructured data
Conflict between data and user privacy
– Data taint/trace
Challenges caused by cloud:
– Storage places
– Index of encrypted data files
– Proof of retrievability
– Mobile: compute it locally or transmit it

Summary
DBMS used to maintain, query large datasets.
Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs and are well-paid!
DBMS R&D is one of the broadest, most exciting areas in CS.