Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Database wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
ContactPoint wikipedia , lookup
Relational model wikipedia , lookup
CENG 352 Database Management Systems Nihan Kesim Çiçekli email: [email protected] URL: http://www.ceng.metu.edu.tr/~nihan CENG 352 • • • • Instructor: Nihan Kesim Çiçekli Office: A308 Email: [email protected] Lecture Hours: Tue. 11:40 (BMB5); Thu. 10:40-12:30 (BMB5) • Course Web page: http://cow.ceng.metu.edu.tr • Teaching Assistant: Ömer Baykal, [email protected] 2 Required Background • Material from CENG351 (EntityRelationship modeling, relational data model, relational algebra, SQL, B+-trees and hash-based indexing, algorithms for sorting/merging data files) • Intermediate programming skills – C/C++/Java 3 Text Books and References 1. Raghu Ramakrishnan, Database Management Systems, McGraw Hill, 3rd edition, 2003 (text book). 2. R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, 4th edition, Addison-Wesley, 2004. 3. A. Silberschatz, H.F. Korth, S. Sudarshan, Database System Concepts, McGraw Hill, 4th edition, 2002. 4. H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems The Complete Book, Prentice Hall, 2002. 4 Course Outline • • • • • • • • Review of the Relational Data Model Database Application Development Relational Database Design and Tuning Transaction Processing Concurrency Control Crash Recovery Query Processing Query Optimization 5 Grading • • • • Midterm Assignments and Quizzes Project Final Exam 25 % 25 % 20 % 30 % 6 Grading Policies • Policy on missed midterm: – no make-up exam; compensated at final (only for the legal excuses and if informed beforehand) • Policy on final exam: – Get at least 30% points from at least one of the written assignments and at least 30% points from the programming assignment – Attendance should be > 50% • Lateness policy: – No late submission for the written assignments. – For the programming homework(s): can be late up to 7 days. Afterwards, will be penalized by 10%* day2 – The last assignment may not be turned in late – All assignments are to be your own work. 7 Project • Projects in groups. – Determine your groups till the end of add/drops. • 3 stages: – Proposal Report (E/R) – Design Report – Final Report + Demo 8 Introduction 9 File-Based Systems • Collection of application programs that perform services for the end users (e.g. reports). • Each program defines and manages its own data. 10 File-Based Processing Pearson Education © 2009 11 Limitations of File-Based Approach • Separation and isolation of data – Each program maintains its own set of data. – Users of one program may be unaware of potentially useful data held by other programs. • Duplication of data – Same data is held by different programs. – Wasted space and potentially different values and/or different formats for the same item. 12 Limitations of File-Based Approach • Data dependence – File structure is defined in the program code. • Incompatible file formats – Programs are written in different languages, and so cannot easily access each other’s files. • Fixed Queries/Proliferation of application programs – Programs are written to satisfy particular functions. – Any new requirement needs a new program. 13 Database Approach • Arose because: – Definition of data was embedded in application programs, rather than being stored separately and independently. – No control over access and manipulation of data beyond that imposed by application programs. • Result: – the database and Database Management System (DBMS). 14 Database • Shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization. • System catalog (metadata) provides description of data to enable program–data independence. • Logically related data comprises entities, attributes, and relationships of an organization’s information. 15 Database Management System (DBMS) • A software system that enables users to define, create, maintain, and control access to the database. • (Database) application program: a computer program that interacts with database by issuing an appropriate request (SQL statement) to the DBMS. 16 Typical DBMS Functionality • Define a database : in terms of data types, structures and constraints • Construct or load the database on a secondary storage medium • Manipulating the database : querying, generating reports, insertions, deletions and modifications to its content • Concurrent Processing and Sharing by a set of users and programs – yet, keeping all data valid and consistent 17 17 Database Management System (DBMS) Pearson Education © 2009 18 Database Approach • Data definition language (DDL). – Permits specification of data types, structures and any data constraints. – All specifications are stored in the database. • Data manipulation language (DML). – General enquiry facility (query language) of the data. 19 Database Approach • Controlled access to database may include: – – – – – a security system an integrity system a concurrency control system a recovery control system a user-accessible catalog. 20 Views • Allows each user to have his or her own view of the database. • A view is essentially some subset of the database. 21 Views - Benefits • Reduce complexity • Provide a level of security • Provide a mechanism to customize the appearance of the database • Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed 22 Disadvantages of DBMSs • • • • • • Complexity Size Cost of DBMS Additional hardware costs Performance Higher impact of a failure 23 Components of DBMS Environment • Hardware – Can range from a PC to a network of computers. • Software – DBMS, operating system, network software (if necessary) and also the application programs. • Data – Used by the organization and a description of this data called the schema. • Procedures – Instructions and rules that should be applied to the design and use of the database and DBMS. • People 24 Roles in the Database Environment • • • • • Data Administrator (DA) Database Administrator (DBA) Database Designers (Logical and Physical) Application Programmers End Users (naive and sophisticated) 25 History of Database Systems • First-generation – Hierarchical and Network • Second generation – Relational • Third generation – Object-Relational – Object-Oriented 26 Data Model • Integrated collection of concepts for describing data, relationships between data, and constraints on the data in an organization. – Purpose is to represent data in an understandable way. • Data Model comprises: – a structural part; – a manipulative part; – possibly a set of integrity rules. 27 Data Models • Object-Based Data Models – – – – Entity-Relationship Semantic Functional Object-Oriented. • Record-Based Data Models – Relational Data Model – Network Data Model – Hierarchical Data Model. • Physical Data Models Pearson Education © 2009 28 Relational Data Model Pearson Education © 2009 29 Network Data Model Pearson Education © 2009 30 Hierarchical Data Model Pearson Education © 2009 31 Conceptual Modeling • Conceptual schema is the core of a system supporting all user views. • Should be complete and accurate representation of an organization’s data requirements. • Conceptual modeling is the process of developing a model of information use that is independent of implementation details. • The result is the conceptual schema. 32 Objectives of Three-Level Architecture • DBA should be able to change database storage structures without affecting the users’ views. • Internal structure of database should be unaffected by changes to physical aspects of storage. • DBA should be able to change conceptual structure of database without affecting all users. 33 Three-Level Architecture 34 Three-Level Architecture • External Level – Users’ view of the database. – Describes that part of database that is relevant to a particular user. • Conceptual Level – Community view of the database. – Describes what data is stored in database and relationships among the data. • Internal Level – Physical representation of the database on the computer. – Describes how the data is stored in the database. 35 Differences between Three Levels of the Architecture Pearson Education © 2009 36 Data Independence • Logical Data Independence – Refers to immunity of external schemas to changes in conceptual schema. – Conceptual schema changes (e.g. addition/removal of entities). – Should not require changes to external schema or rewrites of application programs. 37 Data Independence • Physical Data Independence – Refers to immunity of conceptual schema to changes in the internal schema. – Internal schema changes (e.g. using different file organizations, storage structures/devices). – Should not require change to conceptual or external schemas. 38 Data Independence and the ThreeLevel Architecture Pearson Education © 2009 39 Database Languages • Data Definition Language (DDL) – Allows the DBA or user to describe and name entities, attributes, and relationships required for the application – plus any associated integrity and security constraints. • Data Manipulation Language (DML) – Provides basic data manipulation operations on data held in the database. 40 System Catalog • Repository of information (metadata) describing the data in the database. • One of the fundamental components of DBMS. • Typically stores: – – – – – names, types, and sizes of data items; constraints on the data; names of authorized users; data items accessible by a user and the type of access; usage statistics. 41 The DBMS Marketplace • Relational DBMS companies – Oracle, Sybase – are among the largest software companies in the world. • IBM offers its relational DB2 system. • Microsoft offers SQL-Server, plus Microsoft Access for the cheap DBMS on the desktop, answered by “lite” systems from other competitors. • Relational companies also challenged by “object-oriented DB” companies. • But countered with “object-relational” systems, which retain the relational core while allowing type extension as in OO systems. 42 Three Aspects to Studying DBMS's 1. Modeling and design of databases. – Allows exploration of issues before committing to an implementation. 2. Programming: queries and DB operations like update. – SQL 3. DBMS implementation. 43 Components of a DBMS • Major components of a DBMS: – – – – – – Query processor Database manager (DM) File manager DML preprocessor DDL compiler Catalog manager 44 Components of a DBMS • Major software components for database – Transaction manager manager – – – – Authorization control Command processor Integrity checker Query optimizer – Scheduler – Recovery manager – Buffer manager 45 Components of a DBMS Pearson Education © 2009 46 Components of Database Manager (DM) 47 Pearson Education © 2009 Multi-User DBMS Architectures • Teleprocessing • File-server • Client-server 48 Teleprocessing • Traditional architecture. • Single mainframe with a number of terminals attached. • Trend is now towards downsizing. 49 Pearson Education © 2009 File-Server • File-server is connected to several workstations across a network. • Database resides on file-server. • DBMS and applications run on each workstation. • Disadvantages include: – Significant network traffic. – Copy of DBMS on each workstation. – Concurrency, recovery and integrity control more complex. Pearson Education © 2009 50 File-Server Architecture Pearson Education © 2009 51 Traditional Two-Tier Client-Server • Client (tier 1) manages user interface and runs applications. • Server (tier 2) holds database and DBMS. • Advantages include: – – – – – wider access to existing databases; increased performance; possible reduction in hardware costs; reduction in communication costs; increased consistency. Pearson Education © 2009 52 Traditional Two-Tier Client-Server Pearson Education © 2009 53 Traditional Two-Tier Client-Server Pearson Education © 2009 54 Three-Tier Client-Server • Client side presented two problems preventing true scalability: – ‘Fat’ client, requiring considerable resources on client’s computer to run effectively. – Significant client side administration overhead. • By 1995, three layers proposed, each potentially running on a different platform. 55 Three-Tier Client-Server • Advantages: – ‘Thin’ client, requiring less expensive hardware. – Application maintenance centralized. – Easier to modify or replace one tier without affecting others. – Separating business logic from database functions makes it easier to implement load balancing. – Maps quite naturally to Web environment. 56 Three-Tier Client-Server Pearson Education © 2009 57